Research Forum | Episode 4

Research Forum Brief | September 2024

Analog optical computing for sustainable AI and beyond


Jiaqi Chu

“I have been working with a fantastic team to build a new kind of computer … it uses physics and physical systems to do … computation, which means it has the potential to be 100 times more efficient compared to state-of-the-art GPUs.”

Jiaqi Chu, Principal Researcher, Microsoft Research Cambridge

Transcript: Lightning Talk

Analog optical computing for sustainable AI and beyond

Francesca Parmigiani, Principal Researcher, Microsoft Research Cambridge
Jiaqi Chu, Principal Researcher, Microsoft Research Cambridge

This talk discusses a new kind of computer—an analog optical computer—that has the potential to accelerate AI inference and hard optimization workloads by 100x, leveraging hardware-software co-design to improve the efficiency and sustainability of real-world applications.

Microsoft Research Forum, September 3, 2024

JIAQI CHU: Hi, everyone. I’m Jiaqi, a researcher at Microsoft. Over the past three years, I have been working with a fantastic team to build a new kind of computer. It doesn’t use logic gates; it doesn’t use bits. It uses physics and physical systems to do computation, which means it has the potential to be 100 times more efficient compared to state-of-the-art GPUs. [The] really neat thing is that we are building it using technologies that will soon be prevalent in the consumer space.

There is a catch here. This is not a general-purpose computer. It accelerates two different but very broad classes of applications: machine learning inference and hard optimization problems. For the machine learning inference part, we have been able to show the potential of accelerating diffusion models that can generate images and other content using this computer. Actually, there are emerging forms of machine learning that can really take advantage of the amazing amount of compute offered and achieve high-level properties, like better [generalization] to out-of-distribution data. Second, the same computer can solve hard combinatorial optimization problems. We have identified real-world problems from many industry verticals, from healthcare, finance, and chemical engineering to robotics, that could be accelerated using this computer. Exactly the same computer supporting a wide range of applications.

But before we talk about these applications and the computer, I want to go after the “why” question. I’m sure all of you have had firsthand experience of the amazing capabilities of the latest machine learning models. We are just at the start of this inflection [point]. We expect that the capabilities of those models will grow tremendously, as long as we can keep pouring in exponentially increasing amounts of [compute]. But this is a big problem, not just because we are spending billions and billions of dollars on AI infrastructure to train and serve models; there are also serious environmental concerns about the energy and other resources that are being consumed here. I genuinely believe sustainability of AI is one of the most important questions. Unfortunately, this couldn’t be happening at a worse time. Right when these compute demands are taking off, the future trends for digital computing do not look good, with Moore’s law slowing down. This is not just our observation; it is a broader industry concern.

Over the past five or six years, this has led to a fantastic amount of research and development. Many companies, many startups, have built nontraditional computers from the broad family of analog technologies. In this context, our journey started a few years ago. Last year, we had the first generation of our computer. It was built using bulky technology, but it was already solving a scaled-down version of a real-world finance problem from Barclays. It actually outperformed a quantum computer solving the same problem, which gave [us] a lot of confidence. It led to our research collaboration with Barclays and a partnership with the Microsoft Health Futures team. I’m really excited to share that we have just completed the second generation of [this] computer. It is much smaller in physical size, and this is a world first in that exactly the same computer is simultaneously solving hard optimization problems and accelerating machine learning inference.

Looking ahead, we estimate that at scale, this computer can achieve around 450 tera operations per second per watt, which is a 100-times improvement compared to state-of-the-art GPUs. Let me now move on to give you an [introduction] to how we can compute using physical technologies. Let’s start with the basic mathematical operations: multiplication and addition. If I take a light source and shine it on a filter, like the one that you have in your camera, I can have any shade of gray on my filter. When light passes through, this is a multiplication by a weight between zero and one. This is happening simultaneously for tens of thousands of light beams that are going through this filter in a completely passive, power-free fashion—massively parallel multiplication using light-matter interaction.

Similarly, when I have multiple beams of light that fall on a pixel of my smartphone camera, the photons add up to produce a current—massively parallel addition using light-matter interaction. Once I have addition and multiplication, I can implement a vector-matrix multiplier. Benefiting from the inherent parallelism of optics, we can implement a massively parallel systolic vector-matrix multiplier. We are building these using consumer technologies. Our input vector is an array of micro-LEDs, the next big thing in the display space. The matrix in the middle is a chip that we use in digital projectors, and I have a sample here—a standard projector chip with four million pixels on it. In theory, it can simultaneously do four million multiplications when light bounces off it. Our output vector is exactly the same chip [as] in our smartphone cameras, the standard CMOS sensor—technologies with an existing manufacturing ecosystem that we can dovetail behind.
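To make this mapping concrete, here is a minimal numerical sketch of an optical vector-matrix multiply: LED intensities as the input vector, filter transmissions in [0, 1] as the weights, and per-pixel photon accumulation as the summation. The function names, sizes, and the simple additive noise model are illustrative assumptions, not the actual AOC design.

```python
import numpy as np

rng = np.random.default_rng(0)

def optical_vmm(led_intensities, transmissions, read_noise=0.01):
    """Emulate light passing through a grayscale filter onto a sensor.

    led_intensities: non-negative input vector (micro-LED brightness).
    transmissions:   matrix of filter values in [0, 1]; each entry
                     attenuates one beam -> a multiplication by a weight.
    Each sensor pixel accumulates the photons of the beams landing on it,
    so the row-wise sum is the addition step.
    """
    assert np.all(led_intensities >= 0)
    assert np.all((transmissions >= 0) & (transmissions <= 1))
    attenuated = transmissions * led_intensities        # parallel multiplies
    summed = attenuated.sum(axis=1)                     # photon accumulation
    return summed + read_noise * rng.standard_normal(summed.shape)

x = rng.uniform(0, 1, size=64)          # input vector (light intensities)
W = rng.uniform(0, 1, size=(16, 64))    # filter pattern (weights in [0, 1])
print(np.allclose(optical_vmm(x, W, read_noise=0.0), W @ x))  # True
```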

For the most interesting applications, we also need nonlinearities: normalization, tanh, sigmoid. We implement these using chip-scale analog electronics, namely CMOS chips. Our choice to combine optics and analog electronics is unique in the industry. Hence the name “Analog Optical Computing,” or AOC for short. These are not just cartoons on slides. We have just completed the second generation of our computer, and my colleague, Francesca, will tell you about what this computer is solving.
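Extending the sketch above (and reusing its `optical_vmm`, `x`, `W`, and `np`), a single AOC-style layer might pair the optical multiply-accumulate with an analog-electronic nonlinearity. The layer structure is an assumption for illustration, with tanh standing in for the nonlinearities named in the talk.

```python
# Hypothetical single layer: optical vector-matrix multiply followed by
# an analog-electronics nonlinearity (tanh as a stand-in).
def aoc_layer(x, W, nonlinearity=np.tanh):
    return nonlinearity(optical_vmm(x, W))

h = aoc_layer(x, W)   # one optics-plus-analog-electronics pass
```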

FRANCESCA PARMIGIANI: The AOC computer has the potential to speed up two broad classes of applications, machine learning inference and hard optimization problems. The first example we run on our computer is MNIST classification. We’ve trained the model a priori on GPUs, and we have encoded it on our projectors. As you can see, the digits are being successfully classified by our computer live and at a very high accuracy. But what’s more important here is that the computer is doing exactly what our emulator platform, our digital twin, predicts, which really gives us the confidence that the computer is working correctly.
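The digital-twin check can be pictured as running the same pre-trained weights through an ideal, noise-free model and through the noisy hardware emulation, then comparing the predicted classes. The weights and inputs below are random placeholders reusing the earlier sketch, not a real MNIST model.

```python
# Placeholder classifier weights and inputs; reuses optical_vmm, rng, and
# np from the earlier sketch. A real experiment would load MNIST data and
# GPU-trained weights instead.
W_clf = rng.uniform(0, 1, size=(10, 64))      # 10 classes, 64-dim inputs
batch = rng.uniform(0, 1, size=(32, 64))      # stand-ins for flattened digits

twin_pred = np.argmax(batch @ W_clf.T, axis=1)                         # digital twin
hw_pred = np.array([np.argmax(optical_vmm(v, W_clf)) for v in batch])  # noisy "hardware"
print("twin/hardware agreement:", np.mean(twin_pred == hw_pred))
```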

Exactly the same computer can also successfully solve hard optimization problems. As a second example, we have encoded an optimization problem onto the very same projectors. A 100% success rate in these graphs means that the computer solves the problem correctly all the time. When I now put my hand in front of one of the projectors, I block the optical path, and so the computer loses track of the problem it is trying to solve. As a result, it attempts to solve it randomly, and the success rate drops to zero. Once I remove my hand, the computer regains its understanding of the problem it’s solving, and the success rate returns to 100%.
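As a rough picture of how such iterative, feedback-based optimization works, here is a self-contained toy that relaxes a small QUBO-style problem and follows gradient dynamics toward a binary solution. This is a generic heuristic sketch under assumed parameters, not the AOC algorithm or its problem encoding.

```python
import numpy as np

rng = np.random.default_rng(1)

def solve_qubo(Q, steps=500, lr=0.05):
    """Heuristically minimize x^T Q x over x in {0, 1}^n.

    The binary variables are relaxed to [0, 1], updated by gradient
    descent with clipping, and rounded at the end. Step count and
    learning rate are illustrative choices.
    """
    x = rng.uniform(0.4, 0.6, size=Q.shape[0])   # soft start near 0.5
    for _ in range(steps):
        grad = (Q + Q.T) @ x                     # gradient of x^T Q x
        x = np.clip(x - lr * grad, 0.0, 1.0)     # stay inside the box
    return np.round(x)                           # project to binary

Q = rng.standard_normal((8, 8))
x_best = solve_qubo(Q)
print("objective value:", x_best @ Q @ x_best)
```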

Looking ahead, as we build the future large-scale generation of our computer, it is really critical to co-design the computer with the application to really take advantage of what the computer is good at and to compensate for its deficiencies. Noise, for example, has always been the main historical challenge with analog computers. Fortunately, machine learning models are relatively amenable to noisy computers. For some models, like diffusion models, noise can actually be your friend rather than your enemy. Used in Bing and Copilot, just to name a few, such diffusion models work as follows: You have your training images, and then over time, you add noise to them until you end up with just complete noise. At inference, you run the reverse denoising process, starting from complete noise, and then you end up generating a clean-looking image, a dog in this instance. Importantly, this reverse process is iterative in nature, it is computationally expensive, and it requires a denoiser: all requirements that perfectly fit our computer. We have implemented a very small version of such a model to generate MNIST digits using our digital twin, and we aim to run it on our computer very soon. As we then increase the size of the model, we can generate more advanced images, such as Fashion-MNIST, CIFAR images, and many more.
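The iterative structure that makes diffusion a good fit can be sketched as a reverse-denoising loop. The `denoiser` here is a trivial placeholder, and the step size and noise schedule are illustrative assumptions rather than a faithful diffusion sampler.

```python
import numpy as np

rng = np.random.default_rng(2)

def reverse_diffusion(denoiser, shape=(28, 28), num_steps=50, step_size=0.1):
    """Schematic reverse process: start from pure noise and iteratively denoise."""
    x = rng.standard_normal(shape)                      # pure noise
    for t in reversed(range(num_steps)):
        predicted_noise = denoiser(x, t)                # the expensive matrix math
        x = x - step_size * predicted_noise             # move toward a clean image
        if t > 0:
            x = x + 0.01 * rng.standard_normal(shape)   # small stochastic kick
    return x

# Trivial stand-in denoiser; a real model would be the trained network
# whose matrix multiplies the computer is meant to accelerate.
sample = reverse_diffusion(lambda x, t: 0.1 * x)
```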

Diffusion, though, is only one example of a broader class of analog-amenable machine learning models that have this feedback nature. Others include deep equilibrium models and neural ODEs, and actually, even some of the latest models, like flow matching and state space models, seem to be amenable to our computer, which is really fantastic news for us.

The same notion of co-design is also key for optimization. Let me give you a real-world example from the healthcare sector. Most likely, you or your loved ones have been inside an MRI scanner, not really a great place to be. Imagine if you could reduce that amount of time from 20–40 minutes to less than five minutes. Think of the implications for the patient’s experience and for treatment modalities. Actually, the math here is 15 years old, something called compressed sensing. The idea is that while your patient is inside the scanner, you under-sample the image—or more precisely, the scan in the Fourier space—and then you solve a hard optimization problem to recover, ideally, the image with full fidelity. Because the problem was computationally hard, it never took off, but we have been able to map this optimization problem to the formulation in our computer. You can see the corresponding results here and how we can iteratively converge to the ground-truth scan using the AOC algorithm. Based on our earlier investigation, we think we could be able to accelerate MRI scans by a factor of 4–8x while achieving reconstruction with high fidelity, potentially reducing the scanning time to only five minutes.
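For intuition, the compressed-sensing idea can be sketched as recovering an image from under-sampled Fourier-space ("k-space") measurements by alternating a data-consistency step with a sparsity-promoting step. The mask, threshold, and iteration count below are illustrative, and this generic ISTA/POCS-style loop is not the mapping onto the AOC formulation described in the talk.

```python
import numpy as np

rng = np.random.default_rng(3)

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def cs_recover(kspace, mask, steps=100, thresh=0.02):
    """Recover an image from under-sampled k-space measurements."""
    x = np.zeros(mask.shape)
    for _ in range(steps):
        # Enforce consistency with the measured k-space samples...
        k = np.fft.fft2(x)
        k[mask] = kspace[mask]
        x = np.real(np.fft.ifft2(k))
        # ...then promote sparsity (naively, in the image domain here).
        x = soft_threshold(x, thresh)
    return x

# Sparse synthetic "image", ~40% k-space sampling.
image = rng.uniform(0, 1, size=(32, 32)) * (rng.uniform(size=(32, 32)) > 0.8)
mask = rng.uniform(size=(32, 32)) < 0.4
measured = np.fft.fft2(image) * mask
recon = cs_recover(measured, mask)
```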

Certainly, this is an extremely high-risk but highly ambitious project, and it’s super exciting that Microsoft Research supports and, in fact, encourages such work. Of course, none of this would be possible without a fantastic interdisciplinary team behind [us] to really rethink the whole computer stack. All these people here, world leaders in their own disciplines, instead of carrying out their research in silos, have chosen to work together and operate at the boundary of their disciplines, which is where I believe key breakthroughs can happen. But this is not enough. We also need to build a community. Towards this, we are calling for people to submit to and participate in our workshop at NeurIPS 2024, where we aim to bring together ML and hardware experts. We are also looking to expand our collaborations to gain more experience in solving industry-specific optimization problems.

Towards this, we have launched an online service to allow partners to map their problems onto our computer and run them. To wrap up, we have been building a new kind of analog optical computer, which has the potential to offer a step change in computer performance using consumer technology. The most important thing I want to leave you with is that co-designing our applications with the underlying computer is the only way for this technology to have a chance in the future of computing. Thank you for listening.