Deep physical neural networks trained with backpropagation (nature.com)

108 points by groar 4 years ago | 36 comments

modeless 4 years ago |

Let me see if I can describe the laser part of the paper correctly. They made a laser pulse consisting of a bunch of different frequencies mixed together. The intensity of each frequency represents a controllable parameter of the system. The pulse was sent through a crystal that performs a complex transformation that mixes all the frequencies together in a nonlinear and noisy way. Then they measured the frequency spectrum of the output. By itself, this system performs computations of a sort, but they are not useful.

To make the computations useful, first they trained a conventional digital neural network to predict the outputs given the input controllable parameters. Then they arbitrarily assigned some of the controllable parameters to be the inputs of the neural network and others were arbitrarily assigned to be the trainable weights. Then they used the crystal to run forward passes on the training data. After each forward pass, they used the trained regular neural network to do the reverse pass and estimate the gradients of the outputs with respect to the weights. With the gradients they update the weights just like a regular neural net.

Although the gradients computed by the neural nets are not a perfect match to the real gradients of the physical system (which are unknown), they don't need to be perfect. Any drift is corrected because the forward pass is always run by the real physical system, and stochastic gradient descent is naturally pretty tolerant of noise and bias.
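The loop described above can be sketched in a few lines of NumPy. This is only a toy under stated assumptions, not the paper's setup: `physical_forward` is a hypothetical stand-in for the hardware (a noisy tanh mixing), and the "surrogate" is simply a slightly wrong copy of that mixing, standing in for a trained neural network. All names and sizes are made up for illustration. The point it tries to show is that the error comes from the real output while the backward pass goes through the imperfect model.

```python
import numpy as np

rng = np.random.default_rng(0)
N_IN, N_W, N_OUT = 4, 8, 2  # inputs, trainable parameters, outputs (arbitrary)

# Hypothetical stand-in for the hardware: a fixed mixing we pretend not to know.
A_TRUE = rng.normal(size=(N_OUT, N_IN + N_W)) / np.sqrt(N_IN + N_W)

def physical_forward(x, theta, noise=0.01):
    """Black-box 'experiment': noisy nonlinear mixing of inputs and parameters."""
    z = A_TRUE @ np.concatenate([x, theta])
    return np.tanh(z) + noise * rng.normal(size=N_OUT)

# Imperfect digital surrogate (assumption: stands in for a trained network).
A_MODEL = A_TRUE + 0.02 * rng.normal(size=A_TRUE.shape)

def surrogate_grad(x, theta, dL_dy):
    """Backpropagate dL/dy through the surrogate to estimate dL/dtheta."""
    z = A_MODEL @ np.concatenate([x, theta])
    dL_dz = dL_dy * (1.0 - np.tanh(z) ** 2)
    return A_MODEL[:, N_IN:].T @ dL_dz

def train(steps=2000, lr=0.1):
    # Targets come from the same system with hidden parameters theta_star,
    # so a zero-loss setting is known to exist.
    theta_star = rng.normal(size=N_W)
    X = rng.normal(size=(64, N_IN))
    Y = np.array([physical_forward(x, theta_star, noise=0.0) for x in X])

    def mean_loss(th):
        return float(np.mean([np.sum((physical_forward(x, th, 0.0) - y) ** 2)
                              for x, y in zip(X, Y)]))

    theta = np.zeros(N_W)
    before = mean_loss(theta)
    for _ in range(steps):
        i = rng.integers(len(X))
        y_phys = physical_forward(X[i], theta)            # forward pass on the "hardware"
        dL_dy = 2.0 * (y_phys - Y[i])                     # error uses the real output
        theta -= lr * surrogate_grad(X[i], theta, dL_dy)  # backward pass via the surrogate
    return before, mean_loss(theta)

if __name__ == "__main__":
    loss_before, loss_after = train()
    print(f"loss before: {loss_before:.4f}, after: {loss_after:.4f}")
```

Because the residual is always measured on the real system, the surrogate's errors only distort the step direction, not the thing being minimized.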

Since they're just using neural nets to estimate the behavior of the physical system rather than modeling it with physics, they can use literally any physical system and the behavior of the system does not have to be known. The only requirement of the system is that it does a complex nonlinear transformation on a bunch of controllable parameters to produce a bunch of outputs. They also demonstrate using vibrations of a metal plate.
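As a rough illustration of fitting a differentiable model from input/output measurements alone, here is a sketch that swaps in a much simpler technique than a trained deep network: a fixed random tanh layer with a ridge-regression readout. The `measure` function is a hypothetical black box, and every name and size is an assumption made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
D_IN, D_OUT, HIDDEN = 5, 2, 100

# Hypothetical black box standing in for the hardware; the fit never looks inside it.
_M = rng.normal(size=(D_OUT, D_IN)) / np.sqrt(D_IN)

def measure(u, noise=0.01):
    """One 'experiment': set controllable parameters u, read back noisy outputs."""
    return np.tanh(_M @ u) + noise * rng.normal(size=D_OUT)

# Fixed random hidden layer; only the linear readout C is fitted.
W = 0.5 * rng.normal(size=(HIDDEN, D_IN))
b = rng.normal(size=HIDDEN)

def features(U):
    """Hidden activations for a batch U of shape (n, D_IN)."""
    return np.tanh(U @ W.T + b)

def fit_readout(U, Y, ridge=1e-3):
    """Ridge regression from hidden activations to measured outputs."""
    H = features(U)
    return np.linalg.solve(H.T @ H + ridge * np.eye(HIDDEN), H.T @ Y)

def predict(C, U):
    return features(U) @ C

def jacobian(C, u):
    """d(prediction)/du at a single point u, shape (D_OUT, D_IN)."""
    h = np.tanh(W @ u + b)
    return C.T @ ((1.0 - h ** 2)[:, None] * W)

# Collect measurements, fit, and compare against a predict-the-mean baseline.
U_train = rng.normal(size=(500, D_IN))
Y_train = np.array([measure(u) for u in U_train])
C = fit_readout(U_train, Y_train)

U_test = rng.normal(size=(200, D_IN))
Y_test = np.array([measure(u) for u in U_test])
mse_model = float(np.mean((predict(C, U_test) - Y_test) ** 2))
mse_mean = float(np.mean((Y_train.mean(axis=0) - Y_test) ** 2))
print(f"surrogate MSE: {mse_model:.5f}, mean-baseline MSE: {mse_mean:.5f}")
```

The `jacobian` function is the part that matters for training: it is what lets gradients flow back to the controllable parameters without any physics model of the system.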

Seems like this method may not lead to huge training speedups since regular neural nets are still involved. But after training, the physical system is all you need to run inference, and that part can be super efficient.

posterboy 4 years ago | |

> They made a laser pulse consisting of a bunch of different frequencies mixed together

This is how ultra short pulses are made when the waves cancel out appropriately. Now I'm not sure if they are training a network to calculate the filter efficiently for even shorter pulses, or if the purpose is supposed to be an optical neural network, or why not both.

deepsun 4 years ago | |

> regular neural net

You used these words several times and, considering the title "physical neural networks", I kept wondering whether you mean regular as in real or as in artificial. If it's artificial, I'm not sure which one of them is "regular" -- LSTM, fully connected, transformers?

modeless 4 years ago | | |

I thought it was pretty clear in context that "regular neural net" was a short form of "conventional digital neural network" which I did spell out explicitly the first time.

Any type of artificial neural net could be used. LSTM, transformer, convolutional, fully connected, whatever you want.

version_five 4 years ago |

This uses a physical system with controllable parameters to compute a forward pass and

> using a differentiable digital model, the gradient of the loss is estimated with respect to the controllable parameters.

So e.g. they have a tunable laser that shifts the spectrum of an encoded input based on a set of parameters, and then they update the parameters based on a gradient computed from a digital simulation of the laser (physics aware model).

When I read the headline I imagined they had implemented back propagation in a physical system

dangom 4 years ago | |

Right,

> Here we introduce a hybrid in situ–in silico algorithm, called physics-aware training, that applies backpropagation to train controllable physical systems. Just as deep learning realizes computations with deep neural networks made from layers of mathematical functions, our approach allows us to train deep physical neural networks made from layers of controllable physical systems, even when the physical layers lack any mathematical isomorphism to conventional artificial neural network layers.

To my naive understanding, and please someone correct me if I'm wrong, the point is that they are not controlling the parameters that compute the NN forward pass directly (hence "no mathematical isomorphism to conventional NNs"), but "hyper-parameters" that guide the physical system to do so. For example, rotation angles of mirrors, or distance between filters, instead of intensity values of light. This leads to the non-linear transformations happening in situ, while simpler transformations in the backprop are still computed in-silico.

visarga 4 years ago | |

> When I read the headline I imagined they had implemented back propagation in a physical system

They touch on that by observing you could train a second physical neural network to compute the gradients for the first. So it could all be physical.

> Improvements to PAT could extend the utility of PNNs. For example, PAT’s backward pass could be replaced by a neural network that directly estimates parameter updates for the physical system. Implementing this ‘teacher’ neural network with a PNN would allow subsequent training to be performed without digital assistance.

So you need to use in silico training at first, but can get rid of it in deployment.

visarga 4 years ago |

If you can train a non-linear physical system with this method, in principle, you could also train real brains. You can't update the parameters of the brain, but you can inject signal. Assuming real brains to be black box functions for which you could learn a noisy estimator of gradients, it could be used for neural implants that supplement lost brain functionality, or a Matrix-like skill loading system.

phreeza 4 years ago | |

You need a differentiable forward model of the process, which is not available for the human brain.

melissalobos 4 years ago |

> Deep-learning models have become pervasive tools in science and engineering. However, their energy requirements now increasingly limit their scalability.[1]

They make this claim first, and cite one source. I haven't heard of this as an issue before. Is there anywhere else I could read more on this?

[1]https://arxiv.org/abs/2104.10350

visarga 4 years ago |

If they can scale it up to GPT-3 like sizes, it would be amazing. Foundation models like GPT-3 will be the operating system of tomorrow. But now they are too expensive to run.

They can be trained once and then frozen and you can develop new skills by learning control codes (prompts), or adding a retrieval subsystem (search engine in the loop).

If you shrink this foundation model to a single chip, something small and energy efficient, then you could have all sorts of smart AI on edge devices.

phreeza 4 years ago |

Physical/analog computers always suffer from noise limiting their usefulness. So I think it would be natural to apply this to a network architecture that includes noise as an integral part, such as GANs or VAEs.

orasis 4 years ago | |

“noise” is integral to all ML systems. You can view this through many lenses, but generalization can be thought of as decoding a noisy signal.

phreeza 4 years ago | | |

This is true, though what I was getting at was methods that make use of a noise source separate from the input.

p1esk 4 years ago |

How is this different from the good old “chip in the loop” training method?

corndoge 4 years ago | |

The paper is interesting

philip142au 4 years ago |

Mystic crystals - The age of Aquarius