How to build a deep neural network with loudspeakers?

[Introduction] This research paper from Cornell University presents a framework for implementing deep network learning and inference with physical systems. The three physical systems exemplified in the paper should not be linear time-invariant systems. Apart from the more complicated SHG system, the other two (transistor and loudspeaker) are simple enough that building one and testing its performance is an attractive idea. For students taking part in the smart car competition, complex high-performance microcontrollers may no longer be needed for neural network inference in the future; a few transistors might do the job.

01 Physical Neural Network

A recent article in the journal Nature, Deep Physical Neural Networks Trained with Backpropagation [1], introduces the use of multilayer nonlinear physical systems to build deep learning networks, trained with backpropagation and stochastic gradient descent. The idea is genuinely surprising, even subversive.

Can you imagine forming a deep physical neural network (PNN) from a few speakers or a few FETs and using it to classify images? Its classification performance is comparable to that of a conventional digital neural network: it reaches more than 97% accuracy on MNIST handwritten digit recognition (see below, based on a four-channel second-harmonic generation (SHG) scheme).

Figure 1 Physical neural networks based on a mechanical system, an electronic circuit, and an optical system, respectively

Neural networks of this kind, built on physical systems rather than digital processors, aim to surpass conventional digital computers in inference speed and energy efficiency, enabling smart sensors and efficient network inference.

I suspect that most people, like me, will have a question on first reading this article: how can ordinary speakers, transistors, and optical components learn, train, and perform inference like a deep learning network? After all, these are common physical systems that contain nothing like a quantum computer or neuromorphic hardware.

The article contains a great deal of work (the original PDF runs to more than 60 pages), and I have not yet read all of it. Its opening, however, explains fairly clearly why a physical neural network can implement artificial neural network algorithms. A traditional deep network can be decomposed into a cascade of layer computations. Each layer combines its input data (Input) with its network parameters (Parameters) and passes the result through the neurons' nonlinear transfer function to form the layer's output.
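
For reference, the conventional digital layer described above can be sketched in a few lines of Python. This is a generic illustration (a dense layer with a tanh nonlinearity and arbitrarily chosen toy parameter values), not code from the paper:

```python
import math

def dense_layer(x, weights, bias, activation=math.tanh):
    """One digital ANN layer: y_j = f(sum_i W[j][i] * x[i] + b[j])."""
    return [
        activation(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]

def forward(x, layers):
    """Cascade of layers: the output of each layer is the input of the next."""
    for weights, bias in layers:
        x = dense_layer(x, weights, bias)
    return x

# Two tiny layers with illustrative (untrained) parameters
layers = [
    ([[0.5, -0.2], [0.1, 0.3]], [0.0, 0.1]),  # 2 inputs -> 2 neurons
    ([[1.0, -1.0]], [0.0]),                    # 2 inputs -> 1 neuron
]
y = forward([1.0, 2.0], layers)
print(y)
```

Training would adjust only the numbers in `layers`; the structure of the computation stays the same, which is the property the physical version below relies on.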

Figure 2 The connection between artificial neural network (ANN) and physical neural network (PNN)

A physical neural network is likewise a cascade of layers; with speakers, for example, each speaker forms one layer. The input signal is the speaker's input voltage, and the network parameters are a set of controllable voltage signals, for example segments whose duration and amplitude can be varied. These are combined with the input signal (by superposition, concatenation in time, etc.) and sent to the speaker. A microphone then records the sound, and the recording forms the layer's output.
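
The same layer idea in physical form can be sketched as below, under loudly stated assumptions: `toy_physics` is a hypothetical stand-in (a memoryless tanh) for the real speaker-and-microphone chain, and `encode_superposition` / `encode_series` are illustrative names for the two ways of combining parameters with the input mentioned above. The point is only that the physical transformation stays fixed, while training would change the parameter samples `theta`.

```python
import math
from typing import Callable, List

Signal = List[float]

def encode_superposition(x: Signal, theta: Signal) -> Signal:
    """Combine by superposition: add parameter samples to input samples one-to-one."""
    return [xi + ti for xi, ti in zip(x, theta)]

def encode_series(x: Signal, theta: Signal) -> Signal:
    """Combine in series: play the parameter samples after the input samples."""
    return list(x) + list(theta)

def toy_physics(drive: Signal) -> Signal:
    """Hypothetical stand-in for speaker + microphone: a memoryless tanh response."""
    return [math.tanh(v) for v in drive]

def physical_layer(x: Signal, theta: Signal,
                   physics: Callable[[Signal], Signal],
                   encode: Callable[[Signal, Signal], Signal]) -> Signal:
    """One PNN layer: the physics is fixed; training would only adjust theta."""
    return physics(encode(x, theta))

x = [0.2, -0.4, 0.6]
theta = [0.5, 0.5, -0.5]
y_super = physical_layer(x, theta, toy_physics, encode_superposition)  # 3 samples
y_series = physical_layer(x, theta, toy_physics, encode_series)        # 6 samples
```

In a real setup `physics` would be an actual measurement (drive the speaker, record the microphone), which is why the paper needs a special training procedure rather than ordinary automatic differentiation through this function.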

Figure 1.3 Structure diagram of a one-layer neural network composed of speakers

In the transistor amplifier circuit and in the optical frequency doubler (SHG), the way the network parameters are represented and combined with the input signal differs, depending on the characteristics of each system.

For example, in the figure below, the network parameters are DC segments of various lengths and amplitudes embedded in the input signal (A). Passing this combined signal through the transistor circuit produces the output (B). Expanding the part where the input and parameter segments are combined and normalizing it gives the network's output signal (C).
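
Steps (A) and (C), building the drive signal and normalizing the result, can be sketched as follows. The segment layout (each parameter given as a hypothetical `(index, length, level)` triple, inserted just before input sample `index`) and the choice of min-max normalization are assumptions for illustration; the transistor circuit itself (step B) is not simulated.

```python
def embed_dc_segments(x, segments):
    """Insert DC parameter segments into the input signal x.

    segments: list of (index, length, level) triples (hypothetical layout);
    a block of `length` samples at `level` is inserted just before x[index],
    and index == len(x) appends the block at the end.
    """
    by_index = {}
    for index, length, level in segments:
        by_index.setdefault(index, []).extend([level] * length)
    drive = []
    for i, v in enumerate(x):
        drive.extend(by_index.get(i, []))  # parameter block(s), if any
        drive.append(v)                    # original input sample
    drive.extend(by_index.get(len(x), []))
    return drive

def normalize(signal):
    """Min-max normalize a signal to [0, 1] (one possible normalization)."""
    lo, hi = min(signal), max(signal)
    if hi == lo:
        return [0.0 for _ in signal]
    return [(s - lo) / (hi - lo) for s in signal]

x = [0.1, 0.4, 0.2]
drive = embed_dc_segments(x, [(1, 2, 0.8), (3, 1, -0.3)])
print(drive)  # [0.1, 0.8, 0.8, 0.4, 0.2, -0.3]
normalized = normalize(drive)
```

In the real setup `normalize` would be applied to the recorded circuit output; here it is applied to the drive signal only to show the arithmetic.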

Figure 3 Input signal concatenated with network-parameter segments (amplitudes set by DC levels) in the transistor circuit, and the corresponding circuit output

Although I still need to understand the details of how the network is trained and how it works, the essence of the deep neural network algorithm shown in the article is refreshing: the nonlinearity between the system's input and output is what fuses the input signal with the parameter signal and carries out the information processing. Therefore, the three systems exemplified in the article (speaker, transistor circuit, and second-harmonic generation optical system) should not be linear time-invariant systems.
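
A one-line calculation shows why the nonlinearity matters. Assume, purely for illustration, a memoryless system whose drive is the input x plus a parameter signal θ. If the system is linear, θ contributes only an input-independent term; with a quadratic response f(v) = a v² (a crude stand-in for the square-law transistor or the SHG process), a cross term appears in which θ multiplies x, much like a weight:

```latex
\begin{aligned}
\text{linear } H:\quad & H\{x+\theta\} = H\{x\} + H\{\theta\} \\
\text{quadratic } f(v)=a v^{2}:\quad & f(x+\theta) = a x^{2} + 2a\,\theta\,x + a\,\theta^{2}
\end{aligned}
```

In the linear case, changing θ only shifts the output by H{θ}, whatever x is. In the quadratic case the coefficient of x is 2aθ, so the parameter controls how strongly the input is passed on.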

Let's set the physical neural network algorithm aside for now and look at the characteristics of the three systems in the paper.

02 Nonlinear systems

The principles and methods taught in undergraduate courses such as “Signals and Systems” and “Automatic Control Theory” are essentially aimed at linear time-invariant systems, so determining whether a system is linear and time-invariant is the first step in applying these theories.

Do the three physical systems (mechanical, electronic, optical) mentioned in the previous Nature paper satisfy linear time invariance?
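
One way to probe this question empirically is to feed a system test signals. The sketch below assumes the system is available as a Python function that maps a list of samples to a list of the same length, and checks superposition and shift invariance on random inputs. `moving_average` and `square_law` are toy examples, not models from the paper, and passing the checks is evidence rather than proof, since only finitely many inputs are tried.

```python
import random

def is_linear(system, n=64, trials=5, tol=1e-9):
    """Superposition check: S(a*x1 + b*x2) == a*S(x1) + b*S(x2) on random inputs."""
    rng = random.Random(0)
    for _ in range(trials):
        x1 = [rng.uniform(-1, 1) for _ in range(n)]
        x2 = [rng.uniform(-1, 1) for _ in range(n)]
        a, b = rng.uniform(-2, 2), rng.uniform(-2, 2)
        lhs = system([a * u + b * v for u, v in zip(x1, x2)])
        rhs = [a * p + b * q for p, q in zip(system(x1), system(x2))]
        if any(abs(l - r) > tol for l, r in zip(lhs, rhs)):
            return False
    return True

def is_time_invariant(system, n=64, shift=3, trials=5, tol=1e-9):
    """Shift check: delaying the input by `shift` samples should only delay the output."""
    rng = random.Random(1)
    for _ in range(trials):
        x = [rng.uniform(-1, 1) for _ in range(n)]
        y = system(x)
        y_delayed_input = system([0.0] * shift + x[:n - shift])
        if any(abs(y_delayed_input[k + shift] - y[k]) > tol for k in range(n - shift)):
            return False
    return True

def moving_average(x, width=4):
    """Causal moving average (zero initial state): linear and time-invariant."""
    return [sum(x[max(0, k - width + 1):k + 1]) / width for k in range(len(x))]

def square_law(x):
    """Memoryless squaring: time-invariant but not linear."""
    return [v * v for v in x]

print(is_linear(moving_average), is_time_invariant(moving_average))
print(is_linear(square_law), is_time_invariant(square_law))
```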

2.1 Transistor circuit

The transistor circuit in the article is the simplest of the three, and its nonlinearity is also the most obvious.

The circuit consists of four kinds of components: resistors, inductors, capacitors, and a field-effect transistor. The resistors, inductors, and capacitors are all linear components; only the field-effect transistor is nonlinear. In saturation, its drain current is approximately proportional to the square of the gate overdrive voltage (gate-source voltage minus threshold voltage). The electronic system is therefore nonlinear.
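
The square law can be checked numerically. This sketch uses the textbook long-channel model in saturation, I_D = k (V_GS - V_th)², with cutoff below threshold and channel-length modulation ignored; `k` and `v_th` are illustrative values, not parameters of the circuit in the paper. Superposition fails: the response to the sum of two gate voltages is not the sum of the individual responses.

```python
def drain_current(v_gs, k=2e-3, v_th=1.0):
    """Idealized square-law MOSFET: I_D = k * (V_GS - V_th)^2 above threshold, else 0.

    k (A/V^2) and v_th (V) are illustrative values only.
    """
    overdrive = v_gs - v_th
    return k * overdrive * overdrive if overdrive > 0 else 0.0

v1, v2 = 1.5, 2.0
i_sum_of_inputs = drain_current(v1 + v2)                   # response to the combined input
i_sum_of_outputs = drain_current(v1) + drain_current(v2)   # sum of separate responses
print(i_sum_of_inputs, i_sum_of_outputs)  # about 0.0125 A vs 0.0025 A: not equal
```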

Figure 2.1.1 Transistor circuit

2.2 Second Harmonic Generation System (SHG)

The second-harmonic generation system is an optical system, and the most complex of the systems exemplified in the article.

I am not very familiar with SHG (second-harmonic generation) optical systems, but its basic principle can be understood from the corresponding literature [2]: it uses the nonlinear optical response of certain materials to double the frequency of the input light, generating the corresponding second-harmonic signal.

Figure 2.2.1 Second harmonic generation system

For an unfamiliar physical system like this, how can we judge whether it is linear time-invariant?

Here we can use a property of linear time-invariant systems: they do not generate new frequency components.

An LTI system can change the amplitude and phase of each frequency component in its input, but it creates no new ones. The SHG optical system, by contrast, produces new components at twice the input frequencies (and, when several frequencies are present, at their sums), so it is not a linear time-invariant system.
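
The frequency argument can be demonstrated with a discrete Fourier transform. As a crude assumption, SHG is modeled here as a memoryless square of the signal (a second-order nonlinearity, ignoring phase matching and every other optical detail), and the "frequencies" are simply DFT bins of a 64-sample signal. A circular two-tap averaging filter, which is LTI, keeps the input's two frequencies; squaring creates new components at DC, at the doubled frequencies, and at the sum and difference frequencies.

```python
import cmath
import math

N = 64

def dft_magnitudes(x):
    """Naive O(N^2) DFT; returns |X[k]| for k = 0..len(x)-1."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n)]

def occupied_bins(x, rel_threshold=1e-6):
    """Bins from 0 to Nyquist whose magnitude is clearly above numerical noise."""
    mags = dft_magnitudes(x)
    return [k for k in range(len(x) // 2 + 1) if mags[k] > rel_threshold * len(x)]

def circular_average(x):
    """LTI example: y[t] = (x[t] + x[t-1]) / 2 with circular indexing (x[-1] is the last sample)."""
    return [(x[t] + x[t - 1]) / 2 for t in range(len(x))]

def square(x):
    """Crude memoryless stand-in for a second-order (SHG-like) nonlinearity."""
    return [v * v for v in x]

# Two tones at DFT bins 5 and 8
x = [math.sin(2 * math.pi * 5 * t / N) + math.sin(2 * math.pi * 8 * t / N) for t in range(N)]

print(occupied_bins(x))                    # [5, 8]
print(occupied_bins(circular_average(x)))  # [5, 8]: same frequencies, new amplitudes/phases
print(occupied_bins(square(x)))            # [0, 3, 10, 13, 16]
```

Bins 10 and 16 are the doubled frequencies, 13 is the sum, 3 the difference, and 0 the DC term.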

Frequency doubling is therefore the key to using this system as a physical neural network; a linear time-invariant optical system could not be used to construct one.

2.3 Speakers

Among the three systems cited in the article, the loudspeaker-based mechanical vibration system is the most puzzling. It consists of a speaker, an audio amplifier, and a microphone, and the speaker itself has been modified.

The authors removed the diaphragm and dust cover of a moving-coil speaker to expose the voice coil, glued a metal screw onto it, and then mounted a square tantalum plate, 3.2 cm × 3.2 cm and 1 mm thick. Reading this, you may feel that this fancy modification is a needlessly roundabout way of doing things, and its purpose is a mystery.

Figure 2.3.1 Mechanical oscillation system made from a loudspeaker

I had assumed they wanted to introduce a nonlinear element into the speaker's mechanical system, but the metal screw and tantalum plate added to the voice coil seem only to increase the coil's inertial mass. The added mass suppresses high-frequency motion, so the assembly acts as a low-pass filter, and the system would therefore still be linear time-invariant.
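
To make the low-pass argument concrete, one can model the loaded voice coil as a linear mass-spring-damper driven by the coil force, whose displacement response is |H(ω)| = 1/√((k - mω²)² + (cω)²). This model and the numbers below (stiffness `k`, damping `c`, a light and a heavy moving mass) are assumptions chosen for illustration, not measurements from the paper. Adding mass lowers the resonance √(k/m) and attenuates high frequencies much more strongly, while the system remains linear.

```python
import math

def displacement_gain(omega, m, k=1000.0, c=1.0):
    """|X/F| for m*x'' + c*x' + k*x = F at angular frequency omega (rad/s)."""
    return 1.0 / math.hypot(k - m * omega ** 2, c * omega)

def resonance(m, k=1000.0):
    """Undamped natural frequency sqrt(k/m) in rad/s."""
    return math.sqrt(k / m)

light, heavy = 0.001, 0.02  # kg: bare coil vs coil + screw + plate (illustrative values)

print(resonance(light), resonance(heavy))
for omega in (10.0, 2000.0):
    # Low frequency: nearly equal gains; high frequency: the heavy version is far weaker
    print(omega, displacement_gain(omega, light), displacement_gain(omega, heavy))
```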

The supplementary material of the paper shows the speaker's input voltage signal, the signal recorded by the microphone, and the downsampled digital signal (below). The microphone recording is indeed a low-pass-smoothed version of the input signal.

Figure 2.3.2 Speaker input signal, microphone recording signal and downsampled digital signal

The figure below, from the article, shows the audio recorded by the microphone when a random input signal with an embedded DC segment of controllable amplitude (equivalent to a network parameter) is applied to the speaker. The last panel shows that, at different times, the output is linearly related to the input.
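
A linear dependence like this is what one expects from an LTI system: the output is the response to the noise plus the DC amplitude times the response to a unit DC segment. The sketch below checks this with a first-order low-pass filter standing in for the speaker (an assumption, as are the segment position, segment length, and probe time) and contrasts it with the same filter preceded by a squaring nonlinearity, whose output depends quadratically on the amplitude.

```python
import random

def lowpass(x, alpha=0.2):
    """First-order IIR low-pass, y[n] = y[n-1] + alpha*(x[n] - y[n-1]): linear and time-invariant."""
    y, out = 0.0, []
    for v in x:
        y += alpha * (v - y)
        out.append(y)
    return out

def drive_with_dc(noise, start, length, amplitude):
    """Replace samples [start, start+length) of the noise with a DC level (the 'parameter')."""
    x = list(noise)
    x[start:start + length] = [amplitude] * length
    return x

def max_deviation_from_line(xs, ys):
    """Least-squares line fit of ys against xs; return the largest absolute residual."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return max(abs(y - (slope * x + intercept)) for x, y in zip(xs, ys))

rng = random.Random(42)
noise = [rng.uniform(-1, 1) for _ in range(200)]
amplitudes = [-1.0, -0.5, 0.0, 0.5, 1.0]
probe = 120  # sample at which the output is read, after the DC segment at 100..109

linear_out = [lowpass(drive_with_dc(noise, 100, 10, a))[probe] for a in amplitudes]
squared_out = [lowpass([v * v for v in drive_with_dc(noise, 100, 10, a)])[probe]
               for a in amplitudes]

print(max_deviation_from_line(amplitudes, linear_out))   # essentially zero: straight line
print(max_deviation_from_line(amplitudes, squared_out))  # clearly nonzero: curved
```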

Figure 2.3.3 Output signal for a random-noise input with an embedded DC segment of controllable amplitude

So the question arises: where is the nonlinear element in this system?

One possibility I can think of is that downsampling the microphone signal changes the system's linear time-invariant character, somewhat like the role of a pooling layer in a convolutional neural network.
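
Downsampling does break time invariance, as a short experiment shows: keeping every fourth sample of an impulse delayed by one sample loses the impulse entirely, while a delay of four samples merely delays the output. The operation is still linear, though (it only selects samples), so whether it alone can supply the nonlinearity the method relies on remains open; for comparison, max pooling in a CNN is nonlinear, whereas average pooling is linear. The function name `downsample` and the factor of 4 are illustrative.

```python
def downsample(x, factor=4):
    """Decimation: keep every `factor`-th sample (no anti-aliasing filter)."""
    return x[::factor]

impulse = [1.0] + [0.0] * 15
delayed_by_1 = [0.0] + impulse[:-1]
delayed_by_4 = [0.0] * 4 + impulse[:-4]

print(downsample(impulse))       # [1.0, 0.0, 0.0, 0.0]
print(downsample(delayed_by_1))  # [0.0, 0.0, 0.0, 0.0]: impulse lost, not delayed
print(downsample(delayed_by_4))  # [0.0, 1.0, 0.0, 0.0]: output delayed by one sample

# Linearity still holds: downsampling a weighted sum equals the weighted sum of downsampled signals
u = [float(i) for i in range(16)]
v = [float(i % 3) for i in range(16)]
combined = downsample([2.0 * a + 3.0 * b for a, b in zip(u, v)])
separate = [2.0 * a + 3.0 * b for a, b in zip(downsample(u), downsample(v))]
print(combined == separate)  # True
```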

※ Paper summary ※

This research paper from Cornell University presents a framework for using physical systems to perform deep network learning and inference. The three systems exemplified in the article should not be linear time-invariant systems. Apart from the more complicated SHG system, the other two (transistor circuit and loudspeaker) are simple enough that it is tempting to build one and test its performance.

For students taking part in the smart car competition, this suggests that in the future it may not be necessary to rely on complex, high-performance microcontrollers for neural network inference; a few transistors might be enough.

References

[1] Deep physical neural networks trained with backpropagation: https://www.nature.com/articles/s41586-021-04223-6.pdf

[2] Second-harmonic generation (ScienceDirect Topics): https://www.sciencedirect.com/topics/chemistry/second-harmonic-generation
