Photonics Research, 2021, 9 (4): 0400B135, Published Online: Apr. 6, 2021  

Interfacing photonics with artificial intelligence: an innovative design strategy for photonic structures and devices based on artificial neural networks

Author Affiliations
1 Department of Mechanical and Industrial Engineering, Northeastern University, Boston, Massachusetts 02115, USA
2 Department of Electrical and Computer Engineering, Northeastern University, Boston, Massachusetts 02115, USA
3 Khoury College of Computer Science, Northeastern University, Boston, Massachusetts 02115, USA
Abstract

Over the past decades, photonics has transformed many areas in both fundamental research and practical applications. In particular, we can manipulate light in a desired and prescribed manner by rationally designed subwavelength structures. However, constructing complex photonic structures and devices is still a time-consuming process, even for experienced researchers. As a subset of artificial intelligence, artificial neural networks serve as one potential solution to bypass the complicated design process, enabling us to directly predict the optical responses of photonic structures or perform the inverse design with high efficiency and accuracy. In this review, we will introduce several commonly used neural networks and highlight their applications in the design process of various optical structures and devices, particularly those in recent experimental works. We will also comment on the future directions to inspire researchers from different disciplines to collectively advance this emerging research field.

1. INTRODUCTION

Novel optical devices consisting of elaborately designed structures have become an extremely dynamic and fruitful research area because of their capability of manipulating light flow down to the nanoscale. Thanks to advanced numerical simulation, fabrication, and characterization techniques, researchers are able to design, fabricate, and demonstrate dielectric and metallic micro- and nano-structures with sophisticated geometries and arrangements. For instance, metamaterials and metasurfaces comprising subwavelength structures, called meta-atoms, can show extraordinary properties beyond those of natural materials [1]. Many metadevices have been reported that offer enormous opportunities for technology breakthroughs in a wide range of applications including light steering [2–5], holography [6–9], imaging [10–14], sensing [15–17], and polarization control [18–21].

At present, we can handle most photonic design problems by accurately solving Maxwell’s equations with numerical algorithms such as the finite element method (FEM) and the finite-difference time-domain (FDTD) method. However, those methods often require substantial time and computational resources, especially for the inverse design problem, which aims to retrieve the optimal structure from target optical responses and functionalities. In the conventional procedure, we normally start with full-wave simulations of an initial design based on empirical knowledge and then adjust the geometric/material parameters iteratively to approach the customer-specific requirements. Such a trial-and-error process is time consuming, even for the most experienced researchers. The initial design strongly relies on our experience and cognition, and usually some basic structures are chosen, including split-ring resonators [22,23], helix [24], cross [25], bowtie [26], L-shape [2], and H-shape [27,28] structures. Although it is known that a specific type of structure can produce a certain optical response (e.g., strong magnetic resonance from split-ring resonators and chiroptical response from helical structures), this well-established knowledge may sometimes limit our aspiration to seek entirely new designs for the same applications, or for even more complicated ones where the traditional approach is not applicable.

Artificial neural networks (ANNs) provide a new and powerful approach for photonic designs [29–37]. ANNs can build an implicit relationship between the input (i.e., geometric/material parameters) and the output (i.e., optical responses), mimicking the nonlinear nerve conduction process in the human body. With the help of well-trained ANNs, we can bypass the complicated and time-consuming design process that heavily relies on numerical simulations and optimization. The functions of most ANN models for photonic designs are twofold: forward prediction and inverse design. The forward prediction network determines the optical responses from the geometric/material parameters and can serve as a substitute for full-wave simulations. The inverse design network aims to efficiently retrieve the optimal structure from given optical responses, which is usually more important and more challenging in the design process. One main advantage of ANN models is speed. For example, producing the spectrum of a meta-atom from a well-trained forward prediction model takes only a few milliseconds, orders of magnitude faster than typical full-wave simulations based on FEM or FDTD [38–40]. Meanwhile, the accuracy of ANN models is comparable with rigorous simulations. For instance, the mean squared loss of spectrum prediction is typically on the order of $10^{-3}$ to $10^{-5}$ [40,41]. Moreover, ANNs can unlock the nonintuitive and nonunique relationship between the physical structure and the optical response, and hence potentially enlighten researchers with an entirely new class of structures.

Solving the photonic design problem with ANNs is a data-driven approach, which means a large training set containing both geometric/material parameters and optical responses is needed. Once the ANN model works well on the training data set, it can be tested on a test set or a real problem. The test and training data sets should belong to the same design framework but contain completely different data. The general workflow for a forward prediction network includes four steps. First, a large number of input structures and output optical responses are generated from either simulations or experiments. In most of the published works, the amount of data is on the order of $10^4$. It is noted that the performance of the neural network depends on both the size and the quality of the data. To improve the quality of the training data, some researchers have applied rule-based optimization methods in the generation of the initial training data [42] or have progressively expanded the training data with new data produced by the trained model [43]. Then we design the ANNs with a certain network structure, such as fully connected layers (FCLs)-based neural networks or convolutional neural networks (CNNs). Next, the training data set is fed into the network, and we optimize the weights and biases of each node. Finally, the well-trained ANN can be used to predict the responses of other input structures that lie outside the training and test data sets. As for the inverse design problem, one can simply swap the input and output and use a similar network structure; however, some problems require more sophisticated methods and algorithms, as discussed below.
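To make the workflow concrete, the following minimal PyTorch sketch walks through the four steps for a forward prediction network. The simulate function is only a stand-in for a full-wave solver, and the parameter dimension, network sizes, and training settings are illustrative assumptions rather than values from any specific published work.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
_W = torch.rand(4, 50)  # fixed random projection used only by the placeholder "solver"

def simulate(params: torch.Tensor) -> torch.Tensor:
    """Placeholder for a full-wave solver (FEM/FDTD): maps 4 geometric
    parameters to a 'spectrum' sampled at 50 wavelength points."""
    w = torch.linspace(0, 1, 50)
    return torch.sin(params @ _W + w)

# Step 1: generate training data (structures -> optical responses).
x_train = torch.rand(10_000, 4)
y_train = simulate(x_train)

# Step 2: define an FCLs-based forward-prediction network.
model = nn.Sequential(
    nn.Linear(4, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 50),
)

# Step 3: feed the training set into the network and optimize weights and biases.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
for epoch in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(x_train), y_train)
    loss.backward()
    optimizer.step()

# Step 4: predict the responses of structures outside the training/test sets.
with torch.no_grad():
    y_pred = model(torch.rand(5, 4))
```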

This review is devoted to the topic of designing photonic structures and devices with ANNs. After introducing the widely used ANNs, we will focus on very recent works on this topic, especially experimental demonstrations. The remainder of the review is organized as follows. In Section 2, we will discuss the basic FCLs and their application in the prediction of design parameters. Then, in Section 3, we will focus on the CNNs used to retrieve much more complicated structures described by pixelated images. In Section 4, we will discuss other useful and efficient hybrid algorithms that combine deep learning with conventional optimization methods for photonic design. In the last section, we will conclude the review by discussing the achievements, current challenges, and future outlook.

2. PHOTONIC DESIGN BY FULLY CONNECTED NEURAL NETWORK

2.1 Introduction of FCLs


A fully connected neural network consists of an input layer, one or more hidden layers, and an output layer, in which every neuron in one layer is connected to all neurons in the next layer. The training process of such a network is quite straightforward. The training set contains an input vector $X$ and an output vector $Y$ ($Y$ can be a vector of complex/real values for regression problems or a vector of discrete integers serving as labels for classification problems). The performance of the model is highly dependent on the quantity and quality of the training data set. During the training process, the network first takes the vector $X$ as input and calculates the output $\hat{Y}$ through tensor operations and activations from left to right. Then a loss function (or cost function), which quantifies the performance of the neural network, is defined and minimized. For instance, we can use the mean squared error (MSE), $\mathrm{loss}(Y,\hat{Y})=\overline{(Y-\hat{Y})^{2}}$, for regression problems and the cross-entropy loss, $\mathrm{loss}(Y,\hat{Y})=-Y^{T}\cdot\log(\hat{Y})$, for classification problems. The next step, the backpropagation of error, is the most critical part of ANNs. In the ANN, there is a series of learnable parameters to be optimized, i.e., the weight and bias of each layer. We can then derive the partial derivatives of the loss with respect to each parameter, $\partial\,\mathrm{loss}(Y,\hat{Y})/\partial\,\mathrm{weight}$ and $\partial\,\mathrm{loss}(Y,\hat{Y})/\partial\,\mathrm{bias}$. To calculate those values, we apply the chain rule layer by layer from the end of the ANN to the front, which is why the process is called "backpropagation." Finally, all the parameters are updated by the stochastic gradient descent method:
$$\mathrm{weight}=\mathrm{weight}-lr\cdot\frac{\partial\,\mathrm{loss}(Y,\hat{Y})}{\partial\,\mathrm{weight}},\qquad \mathrm{bias}=\mathrm{bias}-lr\cdot\frac{\partial\,\mathrm{loss}(Y,\hat{Y})}{\partial\,\mathrm{bias}}.$$
Here the learning rate $lr$ is a hyperparameter that is set by the user and is not learnable. The training process is iterated until the loss is minimized. Different learning rates lead to different behaviors: a learning rate that is too large may prevent the model from converging, while one that is too small will increase the training time. Therefore, a general approach is to assign a large learning rate at the beginning of the training and, after the model has been trained for several epochs, tune the learning rate to a smaller value.
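The update rule above can be illustrated with a short, self-contained NumPy example: a single-hidden-layer network trained on toy data with the MSE loss and plain gradient descent, where the gradients are obtained by applying the chain rule from the output layer backward. All data and layer sizes are arbitrary placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 3))                 # toy inputs (e.g., geometric parameters)
Y = np.sin(X @ rng.normal(size=(3, 2)))       # toy targets (e.g., two spectral values)

W1, b1 = rng.normal(scale=0.1, size=(3, 16)), np.zeros(16)
W2, b2 = rng.normal(scale=0.1, size=(16, 2)), np.zeros(2)
lr = 0.05

for step in range(2000):
    # forward pass
    h = np.tanh(X @ W1 + b1)                  # hidden activations
    Y_hat = h @ W2 + b2                       # predictions
    loss = np.mean((Y - Y_hat) ** 2)

    # backward pass (chain rule, from the output layer back to the input layer)
    dY = 2 * (Y_hat - Y) / Y.size             # d(loss)/d(Y_hat)
    dW2, db2 = h.T @ dY, dY.sum(axis=0)
    dh = (dY @ W2.T) * (1 - h ** 2)           # through the tanh activation
    dW1, db1 = X.T @ dh, dh.sum(axis=0)

    # gradient-descent update: parameter <- parameter - lr * gradient
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1
```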

2.2 Design of Parameterized Structures by FCLs-Based ANNs

FCLs have been extensively adopted to design optical devices, especially in the field of metasurface and nanostructure design. In early 2018, D. Liu et al. introduced, for the first time, a tandem network architecture for the inverse design problem [46]. There is one fundamental challenge in training ANNs for inverse design, arising from the fact that very similar optical responses may be achieved by different structures. Such a nonunique, one-to-many mapping makes the neural network hard to converge if conflicting instances with almost the same optical responses but different geometric labels exist in the training data set. Mathematically, the gradient of the function to be approximated by the ANN is extremely large at such data points. To tackle this challenge, the authors proposed a network structure consisting of a pretrained forward model and inverse-design FCLs, which is illustrated in the top panel of Fig. 2(a). This architecture avoids a direct comparison between the retrieved and ground-truth geometric parameters; instead, it compares the spectrum predicted for the retrieved structure with the target spectrum. Therefore, the prediction of the network converges to a single structure that satisfies the required spectra, solving the one-to-many problem in the inverse design. The authors used the tandem neural network to design dielectric multilayers composed of SiO2 and Si3N4. The results are plotted in the bottom panel of Fig. 2(a), in which the transmission spectra of the retrieved structures (green dashed lines) match the desired Gaussian-shaped spectra (blue solid lines) well.
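The tandem idea can be sketched as follows in PyTorch: a pretrained forward network is frozen, an inverse network proposes a design from a target spectrum, and the training loss compares the spectrum predicted for that design with the target rather than comparing geometries. The dimensions and the randomly generated spectra are placeholders and do not reproduce the setup of Ref. [46].

```python
import torch
import torch.nn as nn

N_PARAM, N_SPEC = 16, 100   # e.g., 16 layer thicknesses, 100 spectral points

forward_net = nn.Sequential(nn.Linear(N_PARAM, 256), nn.ReLU(), nn.Linear(256, N_SPEC))
# ... assume forward_net has already been trained on (design, spectrum) pairs ...
for p in forward_net.parameters():
    p.requires_grad_(False)                       # freeze the forward model

inverse_net = nn.Sequential(nn.Linear(N_SPEC, 256), nn.ReLU(), nn.Linear(256, N_PARAM))
optimizer = torch.optim.Adam(inverse_net.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

target_spectra = torch.rand(512, N_SPEC)          # placeholder training spectra
for epoch in range(200):
    optimizer.zero_grad()
    designs = inverse_net(target_spectra)         # retrieved designs
    predicted = forward_net(designs)              # spectra of the retrieved designs
    loss = loss_fn(predicted, target_spectra)     # compare spectra, not geometries
    loss.backward()
    optimizer.step()
```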

Fig. 2. (a) Top: Schematic of the tandem neural network and the SiO2 and Si3N4 multilayers. Bottom: Two examples of target spectra (blue solid lines) and simulated spectra of retrieved structures (green dashed lines). The target spectra have a Gaussian shape. (b) Left: Predicted (open circles) extinction cross sections of the electric dipole (red) and magnetic dipole (black) of core-shell nanoparticles. The solid lines are target responses. Right: Simulated extinction spectra and the corresponding electric field distributions of core-shell nanoparticles. (c) Top: Simulation result and inverse design prediction of the scattering cross section of core-shell nanoparticles. Bottom: Runtime comparison between the conventional method and the neural network. (d) Top: A multilayer structure composed of Si3N4 and graphene. Bottom: Optical response of the designed nanostructures (with either low or near-unity absorbance in graphene) under the excitation of s-polarized light. (a) is reproduced from Ref. [46] with permission; (b) is reproduced from Ref. [47] with permission; (c) is reproduced from Ref. [38] with permission; (d) is reproduced from Ref. [48] with permission.


Subsequent works have further confirmed the good performance of the tandem network architecture. For instance, S. So et al. used a similar ANN structure to design core-shell structures (with three layers) that support strong electric and magnetic dipole resonances [47]. The ANN was built to learn the correlation between the extinction spectra and the core-shell nanoparticle designs, including the material information and shell thicknesses. In Fig. 2(b), the predicted (open circles) extinction cross sections of the electric dipole (red) and magnetic dipole (black) of core-shell nanoparticles are compared with the target responses (solid lines). It is clear that both the electric dipole and magnetic dipole spectra of the designed core-shell nanoparticles fit the expectations well. J. Peurifoy et al. also studied the inverse design of multilayered particles (up to eight layers) with ANNs, with a focus on the scattering spectra [38]. FCLs were used both for the forward prediction of scattering cross-section spectra and for the inverse design from the spectra. Using a model trained with 50,000 training examples, they achieved a mean relative error of around 1%. One example is shown in the top panel of Fig. 2(c), in which the result from the neural network is compared with numerical nonlinear optimization as well as the desired spectra. The comparison demonstrates that the neural network model performs better in this design problem. Moreover, the running time of the ANN-aided inverse design is more than 100 times shorter than that of full-wave simulation, as demonstrated in the bottom panel of Fig. 2(c). This result clearly shows the advantage of ANNs in terms of efficiency.

Besides the tandem network, other approaches have been introduced to improve the performance of FCLs-based neural networks. In 2019, Y. Chen et al. employed an adaptive batch-normalized (BN) neural network, targeting the smart and quick design of graphene-based metamaterials, as illustrated in the top panel of Fig. 2(d) [48]. Specifically, a layer using an adaptive BN algorithm is placed before each hidden layer to overcome the limitation of BN in small sampling spaces. The adaptive BN layer takes the activation $h_i$ of each neuron in a minibatch $\mathcal{B}$, the batch normalization parameters $\gamma$ and $\delta$, and the adaptive parameters $\alpha$ and $\beta$ as inputs, and outputs a new activation $\hat{h}_i$ for each neuron. The authors tested their method by deriving the thickness of each Si3N4 layer in the structures, achieving a prediction accuracy of over 95%. The bottom panel of Fig. 2(d) plots the optical responses of two different examples with varied absorbance in graphene, showing excellent agreement between the target and designed responses.

In parallel, T. Qiu et al. proposed a new method, named REACTIVE, to conduct the inverse design based on reflection spectra [39]. The authors applied this method to inversely design a metasurface whose unit cell can be described by an 8×8 matrix, as shown in the left panel of Fig. 3(a). The input data sets are preprocessed by Gaussian smoothing and then transformed by a discrete cosine transform, which can be written as
$$F(u)=c(u)\sum_{i=0}^{N-1} f(i)\cos\!\left[\frac{(2i+1)\pi}{2N}u\right],\qquad c(u)=\begin{cases}\sqrt{1/N}, & u=0,\\ \sqrt{2/N}, & u\neq 0.\end{cases}$$
In this model, the S-parameters of the desired structure are the required output. Once the S-parameters are generated by the trained deep learning network, the matrix describing the designed metasurface is automatically generated by REACTIVE. The right panel of Fig. 3(a) shows the results from REACTIVE, including the S-parameter S11 (i.e., the reflection coefficient) and the absorptivity, which match the design targets perfectly.
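For reference, the transform written above (the standard orthonormal DCT-II, assuming the square-root normalization) can be implemented directly in a few lines of NumPy and cross-checked against scipy.fft.dct with type=2 and norm="ortho".

```python
import numpy as np

def dct(f: np.ndarray) -> np.ndarray:
    """Orthonormal DCT-II of a 1D signal, following the formula above."""
    N = len(f)
    i = np.arange(N)
    F = np.empty(N)
    for u in range(N):
        c = np.sqrt(1.0 / N) if u == 0 else np.sqrt(2.0 / N)
        F[u] = c * np.sum(f * np.cos((2 * i + 1) * np.pi * u / (2 * N)))
    return F

signal = np.sin(np.linspace(0, np.pi, 64))   # e.g., a smoothed reflection spectrum
coeffs = dct(signal)
```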

Fig. 3. (a) Left: Schematic illustration of the metasurface, the unit cell, and matrix encoding method. Right: Predicted S-parameter and absorptivity with the REACTIVE method. (b) Illustration of the neural network architecture consisting of BaseNet and TransferNet. (c) The trend of spectrum error when n layers are transferred to the TransferNet and the predicted transmission spectra for two examples. (a) is reproduced from Ref. [39] with permission; (b) and (c) are reproduced from Ref. [49] with permission.


Due to the data-driven nature of deep learning, the performance of a well-trained ANN relies heavily on the training set, and the prediction loss is likely to increase as the inputs deviate from the training set. Therefore, a challenge in deep-learning-aided inverse design lies in extending the capability of ANNs to an altered data set that is very different from the training data. Usually, one needs to generate an entirely new training set for similar but different physical scenarios. In this context, reducing the demand for computational data is an efficient way to accelerate the training of deep learning models. Y. Qu et al. proposed a transfer learning method, schematically illustrated in Fig. 3(b), to migrate knowledge between different physical scenarios [49]. The prediction accuracy is significantly improved, even with a much smaller data set for new tasks. Two sets of ANNs are involved in this work. The first one, named BaseNet, is trained with the initial data. The second one, called TransferNet, copies the first n layers from the BaseNet, and the entire network is then fine-tuned. The authors first transferred the spectrum prediction task from a 10-layer film to an 8-layer film, where the source and target tasks were trained with 50,000 and 5000 examples, respectively. Compared with direct learning, the transferred model performs well, and the error drops as n increases, as shown in Fig. 3(c). The TransferNet is applicable to different structures, ranging from multilayer nanoparticles to multilayer films. Based on this model, a multitask learning scheme that combines the learning of multiple tasks at the same time was also studied. It was shown that the neural network in conjunction with the transfer learning method can produce more accurate predictions.
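The transfer-learning step can be sketched as follows: the TransferNet copies the weights of a trained BaseNet, everything after the first n blocks is re-initialized, and the whole network is then fine-tuned on the smaller target data set. The architecture, the matching input/output dimensions, and the value of n are illustrative assumptions, not those of Ref. [49].

```python
import torch
import torch.nn as nn

def make_net(in_dim, out_dim, hidden=256, depth=4):
    """Simple FCLs-based network: depth (Linear + ReLU) blocks plus an output layer."""
    layers, d = [], in_dim
    for _ in range(depth):
        layers += [nn.Linear(d, hidden), nn.ReLU()]
        d = hidden
    layers += [nn.Linear(d, out_dim)]
    return nn.Sequential(*layers)

base_net = make_net(in_dim=10, out_dim=200)       # source task, e.g., film thicknesses -> spectrum
# ... assume base_net has been trained on the (large) source data set ...

transfer_net = make_net(in_dim=10, out_dim=200)   # related target task with the same dimensions
n = 2                                             # number of (Linear + ReLU) blocks to transfer
transfer_net.load_state_dict(base_net.state_dict())  # copy all weights first
# Re-initialize everything after the first n blocks so that only the early
# layers carry knowledge from the source task.
for layer in list(transfer_net)[2 * n:]:
    if isinstance(layer, nn.Linear):
        layer.reset_parameters()

optimizer = torch.optim.Adam(transfer_net.parameters(), lr=1e-4)  # fine-tune all layers
```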

FCLs have also been utilized in reinforcement learning [50–53], another hot area of machine learning, for the inverse design problem. Reinforcement learning has already achieved great performance in robotics, system control, and game playing (e.g., AlphaGo). Instead of predicting the optimized geometry directly, the ANNs in reinforcement learning act as an iterative optimizer: in each step, an action that modifies the geometric parameters is predicted, for instance, increasing or decreasing several parameters by a certain value. The advantage of this approach is that it can adapt to specific problems, and it can provide guidance for conventional trial-and-error optimization methods.
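As a toy illustration of this action-based viewpoint (not the specific algorithms of Refs. [50–53]), the following sketch uses tabular Q-learning on a single discretized design parameter: the allowed actions are to decrease, keep, or increase the parameter, and the reward comes from a placeholder figure of merit standing in for a simulator.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, actions = 50, (-1, 0, +1)           # parameter grid and allowed moves
target = 37                                   # index of the (unknown) optimal design

def reward(state: int) -> float:
    """Placeholder for a simulator-based figure of merit."""
    return -abs(state - target) / n_states

Q = np.zeros((n_states, len(actions)))
alpha, gamma, eps = 0.5, 0.9, 0.2
for episode in range(200):
    s = int(rng.integers(n_states))
    for step in range(100):
        a = int(rng.integers(len(actions))) if rng.random() < eps else int(np.argmax(Q[s]))
        s_next = int(np.clip(s + actions[a], 0, n_states - 1))
        r = reward(s_next)
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])  # Q-learning update
        s = s_next

# Greedy rollout with the learned Q-table: starting from an arbitrary design,
# repeatedly apply the best action.
s = 0
for _ in range(n_states):
    s = int(np.clip(s + actions[int(np.argmax(Q[s]))], 0, n_states - 1))
print("design found by the learned policy:", s)
```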

Fig. 4. (a) Left: Architecture of the proposed neural network for nonlinear layers. Right: Predicted, simulated, and measured transmission spectra of two gold nanostructures under different polarization conditions. (b) Left: Illustrations of MANN used for reconstruction of 3D vectorial field. Right: Experimental approach and characterizations of 3D vectorial holography based on a vectorial hologram. (c) Left: Schematic of a deep-learning-enabled self-adaptive metasurface cloak. Right: Demonstration of the self-adaptive cloak response subject to random backgrounds and incidence with varied angles and frequencies. (a) is reproduced from Ref. [54] with permission; (b) is reproduced from Ref. [63] with permission; (c) is reproduced from Ref. [64] with permission.


In addition to spectrum prediction [55,56], FCLs-based ANNs have also been used in the inverse design to realize other functionalities and benefit real-world applications [57–62]. Holographic images, for example, can be optimized by ANNs to achieve a wide viewing angle and a three-dimensional vectorial field, as recently demonstrated by H. Ren et al. [63]. They used a network named the multilayer perceptron ANN (MANN), which was composed of an input layer fed with an arbitrary three-dimensional (3D) vectorial field, four hidden layers, and an output layer for the synthesis of a two-dimensional (2D) vector field. There are 1000 neurons in each hidden layer. The scheme of this ANN is shown in the top left panel of Fig. 4(b). The authors showed that an arbitrary 3D vectorial field can be achieved with a 2D vector field predicted by the well-trained model. A 2D Dirac comb function was then applied to sample the desired image. Subsequently, a digital hologram, calculated from the desired image, was combined with the 2D vector field. This process is visualized in the right panel of Fig. 4(b). With a split-screen spatial light modulator that independently controls the amplitude and phase of orthogonal circularly polarized light, any desired 2D vector beam can be generated. As a result, the experimentally measured image from the hologram can show four different 3D vectorial fields in different regions, as presented in the bottom left panel of Fig. 4(b). The authors experimentally realized an ultrawide viewing angle of 94° and a high diffraction efficiency of 78%. The demonstrated 3D vectorial holography opens avenues to widespread applications such as holographic display as well as multidimensional data storage, machine learning microscopy, and imaging systems.

Another exciting work enabled by ANNs is a self-adaptive cloak that can respond within milliseconds to ever-changing incident waves and surrounding environments without human intervention [64]. A pretrained ANN was adopted to achieve this function. As schematically illustrated in the left panel of Fig. 4(c), a single layer of active meta-atoms was applied at the surface of the cloak, and the reflection spectrum of each meta-atom was independently controlled by the DC bias voltage applied to its varactor diode. To achieve the invisibility cloak function, the bias voltage was determined by the pretrained ANN with the incident wave characteristics (such as the incident angle, frequency, and reflection amplitude) as the input. The temporal response of the cloak was simulated, and an extremely fast transient response of 16 ms was observed. The authors then conducted the experiment, in which a p-polarized Gaussian beam illuminated, at an angle θ, a chameleon-shaped object covered by the cloak. Two detectors were used to extract the signals from the background and the incident wave to characterize the cloak. The right panel of Fig. 4(c) shows the experimental results at two incident angles (9° and 21°) and two frequencies (6.7 and 7.4 GHz). The magnetic field distribution in the case of the cloaked object is similar to that when only the background is present, while it is distinctly different from the bare-object case. Differential radar cross-section (RCS) measurements further confirmed the performance of the cloak.

3. RETRIEVING COMPLEX STRUCTURES BY CONVOLUTIONAL NEURAL NETWORKS

3.1 Introduction of CNNs

The desired designs and structures are oftentimes hard to parameterize, especially when the structure of interest contains many basic shapes [41,65] or is freeform [66,67]. In some cases, we also need to deal with complex optical responses as the input [68]. Therefore, converting the structure to a 2D or 3D image is usually a good approach in these studies; moreover, it offers many more degrees of freedom in the design process. However, preprocessing is required to handle the image input if we still want to use an FCLs-based model. Reshaping the image into a one-dimensional vector and applying feature extraction with linear embeddings, such as principal component analysis and random projection, are two effective ways to preprocess the image so that the input is compatible with the FCLs. However, the performance is usually not satisfactory. The reason is that these conversions either break the correlation between neighboring pixels in the vertical direction of an individual image or miss part of the information describing the image as a whole. The extremely large dimension of the input is another big issue, as it increases the number of connections between layers quadratically. For conventional parameter input, the input dimension is usually a few tens or hundreds, whereas even a vectorized image with only 64×64 pixels results in a 4096-dimensional input vector. CNNs are very suitable for such circumstances. CNNs accept an image input without preprocessing; several filters then move along the horizontal and vertical directions of the image to extract different features. Each filter has a certain weight and performs a convolution at each subarea of the image, that is, the summation of the pointwise multiplication between the values of the subarea and the weights of the filter.

To explain the function of CNNs in detail, let us assume an input of shape $(C, X, Y)$. Here $C$ is the number of channels of an image, while $X$ and $Y$ are the numbers of pixels in the horizontal and vertical directions, respectively. For binary or gray images $C=1$, and for RGB images $C=3$. Each convolutional layer then consists of a weight tensor with $N_f$ filters of dimension $(C, X_f, Y_f)$, meaning each filter is built from $C$ channels of an $X_f \times Y_f$ matrix (usually a 3×3 or 5×5 matrix is used). A CNN is normally built with three operations: convolution, activation, and pooling (sometimes batch normalization is added). Figure 5(a) illustrates the convolution operation (consider $C=1$). Each filter is initially placed on the top-left $X_f \times Y_f$ subarea of the image. The pointwise multiplication of the two $X_f \times Y_f$ matrices is calculated and summed to a single value in the output image. Then the filter moves a certain number of pixels (known as the "stride") and repeats the process until the whole image is mapped to the output. The dimension of the output is usually smaller than that of the input. However, the output dimension can be easily tuned by adding padding to the input image, which expands the input with zero-valued pixels. In the example where one round of padding is added, the output image has the same dimension as the input (the stride equals 1, and the filter dimension is 3×3). The activation function plays a significant role in CNNs for the same reason as in FCLs, and we can choose similar functions to those previously mentioned. A pooling layer helps to reduce the dimension of the image. It usually maps a 2×2 (or 3×3) area in the input to a single value in the output according to the maximum or mean of the four (or nine) values, as represented in Fig. 5(b). The entire workflow for a conventional CNN is shown in Fig. 5(c). The inputs are several images, each representing a certain structure design. The inputs pass through layers of the CNN with the three operations, and the size of the tensor gradually shrinks while the number of channels expands. The output then becomes a 1D vector. It can be regarded as the features extracted from the image, and these features are fed into FCLs to predict the final output related to the optical response. The MSE and cross-entropy losses discussed in the previous section can also serve as the loss function in many cases of CNNs. The loss calculated by comparing the predicted and true responses undergoes backpropagation through all layers to update the parameters. We want to emphasize that other loss functions, such as the Kullback–Leibler divergence [41] and the mean absolute error [69], can also be used in ANNs, depending on the physical constraints and the expected functions of the ANNs.
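A compact PyTorch sketch of the workflow in Fig. 5(c) is given below: convolution, activation, and pooling blocks extract features from a 64×64 structure image, and FCLs map the flattened features to a predicted spectrum. All channel counts and layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SpectrumCNN(nn.Module):
    def __init__(self, n_spectrum: int = 100):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 64 -> 32
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 32 -> 16
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 16 -> 8
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, 256), nn.ReLU(),
            nn.Linear(256, n_spectrum),
        )

    def forward(self, x):                 # x: (batch, 1, 64, 64) pixelated structure image
        return self.head(self.features(x))

model = SpectrumCNN()
images = torch.rand(8, 1, 64, 64).round()      # a batch of binary structure images
spectra = model(images)                        # predicted responses, shape (8, 100)
```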

Fig. 5. (a) Schematic of the convolution operation, in which the filters map a subarea of the input image to a single value in the output image. (b) Schematic of the pooling operation, in which a subarea of the input image is pooled into a single value in the output according to the maximum or mean value. (c) The workflow of a conventional CNN. The input images pass through several convolutional layers, and the extracted features are then passed into FCLs to predict the response (e.g., transmission, reflection, and absorption spectra).


3.2 Design of Complex Photonic Structures by CNNs

CNNs have greatly expanded the design space of the possible structures that one can explore. For example, plasmonic structures have been extensively studied over the past decades because of their unique features in optics and photonics and their far-reaching impact on other disciplines [70–74]. By carefully designing the geometry and the constituent materials, we can confine light to sub-10 nm dimensions with the local field amplified by 10–1000 times at the resonant wavelengths. Therefore, building a relationship between the design of a plasmonic structure and the corresponding optical responses is of great interest. In the work of I. Sajedian et al. published in 2019, the authors combined CNNs with recurrent neural networks (RNNs) to predict the absorption spectra of complex plasmonic structures in the near-infrared region [40]. The CNNs helped to extract the features from the pixelated structures, and the RNNs with gated recurrent unit layers were used to predict the spectra. The model showed an MSE loss lower than $10^{-4}$ when trained with 100,000 data points. The authors also examined the output after each layer to investigate how higher-level features are extracted as the model goes deeper. In the same year, S. So et al. reported the use of conditional deep convolutional generative adversarial networks (cDCGANs) to retrieve silver plasmonic structures with six basic shapes, such as circle, square, and cross, from given reflection spectra under linearly polarized illumination [69]. Generative adversarial networks (GANs) consist of a generator network and a discriminator network [65,75]. The training process for a GANs-based model is a competition between the generator and the discriminator. The generator generates structures from the input spectrum and a noise vector, trying to fool the discriminator into judging the generated structure as a rational structure according to the knowledge learned from the training set. The noise vectors are sampled from a conditional distribution, which depends on the prescribed spectra in this case. The discriminator tries to distinguish the "fake" structures produced by the generator from the "true" structures in the training data set. In the beginning, each input structure is pixelated into a 64×64 image, and CNNs are used to extract the features of the images in both networks. After several epochs of training, even the optimized discriminator can hardly distinguish "fake" from "true" inputs, since the generator can produce structures extremely similar to the desired ones, resulting in a good model for inverse design. As shown in the top panel of Fig. 6(a), the simulated spectra of the retrieved structures (red lines) agree well with the desired spectra (black lines), which are either simulated from an existing structure (first row) or randomly generated with a Lorentzian shape (second row). The overall accuracy is noticeable, reaching a mean absolute error of 0.0322 over 12 test samples after the model was trained with 10,150 training examples. The authors also showed that the model can inversely design different structures (though still within the basic shape groups) whose spectra meet the target, as illustrated at the bottom of Fig. 6(a). The emergence of structures different from the ground truths can be attributed to the one-to-many mapping issue discussed in the introduction.
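The generator–discriminator competition can be illustrated with the stripped-down conditional GAN training loop below. For brevity, both networks are small fully connected models, and the spectra and "true" structures are random placeholders; the cDCGAN of Ref. [69] uses convolutional architectures and simulated training data instead.

```python
import torch
import torch.nn as nn

N_SPEC, N_NOISE, IMG = 64, 16, 32 * 32

G = nn.Sequential(nn.Linear(N_SPEC + N_NOISE, 256), nn.ReLU(),
                  nn.Linear(256, IMG), nn.Sigmoid())          # spectrum + noise -> structure image
D = nn.Sequential(nn.Linear(N_SPEC + IMG, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1))                          # (spectrum, image) -> real/fake logit

opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

spectra = torch.rand(128, N_SPEC)          # placeholder target spectra
real_imgs = torch.rand(128, IMG).round()   # placeholder "true" structures from the training set

for step in range(1000):
    # Discriminator: label true structures 1 and generated structures 0.
    noise = torch.randn(128, N_NOISE)
    fake_imgs = G(torch.cat([spectra, noise], dim=1))
    d_loss = bce(D(torch.cat([spectra, real_imgs], dim=1)), torch.ones(128, 1)) + \
             bce(D(torch.cat([spectra, fake_imgs.detach()], dim=1)), torch.zeros(128, 1))
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()

    # Generator: try to make the discriminator label generated structures 1.
    g_loss = bce(D(torch.cat([spectra, fake_imgs], dim=1)), torch.ones(128, 1))
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()
```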

Fig. 6. (a) Top: Examples of cDCGAN-suggested images and the simulation results. Bottom: Entirely new structures suggested by the cDCGAN for desired spectra. (b) Top: The proposed deep generative model for metamaterial design, which consists of the prediction, recognition, and generation models. Bottom: Evaluation of the proposed model. The desired spectra either generated with user-defined function or simulated from an existing structure are plotted in the first column. The reconstructed structures with the simulated spectra are plotted in the second and third columns. (c) Left: Flowchart of the VAE-ES framework. Right: Test results of designed photonic structures from the proposed model and the simulated spectra. (a) is reproduced from Ref. [69] with permission; (b) is reproduced from Ref. [41] with permission; (c) is reproduced from Ref. [79] with permission.


W. Ma et al. also demonstrated a probabilistic approach for the inverse design of plasmonic structures in 2019 [41]. In this work, the structure of interest was a metal-insulator-metal (MIM) structure, with geometries pixelated into 64×64 images as the training data. The authors focused on the co- and cross-polarized reflection spectra in the mid-infrared region from 40 to 100 THz. The developed neural network is shown at the top of Fig. 6(b), and it comprises the prediction, recognition, and generation models. Again, the input geometry passes through CNNs to extract the features from the image. The prediction model with FCLs can then automatically predict the reflection spectra from the geometric features. For the inverse design part, the authors incorporated a variational autoencoder (VAE) structure [76,77], a probabilistic approach, into the model. It works in the following way. First, the recognition network encodes both the structures and the corresponding spectra into a latent space with a standard Gaussian prior distribution. In the generation model, the network takes the desired spectra together with a latent variable randomly sampled from the conditional latent distribution to reconstruct a geometry. The three models are trained together in an end-to-end manner. The well-trained model can not only predict the spectra from a given structure, serving as a powerful alternative to numerical simulation, but also reconstruct multiple structures from user-defined spectra. The bottom part of Fig. 6(b) shows the performance of the model trained with 30,000 data points for spectral prediction and for the inverse design of both user-defined spectra (first row) and spectra from a test structure (second row). The first column in the figure shows the target spectra. When a test structure is used to generate the spectra, the prediction of the forward model is also plotted as a scatter plot, which agrees well with the spectra from full-wave simulation (solid lines). In the second and third columns, two examples of geometries from the inverse design model and their simulated spectra are depicted. One can find that even though the structures are very different from each other and from the ground truth, their spectra resemble the target ones. The authors further expanded the basic shapes by transfer learning to enable the reconstruction of a wide range of geometry groups. The generality of the model was exemplified by the design of double-layer chiral metamaterials. Very recently, W. Ma and Y. Liu developed a semi-supervised learning strategy to accelerate the training data generation process, the most time-consuming part of deep-learning-aided inverse design [78]. In addition to the labeled data that contain both the structure geometries and the simulated spectra, unlabeled data with only the geometry information are included. Unlike the labeled data, for which simulated spectra can serve as the input of the inverse design model, the predicted spectra of the unlabeled data are used as the input to reconstruct the geometry. Without numerical simulation, the unlabeled data can be generated several orders of magnitude faster. They also help to lower the training loss by 10%–30% for a model trained with the same number of labeled data.
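The probabilistic inverse-design idea can be sketched with a minimal conditional VAE: the recognition (encoder) network maps a structure and its spectrum to a latent Gaussian, and the generation (decoder) network rebuilds a structure from the spectrum plus a latent sample, so several candidate structures can be drawn for one target spectrum. All sizes and data below are placeholders rather than the architecture of Ref. [41].

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

IMG, N_SPEC, LATENT = 32 * 32, 64, 10

encoder = nn.Sequential(nn.Linear(IMG + N_SPEC, 256), nn.ReLU(), nn.Linear(256, 2 * LATENT))
decoder = nn.Sequential(nn.Linear(LATENT + N_SPEC, 256), nn.ReLU(),
                        nn.Linear(256, IMG), nn.Sigmoid())
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

structures = torch.rand(256, IMG).round()     # placeholder pixelated structures
spectra = torch.rand(256, N_SPEC)             # placeholder spectra

for step in range(500):
    mu, logvar = encoder(torch.cat([structures, spectra], dim=1)).chunk(2, dim=1)
    z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)        # reparameterization trick
    recon = decoder(torch.cat([z, spectra], dim=1))
    recon_loss = F.binary_cross_entropy(recon, structures)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())  # KL to the standard Gaussian prior
    loss = recon_loss + kl
    opt.zero_grad(); loss.backward(); opt.step()

# Inverse design: sample several candidate structures for one desired spectrum.
with torch.no_grad():
    target = torch.rand(1, N_SPEC).repeat(5, 1)
    candidates = decoder(torch.cat([torch.randn(5, LATENT), target], dim=1))
```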

Z. Liu et al. introduced a hybrid approach by combining the VAE model with an evolution strategy (ES) [79]. The framework of the hybrid model is shown on the left of Fig. 6(c). In each iteration, a generation of latent vectors $v$ is fed into the model and structures are reconstructed. A well-trained simulator is then used to predict the transmittance spectra of the structures, and a fitness score is calculated. If the criteria are not yet satisfied, the ES performs reproduction and mutation with mutation strength $m$ to create a new generation of latent vectors. This process is repeated until the criteria are met. The details of the ES will be discussed in the genetic algorithm part of the next section. The right panel of Fig. 6(c) shows the performance of the inverse design model. The solid and dashed lines are the spectra of the test pattern (orange), simulated by the finite element method, and of the reconstructed pattern (black) from the hybrid model, respectively. All the works in Fig. 6 solve the one-to-many mapping issue with a probabilistic approach such as VAEs or GANs, where a randomly sampled parameter or vector is combined with the desired optical response as the input to reconstruct the structure. This enables the ANNs to explore the full physical possibilities of the design space and produce sophisticated structures for novel functions.

In 2019, Q. Zhang et al. demonstrated a digital coding metasurface designed with CNNs [80]. They explored different meta-atoms, each 8 mm in size and described by 16×16 pixels, to control the reflection phase. The CNN model was built upon residual learning blocks and 70,000 training patterns. After training, the model can precisely predict the reflection phase; 90.05% of the test samples exhibited a deviation of less than 2° within the 360° phase range. Subsequently, the model was used for the inverse design of meta-atoms with a prescribed phase response. More specifically, the goal was to create a 1-bit coding with two meta-atoms such that the reflection phases of the incident x- ($p_x$) and y- ($p_y$) polarized light satisfy
$$p_x^i - p_y^i = \theta \;\; (i=1,2), \qquad |p_y^2 - p_y^1| = 180°.$$
This means that the two meta-atoms should have the same reflection phase difference θ between cross-polarizations, while the relative phase between the two meta-atoms is maximized to 180°. By varying the phase difference θ in 45° steps, eight different 1-bit coding elements (with two structures for each coding) were predicted. One example of the eight elements is plotted on the left of Fig. 7(a). By carefully combining the phase profiles on a metasurface consisting of 16 designed units, the authors demonstrated the independent manipulation of the phase for orthogonal polarizations. As one example of potential applications, the authors fabricated several dual- and triple-beam coding metasurfaces that can deflect light with different polarizations into different angles at 10 GHz. The measurement was performed in a microwave chamber with a horn antenna as the excitation source. On the right of Fig. 7(a), we can find excellent agreement between the measured far-field scattering patterns and the simulated ones.

Fig. 7. (a) Left: One example of 1-bit coding elements with regular phase differences. Right: Comparison of the simulated and measured results of the dual- and triple-beam coding metasurfaces. (b) Schematic of the proposed 3D CNN model to characterize the near-field and far-field properties of arbitrary dielectric and plasmonic nanostructures. (c) Left: Sketch of the nanostructure geometry and the 1D CNN-based ANNs. Right: Training convergence and readout accuracy of the ANNs. (d) Left: The workflow of designing the DMD pattern for light control through scattering media with ANNs. Right: The structures of the FCLs-based single-layer neural network and the CNNs, together with the simulated and measured results for the focusing effect. (a) is reproduced from Ref. [80] with permission; (b) is reproduced from Ref. [81] with permission; (c) is reproduced from Ref. [86] with permission; (d) is reproduced from Ref. [87] with permission.


CNNs are widely applied in 2D image processing. Their significance is attributed to their ability to keep local segments of the input intact, and they can in principle work in an arbitrary number of dimensions. Taking advantage of this property, P. R. Wiecha and O. L. Muskens built a model with 3D CNNs to predict the near-field and far-field electric/magnetic response of arbitrary nanostructures [81]. They pixelated the dielectric or plasmonic nanostructure of interest into a 3D image and fed the image into several layers of 3D CNNs. An output 3D image with the same size as the input was then predicted, representing the electric field for a fixed wavelength and polarization in the same coordinate system, as shown in Fig. 7(b). The residual connections and shortcut connections in the network, known as residual learning [82] and U-Net [83] blocks, help to stabilize the gradients and make the network deeper without compromising its performance [84,85]. From the predicted near-field response, other physical quantities, such as far-field scattering patterns, energy flux, and electromagnetic chirality, can then be deduced. The authors studied two cases: 2D gold nanostructures with random polygonal shapes and 3D silicon structures consisting of several pillars. Each scheme was trained with simulation data from 30,000 distinct geometries. With the well-trained model, the authors reproduced several nano-optical effects from the near-field prediction of the 3D CNNs, such as the antenna behavior of gold nanorods and the Kerker-type scattering of Si nanoblocks. The model can potentially serve as an extremely fast replacement for current full-wave simulation methods, with the trade-off of slightly decreased accuracy.

In parallel, a one-dimensional (1D) CNN was introduced to analyze the scattering spectra of silicon nanostructures for optical information storage, as demonstrated by P. R. Wiecha et al. in 2019 [86]. The authors used Si nanostructures to store bit information with high density, as shown in the left panel of Fig. 7(c). The nanostructure was divided into N parts: if a certain part contained a silicon block, the corresponding bit was defined as "1;" otherwise it was "0." An N-bit information storage unit was thereby created. The information encoded in the nanostructure was read out through far-field measurements. Here, the dark-field spectra under x- and y-polarized light in the visible range were chosen as the measured information. The 1D CNNs together with FCLs were used to analyze the spectra; the input of this classification problem was the scattering spectra, and the output was the index of the class among the $2^N$ classes for N bits, representing the bit sequence. The network was trained with experimentally measured dark-field spectra of 625 fabricated nanostructures for each geometry. After 100 epochs of training, the model showed quasi-error-free prediction with an accuracy higher than 99.97% for the 2-bit to 5-bit (or even 9-bit) geometries, as demonstrated in the right panel of Fig. 7(c). The authors further showed that the input information can be greatly reduced by feeding the network with only a small spectral window of around 100 nm, or even several discrete data points on the spectra, with a negligible effect on the accuracy. Finally, the authors managed to retrieve the stored information from the RGB values of dark-field color images of the nanostructures. This new approach can reduce the complexity and equipment cost of the readout process while promising massively parallel retrieval of information.
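A sketch of such a 1D-CNN readout is shown below: the input is a two-channel spectrum (x- and y-polarized measurements) and the output is one of $2^N$ classes encoding the bit sequence. The sizes are illustrative and do not correspond to the network of Ref. [86].

```python
import torch
import torch.nn as nn

N_BITS, N_WAVELENGTHS = 4, 128

classifier = nn.Sequential(
    nn.Conv1d(2, 16, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(2),
    nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(2),
    nn.Flatten(),
    nn.Linear(32 * (N_WAVELENGTHS // 4), 128), nn.ReLU(),
    nn.Linear(128, 2 ** N_BITS),                     # one logit per possible bit sequence
)

spectra = torch.rand(8, 2, N_WAVELENGTHS)            # x- and y-polarized dark-field spectra
labels = torch.randint(0, 2 ** N_BITS, (8,))         # placeholder bit-sequence class labels
loss = nn.CrossEntropyLoss()(classifier(spectra), labels)
```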

CNNs are not always the best choice for image inputs, as found by A. Turpin et al. in 2018 [87]. The scheme of this work is shown on the left of Fig. 7(d). They studied the speckle formed by an illuminated digital micromirror device (DMD) pattern after the light passed through a layer of scattering material, such as a glass diffuser or a multimode fiber. They intended to inversely design the DMD pattern required for the output speckle to form a certain image. The authors built two models, one with a single FCL and one with multilayer CNNs. The right panel of Fig. 7(d) presents the results of the inverse designs for the desired Gaussian beam outputs based on the two models. We can see that the measured results of the single FCL look better than those of the multilayer CNNs. Quantitatively, both models achieve a signal-to-noise ratio larger than 10. However, the enhancement metric, defined as the intensity at the generated focal point divided by the mean intensity of the background speckle, is η=32 for the first model and only 3.6 for the second model. Therefore, the authors concluded that, in this particular application, CNNs can reduce the number of network parameters by almost 80% compared to the single FCL, but at the cost of worse performance when a similar amount of training data is used. The well-trained model can then be used to predict the required illumination pattern for varied output images. In this way, the authors achieved a dynamic scan of the focal point by manipulating the input illumination at a high frame rate of 22.7 kHz.

4. OTHER INTELLIGENT ALGORITHMS FOR PHOTONIC DESIGNS

There are other well-developed computational methods and algorithms that can be applied to the inverse design with satisfactory performance in specific circumstances. One of the most popular methods is the genetic algorithm [88,89], which is inspired by Charles Darwin’s theory of natural evolution. As previously discussed [79], in the design toward a target response, a group of initial designs is created either randomly or empirically. The performance of this first generation of “species” is tested and compared to the target response, and a fitness score based on the comparison is calculated. The algorithm selects the several “species” in the current generation that have the highest fitness scores. Then reproduction, which combines the information of two or more designs, and mutation, which adds random noise to a design, are performed to generate the next generation of species. The process is repeated until all or most of the species in the new generation have good fitness scores. This algorithm was applied to photonic design problems a decade ago and achieved great success [90–94]. Recently, Z. Liu et al. integrated the genetic algorithm with ANNs [95]. They studied “meta-molecules” consisting of multiple meta-atoms that can realize polarization conversion and anomalous light deflection, as shown on the left of Fig. 8(a). The model is composed of a compositional pattern-producing network (CPPN), which is used to decode 2D patterns from a latent variable, and a cooperative coevolution (CC) algorithm to identify a set of vectors in the latent space. The CPPN takes the coordinate tuples $(x_i, y_i, r_i)$ one at a time together with a latent vector $v$, which controls the shape of the pattern, and assembles the predictions from the whole input into a pattern. The CC then performs the genetic algorithm with a fitness score calculated from the output polarization state, the ellipticity, and the phase and intensity of the electric field. The authors first trained a neural network simulator with the responses of 8000 meta-atoms of different shapes. This simulator can be adopted in the CC to greatly reduce the time needed to compute the fitness scores, and it predicts the real and imaginary parts of the spectra with an accuracy above 97%. The authors designed and fabricated meta-molecules comprising two (or eight) meta-atoms to implement polarization conversion under linear polarization as well as anomalous light deflection under circular polarization. The simulated and measured results of polarization conversion are plotted in the right panel of Fig. 8(a), showing excellent agreement with the target.
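A generic genetic-algorithm loop following these steps (fitness evaluation, selection, crossover, and mutation) is sketched below; the fitness function is a simple placeholder standing in for the comparison between a simulated and a target response.

```python
import numpy as np

rng = np.random.default_rng(1)
N_GENES, POP, ELITE, MUT = 8, 40, 10, 0.05
target = rng.random(N_GENES)                       # stand-in for the target response

def fitness(design):
    """Placeholder fitness: negative error between the design and the target."""
    return -np.mean((design - target) ** 2)        # higher is better

population = rng.random((POP, N_GENES))
for generation in range(200):
    scores = np.array([fitness(d) for d in population])
    parents = population[np.argsort(scores)[-ELITE:]]            # selection of the fittest
    children = []
    for _ in range(POP):
        a, b = parents[rng.integers(ELITE, size=2)]
        cut = rng.integers(1, N_GENES)                           # single-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        child += MUT * rng.normal(size=N_GENES)                  # mutation
        children.append(child)
    population = np.array(children)

best_design = population[np.argmax([fitness(d) for d in population])]
```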

Fig. 8. (a) Left: Illustration of meta-molecules. Right: Fabricated samples and the measured and simulated results of polarization conversion. (b) Top: Schematic of a silicon metagrating that deflects light to a certain angle. Bottom: The proposed conditional GLOnet for metagrating optimization. (c) Top: Schematic of structure refinement and filtering for the high-efficiency thermal emitter. Bottom: The efficiency, emissivity, and normalized emission of the well-optimized thermal emitter. (d) Top: Illustration of the unit cell consisting of three metallic patches connected via PIN diodes and a photograph of the fabricated metasurface. Bottom: Experimental results for reconstructing human body imaging. (a) is reproduced from Ref. [95] with permission; (b) is reproduced from Ref. [100] with permission; (c) is reproduced from Ref. [42] with permission; (d) is reproduced from Ref. [104] with permission.


Another widely used optimization algorithm for inverse design is gradient-based topology optimization [21,96–103]. In the optimization process, the design space is discretized into pixels whose properties (e.g., the refractive index) are represented by a parameter set $p$. The parameter set is optimized for a prescribed target response by maximizing (minimizing) a user-defined objective function $F$. Starting from an initial parameter set, both a forward simulation and an adjoint simulation are performed to calculate the gradient $\partial F/\partial p_i$ of the objective function with respect to each parameter. The parameters are then updated according to the gradient ascent (descent) method, and this iterative process continues until the objective function is well optimized. Taking advantage of topology optimization, J. Jiang et al. presented a global optimizer for highly efficient metasurfaces that can deflect light to desired angles [100]. As illustrated in the top panel of Fig. 8(b), the metagrating in one period is divided into 256 segments, and each segment can be filled with either air or Si. To optimize the metagrating, the authors used a global optimization method named GLOnet, which is based on both a generative neural network (GNN) and topology optimization, as shown in the bottom panel of Fig. 8(b). The GNN takes the desired deflection angle θ and the working wavelength λ together with a random noise vector z as inputs. The inputs pass through FCLs and layers of deconvolutional blocks, and a metagrating design is generated. A Gaussian filter at the last layer of the generator eliminates small features that are hard to fabricate. Next, topology optimization is applied: by performing both a forward simulation and an adjoint simulation, the gradient of the objective function (the efficiency) is calculated, and the weights of the ANN are updated according to the gradient ascent method. To make the model capable of working for any deflection angle and wavelength, the initialization of the model is essential to span the full design space. Therefore, an identity shortcut is added to map the random noise directly to the output design, which enables all kinds of designs when the initial weights of the GNN are small. It should be noted that the GLOnet is different from conventional topology optimization. In conventional topology optimization, the structural parameters (such as the refractive index of individual segments) are updated for a single device with a fixed deflection angle and wavelength; when the goal (deflection angle θ) or the working wavelength changes, the optimization has to be performed again for the new device. In the GLOnet, by contrast, the optimized parameters are the weights of the neural network in each iteration. Therefore, the GNN can inversely design devices for varied goals and working wavelengths without retraining when the target changes. The performances of conventional topology optimization and the GLOnet optimization were compared in this work: 92% of the devices designed by the GLOnet have efficiencies higher than, or within 5% of, those of the devices designed by the other method. In addition, the retrieved devices gradually converge to a high-efficiency region as the number of training iterations increases.
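The inner loop of adjoint-based topology optimization can be sketched as a projected gradient ascent on the pixel parameters, as below. The gradient here is a runnable placeholder; in practice it would be obtained from one forward and one adjoint electromagnetic simulation per iteration.

```python
import numpy as np

rng = np.random.default_rng(0)
p = rng.random(256)                       # 256 segments, e.g., air (0) vs. silicon (1)
target = (np.sin(np.linspace(0, 6, 256)) > 0).astype(float)   # placeholder "optimal" layout

def objective_and_gradient(p):
    """Stand-in for forward + adjoint simulations returning F and dF/dp."""
    F = -np.mean((p - target) ** 2)
    grad = -2 * (p - target) / p.size
    return F, grad

lr = 5.0
for iteration in range(500):
    F, grad = objective_and_gradient(p)
    p = np.clip(p + lr * grad, 0.0, 1.0)  # gradient ascent with projection onto [0, 1]

binary_design = (p > 0.5).astype(int)     # final binarized layout
```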

Combining topology optimization and ANNs, Z. A. Kudyshev et al. studied the structure optimization of high-efficiency thermophotovoltaic (TPV) cells operating in a desired wavelength range (λ = 0.5–1.7 μm) [42]. The design is based on a gap plasmonic structure. As shown in the top panel of Fig. 8(c), the optimization can be divided into three main steps. First, the topology optimization method is applied to generate a group of appropriate structures for training. Then an adversarial autoencoder (AAE) network is trained. Similar to the VAE, the AAE consists of an encoder that maps the input designs to a latent space and a decoder that retrieves the structure from a latent vector sampled from the latent space. Both the VAE and AAE models try to make the latent distribution q(z̃) approach a predefined distribution p(z) (a 15-dimensional Gaussian distribution in Ref. [42]). In the VAE model, a Kullback–Leibler divergence that compares q(z̃) with p(z) is included in the loss function; in the AAE, a discriminator is built to distinguish samples drawn from q(z̃) and p(z), and the encoder is trained to generate samples that can fool the discriminator. In the last step, the structure retrieved from the decoder is refined with topology optimization to remove the blurring of the generated designs. As a result, the hybrid method that combines the AAE and topology optimization shows great performance, providing a mean efficiency of 90% for the retrieved structures, whereas the efficiency is 82% with direct topology optimization. The comparison between these two methods is shown at the bottom of Fig. 8(c), together with the emissivity and emission plots for the best designs from either method. In a very recent work [105], the same group further developed a global optimization method in which a global optimization engine generates latent vectors and a Visual Geometry Group network (VGGnet) rapidly assesses the performance of the designs.
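The adversarial regularization of the latent space, which distinguishes the AAE from the VAE, can be summarized in a few lines of code. The following PyTorch sketch uses illustrative network sizes and dummy design patterns; only the idea of pushing the encoded distribution q(z̃) toward a Gaussian prior p(z) with a discriminator follows the scheme described in Ref. [42].

```python
# A minimal, hypothetical sketch of the adversarial latent regularization in an
# AAE (PyTorch). Network sizes and the dummy design batch are illustrative
# assumptions; only the idea of matching q(z~) to a Gaussian prior p(z) with a
# discriminator follows the scheme of Ref. [42].
import torch
import torch.nn as nn

LATENT = 15  # latent dimension, matching the 15-dimensional Gaussian prior in Ref. [42]

encoder = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 256), nn.ReLU(),
                        nn.Linear(256, LATENT))
decoder = nn.Sequential(nn.Linear(LATENT, 256), nn.ReLU(),
                        nn.Linear(256, 64 * 64), nn.Sigmoid())
discrim = nn.Sequential(nn.Linear(LATENT, 64), nn.ReLU(),
                        nn.Linear(64, 1), nn.Sigmoid())

bce, mse = nn.BCELoss(), nn.MSELoss()
opt_ae = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
opt_d = torch.optim.Adam(discrim.parameters(), lr=1e-3)

designs = torch.rand(32, 1, 64, 64)  # dummy batch of topology-optimized patterns

# 1) Reconstruction: the encoder-decoder pair minimizes the reconstruction error.
z_fake = encoder(designs)
loss_rec = mse(decoder(z_fake).view_as(designs), designs)

# 2) Discriminator: separate prior samples p(z) from encoded samples q(z~).
z_real = torch.randn(32, LATENT)
loss_d = bce(discrim(z_real), torch.ones(32, 1)) + \
         bce(discrim(z_fake.detach()), torch.zeros(32, 1))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# 3) Encoder as generator: fool the discriminator so q(z~) approaches p(z).
loss_g = bce(discrim(z_fake), torch.ones(32, 1))
opt_ae.zero_grad(); (loss_rec + loss_g).backward(); opt_ae.step()
```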

Conventional machine learning methods, such as Bayesian learning [106], clustering [107], and manifold learning [104], are also very helpful in solving photonic design problems. In 2019, L. Li et al. showcased a machine-learning-based imager that can efficiently record the microwave image of a moving object with a reprogrammable metasurface [104]. This work may pave the way for intelligent surveillance with both fast response and high accuracy. The meta-atom has three metallic patches connected via PIN diodes to encode 2-bit information, as schematically shown in the top panel of Fig. 8(d). The digital phase step between adjacent states is around 90°, and the state can be switched by an external bias voltage. The authors recorded a moving person for less than 20 min to generate the training data for the model. With principal component analysis (or random projection), the main modes with significant contributions were calculated. All meta-atoms were then tuned by bias voltages so that each measurement matched one principal-component mode. In this way, the measurement became more efficient because it always captured the information that contributes most to reconstructing the microwave image. To test the well-trained model, another person moved in front of the metasurface, and images of the movements were reconstructed as shown at the bottom of Fig. 8(d). With only 400 measurements, far fewer than the number of image pixels, high-quality images could be produced even when the person was blocked by a 3-cm-thick paper wall. The method was further extended to a classification problem, in which the authors defined three different movements (i.e., standing, bending, and raising arms). With a simple nearest-neighbor algorithm, only 25 measurements were needed for good recognition of the movements.
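The measurement strategy can be illustrated with a short NumPy sketch: principal components are learned from training frames, and a new scene is reconstructed from projections onto only a few hundred modes. The dummy data, grid size, and number of retained modes are assumptions for illustration; in Ref. [104] the modes are physically encoded in the metasurface coding patterns, and the projections are obtained as microwave measurements.

```python
# A minimal, hypothetical sketch of PCA-based compressive measurement (NumPy).
# Training frames, sizes, and the number of retained modes are illustrative
# assumptions; in Ref. [104] the PCA modes are mapped onto metasurface coding
# patterns and the "projections" are physical microwave measurements.
import numpy as np

rng = np.random.default_rng(0)
n_pixels, n_train, n_modes = 32 * 32, 500, 400

# Training images of the moving person, flattened to vectors (dummy data here).
X = rng.random((n_train, n_pixels))
mean = X.mean(axis=0)

# Principal components of the training set: the measurement "modes".
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
modes = Vt[:n_modes]                      # (n_modes, n_pixels)

# A new scene is "measured" by projecting onto the modes; each projection
# stands in for one metasurface-encoded microwave measurement.
scene = rng.random(n_pixels)
measurements = modes @ (scene - mean)     # only n_modes numbers are recorded

# Reconstruction from far fewer measurements than pixels.
reconstruction = mean + modes.T @ measurements
print(np.linalg.norm(scene - reconstruction) / np.linalg.norm(scene))
```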

5. CONCLUSION AND OUTLOOK

In this review, we have introduced the basic idea of applying ANNs and other advanced algorithms to accelerate and optimize photonic designs, including plasmonic nanostructures and metamaterials. We have highlighted some representative works in this field and discussed the performance and applications of the proposed models. In the inverse design problem, the neural network is usually built upon FCLs and CNNs, integrated with other network units such as ResNets and RNNs. It is beneficial to combine ANNs with conventional optimization methods such as genetic algorithms and topology optimization, because the conventional methods can perform global optimization and provide feedback to further improve the ANNs. The emergence of these methods offers a great opportunity to increase the structural complexity of photonic devices and thereby realize much more sophisticated and novel functionalities.

The development of photonics can also potentially benefit the study of computational methods. For instance, it has long been sought to push the computation speed to the speed of light. All-optical neuromorphic computing [108–112] via optical networks is one approach toward this goal. In principle, the diffractive propagation of light, described by the phase factor exp(ik·r), can also be regarded as a nonlinear function. Therefore, the intensity profiles in two diffractive layers "connected" by light diffraction provide a good analogy to the connections between neurons in ANNs. Based on this idea, researchers have demonstrated a new kind of neural network built upon all-optical components, known as optical neural networks (ONNs) [113–117]. As a comprehensive example, X. Lin et al. reported an all-optical system that serves as a diffractive deep neural network (D2NN) for image classification in 2018 [118]. The system is composed of several layers of 3D-printed structures. According to the Huygens–Fresnel principle, each point on a D2NN layer can be regarded as a secondary source of light. Therefore, each point in one layer contributes to the amplitude and phase at every point in the following layer, while the propagation phase functions as the nonlinearity. The analogy between the D2NN and the ANN is illustrated in the top panel of Fig. 9(a). The authors designed the D2NN with the same error backpropagation method used in ANNs and adjusted the phase distribution in each layer. This design process was run on a computer, but once the design was finished, the fabricated device could perform prediction (classification) all-optically. In the measurement, the light passed through an input plane patterned in the shape of the image. By detecting the position with the maximum output intensity after the light passes through all layers, the class of the input image can be read out. The authors trained and tested the classifier with images of handwritten digits and fashion products. The experimental results agree well with the expectations, as shown in the bottom panel of Fig. 9(a), with accuracies of 91.75% and 86.60% for the two tasks, respectively. Two years later, C. Qian et al. demonstrated optical logic operations with a diffractive neural network [119]. The goal was to perform logic operations such as "and," "or," and "not" on the inputs. As shown in the first two rows of Fig. 9(b), the input wave was shaped so that it could only pass through certain regions before illuminating the diffractive metasurface. In this way, the two binary inputs and the logic operation could be controlled. The results were read out by detecting the intensity at two positions representing "0" and "1." The last two rows of Fig. 9(b) show the experimental measurements for 10 different operations, and all the profiles indicate the correct results. More efforts are needed to further advance this exciting direction, for instance, by reducing the footprint and increasing the efficiency of optical neural networks.
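To make the analogy concrete, the sketch below implements a differentiable forward pass of a diffractive network in PyTorch, with free-space propagation computed by the angular spectrum method and a trainable phase mask at each layer. The grid size, pixel pitch, wavelength, and layer spacing are illustrative assumptions rather than the parameters of the 3D-printed D2NN in Ref. [118].

```python
# A minimal, hypothetical sketch of a diffractive-network forward pass (PyTorch).
# Grid size, pixel pitch, wavelength, and layer spacing are illustrative
# assumptions, not the parameters of the device in Ref. [118].
import math
import torch

N, PITCH, WAVELEN, DIST = 64, 4e-4, 7.5e-4, 0.03  # grid, pitch, wavelength, spacing (m)

def propagate(field, dist):
    """Angular-spectrum free-space propagation over a distance `dist`."""
    fx = torch.fft.fftfreq(N, d=PITCH)
    FX, FY = torch.meshgrid(fx, fx, indexing="ij")
    kz = 2 * math.pi * torch.sqrt(torch.clamp(1 / WAVELEN**2 - FX**2 - FY**2, min=0.0))
    H = torch.exp(1j * kz * dist)
    return torch.fft.ifft2(torch.fft.fft2(field) * H)

# Trainable phase masks: one per diffractive layer, optimized by backpropagation.
phases = [torch.zeros(N, N, requires_grad=True) for _ in range(5)]

def d2nn_forward(input_amplitude):
    field = input_amplitude.to(torch.complex64)
    for phi in phases:
        field = propagate(field, DIST) * torch.exp(1j * phi)  # modulate at each layer
    return propagate(field, DIST).abs() ** 2                  # intensity at the detector plane

# The class would be read out as the detector region with the largest integrated
# intensity (per-class regions are omitted here for brevity).
intensity = d2nn_forward(torch.rand(N, N))
```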

Fig. 9. (a) Top: Comparison between the all-optical D2NN and a conventional ANN. Bottom: Measured performance of the classifier for handwritten digits and fashion products. (b) Top: Sketch of the optical logic operations by a diffractive neural network. Bottom: Experiment setup and measured results of three basic logic operations on the fabricated metasurface. (a) is reproduced from Ref. [118] with permission; (b) is reproduced from Ref. [119] with permission.

ANNs are typically considered a "black box" since the relationship between inputs and outputs learned by the network is usually implicit. In some published works, researchers visualize the output of individual layers to gain information on what feature is learned (or what function is performed) by each layer [40], which is a good attempt. However, if we could further extract the learned relation explicitly from a well-trained ANN, it would be very helpful for finding new structure groups that lie outside the conventional geometries (such as H-shape, C-shape, and bowtie structures). At the same time, it would provide guidelines and insights for the design of optical devices. Another important direction is to extend the generality of ANN models. When applying ANNs to traditional tasks, such as image recognition and natural language processing, we want the networks to learn the information and distribution that lie inside the natural images or languages themselves and to reconstruct or approximate these distributions. ANNs have been proven to work well in learning and summarizing such distributions, and it is relatively easy to extend a model to deal with other kinds of images or languages. However, the inverse design tasks in photonics are more complicated, because the ANNs need to learn the implicit physical rules (such as Maxwell's equations) that connect the structures and their optical responses, instead of the information and distribution associated with the structures themselves. Therefore, extending the capability of a well-trained neural network to new inverse design problems remains a challenge. Most of the ANNs described in this review are specified for a certain design platform or application. A model can be fine-tuned to handle different tasks, but it needs to be retrained, which in turn requires an additional training data set. When the original training set contains data for multiple tasks, multiple design rules are likely to be involved and learned by the ANNs; the performance for each individual task will then be worse than that of a model trained with a data set dedicated to that task, because the rules for the other tasks act as perturbations or noise. It is therefore very important to find the right trade-off.

Over the past decades, photonics and artificial intelligence have evolved largely as two separate research disciplines. The intersection and combination of these two fields in recent years have brought exciting achievements. On one hand, innovative ANN models provide a powerful tool to accelerate the optical design and implementation process, and some nonintuitive structures and phenomena have been discovered through this new strategy. On the other hand, the developed optical designs are expected to enable a variety of real-world applications, such as optical imaging, holography, communications, and information encryption, with high efficiency, fidelity, and robustness. Toward this goal, we need to include practical fabrication constraints and the underlying material properties in the design space in order to globally optimize the devices and systems. We believe that the field of interfacing photonics and artificial intelligence will significantly move forward as more researchers from different backgrounds join this effort.
