Class-specific differential detection in diffractive optical neural networks improves inference accuracy
1 Introduction
Machine learning, and in particular deep learning, has drastically impacted the area of information and data processing in recent years.1
The task of object recognition and classification is an important application area of machine learning. It is conventionally realized in two main steps. First, a lens-based imaging system followed by a CMOS/CCD array captures the scene at hand. The digitized and stored image of the scene is then fed into an all-electronic artificial neural network (ANN) pretrained for the task. The sampling density, and thus the number of detectors on the optoelectronic sensor plane, are dictated by the desired spatial and/or temporal resolution of the designed system.28 In a classification system, high spatial resolution is generally desired because spatial features are vital to the performance of ANNs, forcing the pixel count and density of the sensor arrays to be relatively high. This, in turn, increases the memory and computational power requirements and inevitably hampers the effective frame rate. The compressive sensing/sampling field has broadly aimed to overcome some of these resource inefficiencies in conventional optical systems. However, the computationally demanding recovery algorithms associated with compressive sensing frameworks partially hinder its application in the wide range of areas that need real-time operation.
In earlier work, we introduced diffractive deep neural networks,25,27 which are composed of successive diffractive optical layers (transmissive and/or reflective), trained and designed using deep learning methods in a computer and then physically fabricated to all-optically perform statistical inference based on the trained task at hand. In this framework, the complex wave field of a given scene or object, illuminated by a coherent light source, propagates through the diffractive layers, which collectively modulate the propagating light such that the intensity at the output plane of the diffractive network is distributed in a desired way; i.e., based on the specific classification or imaging task of interest, these diffractive layers jointly determine the output plane intensity in response to an input. The applications of this concept to the design of optical imaging systems, as well as to all-optical object classification, have been experimentally demonstrated.25
Unlike traditional, imaging-based machine vision systems, a diffractive optical neural network trained for a classification task needs only a few optoelectronic detectors, as many as the number of individual classes in a given dataset. Following their design and fabrication, diffractive optical neural networks execute classification with passive optical components, without the need for any power except the illumination beam and simple max-operation circuitry at the backend. Unless optical nonlinearities are utilized, diffractive optical neural networks are linear in nature, except for the final optoelectronic detector plane; despite this linearity, additional diffractive layers have been shown to improve the generalization and inference performance of the network, indicating the depth advantage that comes with the increasing number of diffractive neural layers in the optical network.25,27 With a single photodetector assigned to each individual class of objects, Ref. 27 demonstrated a blind testing accuracy of 97.18% for all-optical classification of handwritten digits (MNIST database, where each digit was encoded in the amplitude channel of the input) and achieved 89.13% for all-optical classification of fashion products (Fashion-MNIST database, where each object was encoded in the phase channel of the input).
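As a minimal illustration of this backend, the max operation simply selects the detector (and hence the class) that receives the largest optical signal. The detector readings below are hypothetical values chosen for illustration:

```python
import numpy as np

def classify_standard(detector_intensities):
    """Assign the class label as the index of the detector with the
    highest detected optical power (the simple max operation at the backend)."""
    return int(np.argmax(detector_intensities))

# Example: 10 detectors, one per class (hypothetical readings).
signals = np.array([0.02, 0.01, 0.05, 0.60, 0.04, 0.03, 0.08, 0.07, 0.06, 0.04])
prediction = classify_standard(signals)  # detector 3 carries the most power
```

This is the entire electronic computation required at inference time for the standard (nondifferential) design.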
In spite of the promising performance of the earlier work on diffractive optical networks, these architectures suffer from a well-known limitation in optics: the optoelectronic detectors are only sensitive to the incident optical power rather than the complex optical field, which limits the range of realizable values to nonnegative real numbers. In this work, this nonnegativity of the detected signal at the output plane of diffractive neural networks is mitigated through a differential detection scheme, which employs two optoelectronic detectors per data class at the output plane [see Fig. 2(a)].
Fig. 1. Illustration of different diffractive neural network design strategies. (a) Standard design: the number of detectors per diffractive neural network equals the number of classes in the target dataset (10 for the datasets considered here). In the examples shown in this figure, each optical network has 5 diffractive layers with 40k neurons per layer, i.e., 0.2 million neurons in total. (b) Differential design: the version shown on the left places the positive and the negative detectors of each class on the same output plane of a single diffractive network, whereas the version on the right uses two different jointly optimized diffractive networks, separating the positive and the negative detectors by placing them at different output planes without optical coupling between the two. (c) Class-specific design: the classes are divided into subsets, each assigned to its own jointly optimized diffractive network (in this example, two class subsets are shown). (d) Class-specific differential design, shown here for two class subsets. In general, there can be another version of a class-specific differential design where each diffractive neural network has only positive or only negative detectors at the corresponding output plane; in this special case, the number of jointly designed diffractive neural networks equals twice the number of class subsets. The single-subset case of this design is included as part of the right panel of (b), and we do not consider it under the class-specific neural network design since there is no class separation at the output/detector planes.
Fig. 2. Operation principles of a differential diffractive optical neural network. (a) Setup of the differential design: each of the 10 classes is assigned a pair of positive and negative detectors on the same output plane, and the diffractive network has 5 layers with 40k neurons per layer. (b) A correctly classified test object from the MNIST dataset is shown. Subparts of (b) illustrate the following: (i) target object placed at the input plane and illuminated by a uniform plane wave, (ii) normalized intensity distribution observed at the output plane of the diffractive optical neural network, (iii) normalized optical signal detected by the positive (red) and the negative (blue) detectors, (iv) differential class scores computed according to Eq. (1) using the values in (iii). (c) and (d) are the same as (b), except for the Fashion-MNIST and CIFAR-10 datasets, respectively. Note that while the input object in (b) is modeled as an amplitude-encoded object, the gray levels shown in (c) and (d) represent phase-encoded, perfectly transparent input objects. Since diffractive optical neural networks operate under coherent illumination, the phase and/or amplitude channels of the input plane can be used to represent information.
Table 1. Blind testing classification accuracies of nondifferential (top row) and differential diffractive optical networks, without any class specificity or division. Ten classes exist for each dataset: MNIST, Fashion-MNIST, and grayscale CIFAR-10. For each data point, the training of the corresponding diffractive optical neural network model was independently repeated six times with random initial phase modulation variables and random batch sequences; therefore, each data point reflects the mean blind testing accuracy of these six trained networks, also showing the corresponding standard deviation.
In addition to the introduction of differential detection per class, in this work we also made use of the parallel computing capability of passive diffractive layers and jointly optimized separate diffractive optical neural networks for the positive and the negative detectors (see, e.g., Fig. 3).
Fig. 3. Operation principles of a diffractive optical neural network using a differential detection scheme, where the positive and the negative detectors are split into two jointly optimized networks based on their sign. (a) Setup of the differential design: the positive detectors of the 10 classes are placed at the output plane of one diffractive network and the negative detectors at the output plane of the other; each network has 5 diffractive layers with 40k neurons per layer. (b) A correctly classified test object from the MNIST dataset is shown. Subparts of (b) illustrate the following: (i) target object placed at the input plane and illuminated by a uniform plane wave, (ii) normalized intensity distribution observed at the output plane of the diffractive optical neural network, (iii) normalized optical signal detected by the positive (red) and the negative (blue) detectors, (iv) differential class scores computed according to Eq. (1) using the values in (iii). (c) and (d) are the same as (b), except for the Fashion-MNIST and CIFAR-10 datasets, respectively. Note that while the input object in (b) is modeled as an amplitude-encoded object, the gray levels shown in (c) and (d) represent phase-encoded, perfectly transparent input objects.
Fig. 4. Operation principles of a diffractive optical neural network using a class-specific detection scheme, where the individual class detectors are split into separate networks based on their classes. Unlike Figs. 2 and 3, there are no negative detectors in this design. (a) Setup of the class-specific design: the 10 classes are divided into two subsets, each assigned to one of two jointly optimized diffractive networks with 5 layers and 40k neurons per layer. (b) A correctly classified test object from the MNIST dataset is shown. Subparts of (b) illustrate the following: (i) target object placed at the input plane and illuminated by a uniform plane wave, (ii) normalized intensity distribution observed at the two output planes of the diffractive optical neural networks, (iii) normalized optical signal detected by the detectors. (c) and (d) are the same as (b), except for the Fashion-MNIST and CIFAR-10 datasets, respectively. Note that while the input object in (b) is modeled as an amplitude-encoded object, the gray levels shown in (c) and (d) represent phase-encoded, perfectly transparent input objects.
Because of the passive nature of diffractive neural networks, one can create scalable, low-power, and competitive solutions for optical computation and machine learning through these jointly optimized diffractive neural network systems, at the cost of increased optical alignment complexity and illumination power.
2 Results and Discussion
After the introduction of our notation to symbolize different diffractive neural systems (Fig. 1), we first evaluated the differential detection scheme shown in Fig. 2; the resulting blind testing accuracies are summarized in Table 1.
When the optical path is divided into two as shown in Fig. 3, the positive and the negative detectors of each class are placed at the output planes of two separate, jointly optimized diffractive networks, without optical coupling between the two.
Table 2. Blind testing classification accuracies of different class division architectures combined with nondifferential and differential diffractive neural network designs. For each data point, the training of the corresponding diffractive optical neural network model was independently repeated six times with random initial phase modulation variables and random batch sequences; therefore, each data point reflects the mean blind testing accuracy of these six trained networks, also showing the corresponding standard deviation.
A direct comparison between the "differential" and "nondifferential" rows of Table 2 reveals the improvement in blind testing accuracy provided by the differential detection scheme across the different class division architectures.
Fig. 5. Performance comparison of different diffractive neural network systems as a function of the number of class subsets. Ten classes exist for each dataset: MNIST, Fashion-MNIST, and grayscale CIFAR-10. Based on our notation, the rightmost configuration refers to a jointly optimized diffractive neural network system that specializes to each one of the 10 classes separately. These results confirm that class-specific differential diffractive neural networks outperform their counterpart diffractive neural network designs over the class-subset configurations explored here. For each data point, the training of the corresponding diffractive optical neural network model was repeated six times with random initial phase modulation variables and random batch sequences; therefore, each data point reflects the mean blind testing accuracy of these six trained networks, also showing the corresponding standard deviation.
Table 3. Comparison of blind testing accuracies of different types of neural networks, including optical, hybrid, and electronic.
So far, in our differential diffractive neural network designs, we considered balanced differential detection between the optical signals of the positive and the negative detectors [Eq. (1)].
Another method to benefit from the parallel computing capability of passive diffractive neural networks is to create independently optimized diffractive neural networks that optically project their diffracted light onto the same output/detector plane. Unlike the jointly optimized diffractive neural systems described earlier, in this alternative design strategy we select a diffractive network design and independently train multiple replicas of it, which together form an ensemble projecting onto a common output plane.
This design strategy of using independently optimized diffractive networks is in fact similar to the ensemble methods32,33 frequently used in the machine learning literature.
Fig. 6. Comparison between the classification accuracies of ensemble models formed by 1, 2, and 3 independently optimized diffractive neural networks that optically project their diffracted light onto the same output/detector plane. Blue and orange curves represent the nondifferential and the differential diffractive network designs, respectively. (a) MNIST, (b) Fashion-MNIST, and (c) grayscale CIFAR-10. So as not to perturb the inference results of each diffractive network through constructive/destructive interference of light, incoherent summation of the optical signals of the individual diffractive networks at the common output plane is considered here, which can be achieved by adjusting the relative optical path length differences between the individual diffractive networks to be larger than the temporal coherence length of the illumination source.
After reporting the results of various design strategies for diffractive neural networks, in Table 3 we compare their blind testing accuracies against those of hybrid and all-electronic neural networks.
While the presented systematic advances in diffractive neural network designs have helped us achieve a competitive inference performance, with classification accuracies that are among the highest levels achieved so far for optical neural networks, there is still a considerable performance gap with respect to the state-of-the-art all-electronic deep learning models such as ResNet (see, e.g., Table 3).
Finally, we would like to also emphasize that these reported advances in the inference and generalization performance of class-specific differential diffractive neural networks come at the cost of an increased input illumination power. For example, to keep the signal-to-noise ratio (SNR) of each photodetector positioned at an output plane of a class-specific differential diffractive neural network system at the same level as in a standard, single-network design, the illumination power must be increased in proportion to the number of jointly optimized diffractive networks in the system.
3 Methods
3.1 Physical Parameters of Diffractive Optical Neural Networks
The physical model of wave propagation, used in the forward model of diffractive neural networks, was formulated based on the Rayleigh–Sommerfeld diffraction equation and digitally implemented, using a computer, based on the angular spectrum method.25 According to this model, the neurons constituting the diffractive layers of an optical network can be interpreted as sources of modulated secondary waves.25 Assuming an illumination wavelength of λ, the physical parameters of the network (the neuron size and the axial distances between successive planes) were defined in units of λ.
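As an illustration of this forward model, the following is a minimal NumPy sketch of free-space propagation via the angular spectrum method, together with the phase-only modulation applied by a diffractive layer. The wavelength, neuron pitch, and propagation distance in the example are arbitrary illustrative values, not the parameters of the actual designs:

```python
import numpy as np

def angular_spectrum_propagate(field, wavelength, dx, z):
    """Propagate a square complex field a distance z via the angular spectrum method."""
    n = field.shape[0]
    fx = np.fft.fftfreq(n, d=dx)                    # spatial frequencies (cycles/length)
    FX, FY = np.meshgrid(fx, fx)
    arg = (1.0 / wavelength) ** 2 - FX ** 2 - FY ** 2
    kz = 2 * np.pi * np.sqrt(np.maximum(arg, 0.0))  # longitudinal wavenumber
    H = np.exp(1j * kz * z)                         # free-space transfer function
    H[arg < 0] = 0.0                                # drop evanescent components
    return np.fft.ifft2(np.fft.fft2(field) * H)

def diffractive_layer(field, phase):
    """Each neuron applies a (trainable) phase-only modulation to the incident field."""
    return field * np.exp(1j * phase)

# Illustrative example: a uniform plane wave passing one layer, then free space.
wavelength, dx, z = 0.75e-3, 0.4e-3, 3e-3           # arbitrary illustrative values
field = np.ones((64, 64), dtype=complex)
field = diffractive_layer(field, np.zeros((64, 64)))  # identity modulation
field = angular_spectrum_propagate(field, wavelength, dx, z)
```

A full forward pass alternates `diffractive_layer` and `angular_spectrum_propagate` for each of the five layers, and the output-plane intensity is `|field|**2`.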
In our diffractive neural system and classifier designs, five fully connected diffractive layers [phase-only modulation, with each layer having 40k (200 × 200) neurons] were used, i.e., 0.2 million neurons in total per diffractive network.
3.2 Implementation of Differential Diffractive Optical Neural Networks
Our differential detection model, in the context of diffractive optical classification systems, defines the class scores based on normalized differences between the positive and the negative detector signals at the output plane(s). With a pair of detectors assigned per class (a positive and a negative detector), the normalized difference for each class is computed from the corresponding pair of detector signals [Eq. (1)], and the final class decision is made by taking the maximum of these differential scores.
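Since the exact normalization of Eq. (1) is not reproduced here, the sketch below assumes one common choice: each detector set (positive and negative) is normalized by its own total optical signal before subtraction. The detector values are hypothetical:

```python
import numpy as np

def differential_scores(i_plus, i_minus):
    """Differential class scores from positive/negative detector intensities.

    Normalization choice (an assumption for illustration): each detector set
    is normalized by its own total optical signal before subtraction.
    """
    s_plus = i_plus / np.sum(i_plus)
    s_minus = i_minus / np.sum(i_minus)
    return s_plus - s_minus

i_plus = np.array([0.1, 0.5, 0.2])   # positive detectors, one per class
i_minus = np.array([0.4, 0.1, 0.3])  # negative detectors, one per class
scores = differential_scores(i_plus, i_minus)
pred = int(np.argmax(scores))
```

Note that, unlike raw detector intensities, the differential scores can take negative values, which is precisely what the scheme exploits to overcome the nonnegativity of detected optical power.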
The differential measurement technique is implemented using two different design approaches. In the first model, the positive and the negative detectors representing a class are placed on the same output plane after a single diffractive neural network [Fig. 2(a)]. In the second model, the positive and the negative detectors are separated into two jointly optimized diffractive neural networks, each with its own output plane [Fig. 3(a)].
Note that when the positive and the negative detectors are split into two jointly optimized diffractive networks, there is no optical coupling between the two networks.
3.3 Class-Specific Diffractive Neural Networks
Division of elements of a target dataset into smaller sets based on their class labels was used to improve the inference performance of diffractive neural networks. In the training of class-specific diffractive neural networks, the target dataset was divided into subgroups of classes, and these subgroups were split among parallel, simultaneously optimized diffractive neural networks. Although these diffractive networks were trained simultaneously, the optical waves modulated by each network were assumed to be isolated from the other diffractive networks of the same neural system, i.e., there is no optical coupling among the parallel networks.
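A schematic sketch of this class division, under the assumption of a contiguous, equal-sized split of the 10 classes into two subsets (the split rule is illustrative, not taken from the trained designs):

```python
import numpy as np

NUM_CLASSES = 10

def split_classes(num_classes, m):
    """Divide the class labels into m contiguous subsets (illustrative split rule)."""
    return np.array_split(np.arange(num_classes), m)

def assemble_scores(subnet_scores, subsets, num_classes):
    """Place each sub-network's detector signals at its classes' positions
    in the full class-score vector."""
    scores = np.zeros(num_classes)
    for sub_scores, subset in zip(subnet_scores, subsets):
        scores[subset] = sub_scores
    return scores

subsets = split_classes(NUM_CLASSES, 2)            # e.g., {0..4} and {5..9}
subnet_scores = [np.array([.1, .2, .9, .1, .1]),   # network 1 (hypothetical signals)
                 np.array([.3, .2, .1, .2, .1])]   # network 2 (hypothetical signals)
pred = int(np.argmax(assemble_scores(subnet_scores, subsets, NUM_CLASSES)))
```

The final class decision is taken over the assembled vector, so each sub-network only needs to resolve the classes in its own subset.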
3.4 Ensemble of Diffractive Optical Neural Networks
Bagging32 and ensemble33,36 methods are commonly used in the machine learning literature to create multiclassifier systems that have superior performance compared to each individual unit constituting them. In these systems, the class scores coming from individual classifier units are merged into a single vector by means of arithmetic or geometric averaging or by using majority voting schemes. Similarly, we used independently optimized diffractive neural networks forming an ensemble and assumed that the diffracted optical signal from each optical network is superimposed with the diffracted light of the other networks on the same (i.e., common) output plane, containing the photodetectors. Assuming that the relative optical path length difference between any two diffractive networks of the ensemble is longer than the temporal coherence length of the illumination beam, the detectors at the output plane incoherently add up the light intensities generated by the independent diffractive networks. Apart from coherence engineering, an alternative option could be to sequentially measure the detector signals at the common output plane, one diffractive network at a given time, and digitally combine the class scores after the measurements. Both of these approaches (simultaneous incoherent summation of the projected light intensities at the common output plane versus sequential capture of each diffractive network's output at the common detector plane and averaging of the class scores) achieved the same inference performance. To evaluate the performance of an ensemble of diffractive optical neural networks, we trained multiple replicas of a diffractive classifier design, each starting from different random initial phase modulation variables and trained with different random batch sequences.
The training strategy of setting the parameters of each ensemble member independently, rather than jointly, distinguishes these ensembles from the jointly optimized diffractive neural systems described earlier.
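The incoherent combination described above amounts to summing the detector intensities contributed by each independently optimized network; a minimal sketch (with random stand-in intensities, since the trained detector signals are not reproduced here):

```python
import numpy as np

def ensemble_scores(intensity_maps):
    """Incoherent superposition: detector intensities from independently
    optimized networks simply add up at the common output plane."""
    return np.sum(intensity_maps, axis=0)

# Three independently trained networks, 10 detector signals each (stand-in values).
rng = np.random.default_rng(0)
per_network = rng.random((3, 10))
combined = ensemble_scores(per_network)
pred = int(np.argmax(combined))
```

Since summation and averaging differ only by a positive scale factor, the max-operation decision is identical for the simultaneous (incoherent) and sequential (score-averaging) readout schemes.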
3.5 Details of Model Training
Object classification performances of all the models presented in this paper were trained and tested on three widely used datasets: MNIST, Fashion-MNIST, and CIFAR-10. For the MNIST and Fashion-MNIST datasets, 55,000 samples were used as training data, while the remaining 15,000 objects were divided into two sets of 5000 and 10,000 for validation and testing, respectively. The CIFAR-10 dataset was partitioned into three sets of 45,000, 5000, and 10,000 samples, used for training, validation, and testing of our diffractive neural networks, respectively. Since the samples of the CIFAR-10 dataset contain three color channels (red, green, and blue), they were converted to grayscale using the built-in rgb_to_grayscale function in TensorFlow to comply with the monochromatic (or quasi-monochromatic) illumination used in our diffractive network models.
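A NumPy equivalent of this conversion is sketched below, assuming the ITU-R BT.601 luma weights that TensorFlow's rgb_to_grayscale is documented to use (stated here as an assumption rather than verified against the exact TensorFlow version):

```python
import numpy as np

# ITU-R BT.601 luma weights (assumed to match TensorFlow's rgb_to_grayscale).
LUMA = np.array([0.2989, 0.5870, 0.1140])

def rgb_to_grayscale(images):
    """Convert (..., H, W, 3) RGB images to (..., H, W, 1) grayscale."""
    return (images @ LUMA)[..., np.newaxis]

batch = np.zeros((2, 32, 32, 3))
batch[..., 0] = 1.0                 # pure-red test images
gray = rgb_to_grayscale(batch)      # every pixel maps to the red luma weight
```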
Softmax cross-entropy was used as the loss function for all the neural network models (optical or electronic) presented in this work. With the class scores described above serving as the inputs of the softmax function, the loss was computed against the one-hot encoded ground-truth labels.
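A minimal single-sample sketch of the softmax cross-entropy loss, written in NumPy for clarity (the actual models used TensorFlow's built-in implementation):

```python
import numpy as np

def softmax_cross_entropy(scores, label, num_classes=10):
    """Cross-entropy between the softmax of the class scores and a one-hot label."""
    scores = scores - np.max(scores)                  # shift for numerical stability
    log_probs = scores - np.log(np.sum(np.exp(scores)))
    one_hot = np.eye(num_classes)[label]
    return -np.sum(one_hot * log_probs)

# Uniform scores give the maximum-entropy loss log(10) for a 10-class problem.
uniform_loss = softmax_cross_entropy(np.zeros(10), 3)
```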
All the neural networks in this paper (optical or electronic) were simulated using the Python (v3.6.5) and Google TensorFlow (v1.10.0) frameworks. The Adam optimizer37 was used during the training of all models. The parameters of the Adam optimizer were kept identical between the models and taken as the default values in the TensorFlow implementation. The learning rate was initially set as 0.001, but an exponential decay was applied every eight epochs such that the new learning rate equals 0.7 times the previous one. All the models were trained for 50 epochs, and the best model was selected based on the classification performance on the validation set. For each model, the training was independently repeated six times with random batch sequences and initial phase modulation variables. Throughout this paper, our blind testing accuracy for each diffractive neural network design reports the mean value over these six repetitions, applied to the testing datasets. For the training of our models, we used a desktop computer with an NVIDIA GeForce GTX 1080 Ti graphics processing unit (GPU), an Intel Core i7-7700 CPU @ 3.60 GHz, and 16 GB of RAM, running the Microsoft Windows 10 operating system. The typical training time of a diffractive neural network model on this system was a few hours.
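The staircase decay schedule described above can be sketched as follows (the function name is ours; the values 0.001, 0.7, and 8 are from the text):

```python
def learning_rate(epoch, initial=1e-3, decay=0.7, step=8):
    """Staircase exponential decay: the learning rate is multiplied by
    0.7 once every 8 epochs, starting from 0.001."""
    return initial * decay ** (epoch // step)

# Schedule over the 50 training epochs used in this work.
rates = [learning_rate(e) for e in range(50)]
```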
[1] Y. LeCun, Y. Bengio, G. Hinton. Deep learning. Nature, 2015, 521: 436-444.
[10] D. Psaltis, et al. Holography in artificial neural networks. Nature, 1990, 343: 325-330.
[32] L. Breiman. Bagging predictors. Mach. Learn., 1996, 24(2): 123-140.
[37] D. P. Kingma, J. Ba. Adam: a method for stochastic optimization. arXiv:1412.6980, 2014.
Jingxi Li, Deniz Mengu, Yi Luo, Yair Rivenson, Aydogan Ozcan. Class-specific differential detection in diffractive optical neural networks improves inference accuracy[J]. Advanced Photonics, 2019, 1(4): 046001.