Deep learning assisted variational Hilbert quantitative phase imaging
1 Introduction
Quantitative phase imaging (QPI), a powerful label-free imaging technique, enables dynamic, nondestructive 2D and 3D imaging of completely transparent structures^{1–3}. It uses the refractive index as an endogenous contrast agent to generate subcellular-specific quantitative maps of live biological structures^{4, 5}. QPI solutions based on digital holographic microscopy (DHM) encode complex wavefront information into intensity modulations through the interference of a scattered sample wave and a reference wave^{6–9}. They can robustly perform quantitative analysis of wave-matter interactions by decoding the phase delay from a hologram. DHM has become a valuable tool in biomedical fields, with applications such as stain-free measurement of biological cells^{3, 10}, optical metrology of nanostructures^{11–14}, and in vitro drug-release monitoring^{15}.
Depending on the phase demodulation strategy, there are two main configurations for holographic wavefront acquisition in DHM: in-line and off-axis digital holography (DH). In-line DH records the complete wavefront by interfering the object and reference beams on the same optical axis, which enables phase reconstruction over the full detector bandwidth. However, the superimposed twin image introduces severe artifacts into the retrieved phase of the sample. In-line holograms therefore usually have to be processed by iterative phase retrieval^{16, 17} or non-iterative phase-shifting methods^{18–20}, which dramatically sacrifice temporal resolution. Consequently, in-line DH, which is also vulnerable to external disturbance and vibration, is difficult to apply to dynamic measurement. Off-axis DH, in contrast, separates the twin images by introducing a slight angle between the object and reference beams and recovers the complex wavefront of the sample from a single off-axis hologram. However, to separate the autocorrelation and cross-correlation terms in the spatial frequency domain (SFD), off-axis DH must provide a sufficiently high carrier frequency, at the expense of the space-bandwidth product (SBP) of the imaging system^{21}. The slightly off-axis DH regime was therefore proposed as a single-frame, high-SBP DH imaging solution^{22–24}. It optimizes the SBP by fully separating the conjugate object lobes from each other while leaving the autocorrelation term partially overlapped with the information-carrying cross-correlation terms. In this configuration, the inevitable spectral overlap causes phase artifacts, which greatly degrade image quality and limit the practicality of slightly off-axis DH.
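To make the carrier-frequency trade-off concrete, here is a minimal sketch under a simplified one-dimensional spectral model, which is an assumption and not taken from the paper: the object spectrum has half-width B and the reference is a tilted plane wave, so the autocorrelation term occupies [-2B, 2B] and the ±1 cross-correlation lobes sit at ±f_c with half-width B. The helper names `offaxis_regime` and `max_object_bandwidth` are hypothetical.

```python
def offaxis_regime(carrier: float, bandwidth: float) -> str:
    """Classify a 1D hologram spectrum (simplified model, assumption).

    carrier   -- carrier spatial frequency f_c
    bandwidth -- half-width B of the object spectrum
    Autocorrelation term: [-2B, 2B]; +1/-1 lobes: [+-f_c - B, +-f_c + B].
    """
    if carrier <= 0 or bandwidth <= 0:
        raise ValueError("carrier and bandwidth must be positive")
    if carrier >= 3 * bandwidth:
        # f_c - B >= 2B: cross terms clear of the autocorrelation term
        return "off-axis"
    if carrier >= bandwidth:
        # f_c - B >= -(f_c - B): twin lobes separated, autocorrelation overlaps
        return "slightly off-axis"
    return "overlapping twin images"


def max_object_bandwidth(nyquist: float) -> tuple[float, float]:
    """Largest half-bandwidth B below Nyquist along the carrier axis.

    Off-axis:          f_c >= 3B and f_c + B <= nyquist  ->  B <= nyquist / 4
    Slightly off-axis: f_c >= B  and f_c + B <= nyquist  ->  B <= nyquist / 2
    """
    return nyquist / 4, nyquist / 2
```

Under this toy model, the slightly off-axis regime doubles the largest object bandwidth that fits below the Nyquist frequency along one axis. Real 2D spectra, especially with diagonal carriers, change the constants but not the qualitative trade-off.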
High-accuracy, artifact-free phase recovery from low-carrier-frequency holograms is the key to applying slightly off-axis DH. Current approaches suppress the autocorrelation term iteratively^{25}, use dual-frame decoding schemes^{26, 27}, employ a second wavelength^{28}, or perform limited 1D processing^{29, 30}. Inspired by the theory of the “cepstrum” and homomorphic filtering^{31}, a slightly off-axis DH demodulation scheme based on the Kramers-Kronig (KK) relations has been proposed, which uses the half-space bandwidth of the sensor to achieve high-SBP imaging^{32, 33}. Although it significantly increases the SBP of full complex-field recovery, it inevitably requires intensity restrictions on the object and reference beams and separation of the cross-correlation terms of the interferogram in the extended SFD. Notably, an elegant low-carrier-frequency fringe demodulation approach, variational Hilbert quantitative phase imaging (VHQPI), has recently been presented^{34}. VHQPI, an end-to-end, purely numerical add-on module, combines tailored variational image decomposition^{35} with an enhanced Hilbert spiral transform^{36} to achieve quantitative phase recovery. It adaptively alleviates the overlapped-spectrum problem, robustly demodulates high-quality phase information, and has shown excellent practicality in biological applications.
Although VHQPI has demonstrated excellent low-carrier-frequency fringe demodulation capability, its algorithm-inherent limitations (e.g., limited parameter robustness and iterative stability) still lead to insufficient extraction of image frequency components, resulting in artifacts in the reconstructed phase. Deep learning (DL), a subfield of machine learning, has gained extensive attention in optical metrology and has demonstrated great potential for solving optical metrology tasks^{37–46}. When sufficient training data are collected in an environment that reproduces real experimental conditions, a trained model may outperform physics-model-based approaches in some respects (e.g., computing speed, parameter adaptivity, and algorithmic complexity)^{37}. In particular, for a series of ill-posed inverse phase retrieval problems, traditional physical models tend to be complex and time-consuming. Driven by a large dataset, a deep neural network (DNN) can directly and efficiently reconstruct the phase and amplitude images of objects from the captured holograms^{47–49}. Nevertheless, in DL-based phase recovery tasks, capturing massive datasets and generating the corresponding ground truth is tricky and laborious, especially for biological samples. Deep image prior (DIP) applies an untrained network to several inverse problems without a massive training dataset or ground truth, by fitting a randomly initialized DNN to a single corrupted image^{50}. Inspired by DIP, an untrained network model named “PhysenNet” was proposed, which incorporates a complete physical model into a conventional DNN to achieve phase retrieval from a single intensity image^{51}.
Inspired by the successful interplay between DNNs and physical models, in this work we propose a DL-assisted variational Hilbert quantitative phase imaging approach (DL-VHQPI). Unlike massive-data-driven DL models, DL-VHQPI uses a DNN to compensate and refine the solutions of the physics-driven model, and can therefore achieve high-precision, artifact-free phase recovery using only a small fraction of the data. Specifically, VHQPI, as the underlying physical model, performs a preliminary extraction of the background components of the fringes, which serves as a physical prior for the DL model. The DNN then uses residual compensation to recover the image frequency components that the physical model cannot extract. Because the physical model reduces the information entropy of the dataset, DL-VHQPI achieves higher reconstruction accuracy with less than one-tenth of the dataset required by the conventional end-to-end model (without the physical model). Simulation experiments quantitatively demonstrate that the proposed method achieves high-accuracy, artifact-free quantitative phase imaging from single-frame low-carrier-frequency holograms, and live-cell experiments demonstrate its practicality in biological research.
2 Principle of VHQPI
VHQPI, the physical model of DL-VHQPI, adaptively and effectively performs low-carrier-frequency fringe demodulation using unsupervised variational image decomposition (uVID) and an enhanced Hilbert spiral transform (HST). This section describes the processing details and physical limitations of the method. In DH wavefront recording, the interferogram containing the required object information is formed by the coherent superposition of the object and reference beams. The intensity distribution of the recorded hologram can be expressed as:
It consists of a sum of three fundamental intensity components: background (
Fig. 1. Flow chart of slightly off-axis interferometric fringe demodulation based on VHQPI.
To recover the phase information of the object, the uVID-filtered, noise-free, zero-mean interferogram is then analyzed using the HST algorithm^{36}, as shown in Step 2 of
where,
where
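The HST step can be illustrated with a minimal sketch of the classical spiral phase (vortex) quadrature transform. It assumes a background-free fringe f = b cos ψ and a known, constant fringe orientation β. For such a fringe, the vortex operator returns approximately i e^{iβ} b sin ψ, so multiplying by -i e^{-iβ} yields the quadrature term. This is a simplification: the enhanced HST used in VHQPI^{36} also handles the fringe orientation, which this sketch does not estimate. The function name `spiral_phase_demodulate` is hypothetical.

```python
import numpy as np


def spiral_phase_demodulate(fringe: np.ndarray, beta: float) -> np.ndarray:
    """Wrapped phase of a zero-mean fringe b*cos(psi) via the spiral phase transform.

    fringe -- 2D zero-mean fringe pattern (background already removed)
    beta   -- fringe orientation angle in radians, assumed known and constant
              (a simplification; it is not estimated here)
    """
    ny, nx = fringe.shape
    u = np.fft.fftfreq(nx)[np.newaxis, :]  # frequency along axis 1 (x)
    v = np.fft.fftfreq(ny)[:, np.newaxis]  # frequency along axis 0 (y)
    radius = np.hypot(u, v)
    radius[0, 0] = 1.0                     # avoid 0/0 at DC
    spiral = (u + 1j * v) / radius         # unit-magnitude spiral phase function
    spiral[0, 0] = 0.0
    vortex = np.fft.ifft2(spiral * np.fft.fft2(fringe))  # ~ i*exp(i*beta)*b*sin(psi)
    quadrature = np.real(-1j * np.exp(-1j * beta) * vortex)  # ~ b*sin(psi)
    return np.arctan2(quadrature, fringe)  # wrapped psi in (-pi, pi]
```

For a tilted carrier with a slowly varying object phase, β can be taken as the carrier direction. Where the local fringe orientation varies strongly, that approximation breaks down, which is why an orientation-aware variant is needed in practice.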
3 Deep learning assisted VHQPI model
VHQPI has been shown to be robust and practical for low-carrier-frequency fringe demodulation^{34}. However, its algorithm-inherent iterative instability and limited parameter robustness restrict its ability to extract image frequency components, which leads to imperfect removal of the background term. DL methods driven by massive datasets offer a new route to this problem by virtue of their powerful image feature extraction. However, when training data are insufficient, which is very common, DL methods that rely on massive datasets may perform poorly. A feasible scheme is to train the DNN on a more strongly constrained standardized dataset^{57}. Here, we use the Shannon entropy of the images in the dataset for this purpose: the lower the entropy of the dataset, the more constrained the prior information, and the better the same-domain generalization ability^{58, 59}. Therefore, in the proposed DL-assisted VHQPI model, uVID is used to extract the image background term as a physical prior for the network, which reduces the entropy of the dataset. The first convolutional neural network (CNN1) learns the residual terms and helps the physical model complete a preliminary estimate of the background components of the fringes. To further improve imaging accuracy, the original hologram and the preliminarily estimated background are then fed into a second network (CNN2) for advanced component extraction. A dual-channel input is used because, after the first residual compensation by CNN1, the preliminarily estimated background is already very close to the ground truth. The preliminary estimate can therefore provide the network with feature guidance and help CNN2 achieve advanced component extraction.
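The entropy argument can be made concrete with a histogram-based estimate of the Shannon entropy of an image, H = -Σ p_k log2 p_k. This is a minimal sketch: the section does not state how the entropy was computed, so the 256-bin gray-level histogram and the helper name `shannon_entropy` are assumptions.

```python
import numpy as np


def shannon_entropy(image: np.ndarray, bins: int = 256, value_range=None) -> float:
    """Histogram-based Shannon entropy (in bits) of an image's gray levels.

    Binning choices (bins, value_range) are assumptions; keep them fixed
    when comparing the entropy of different datasets.
    """
    counts, _ = np.histogram(image, bins=bins, range=value_range)
    p = counts[counts > 0] / counts.sum()  # empirical probabilities of occupied bins
    return float(-np.sum(p * np.log2(p)))
```

One could, for example, compare the entropy of the raw background targets with that of the residuals left after uVID, using the same fixed `value_range` for both, to check the claim on a given dataset.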
As depicted in
Fig. 2. Deep learning-assisted VHQPI. (a) Overall network structure, combining uVID and HST with CNNs for phase reconstruction. (b) CNN1 takes a hologram as input and consists of three convolutional layers and a group of residual blocks that learn to compensate for background residuals. (c) CNN2 has the same structure as CNN1, except that it combines the original hologram and the result of the first stage into a two-channel input for advanced background compensation.
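The data flow of the two-stage residual compensation can be sketched as follows. This is a hypothetical wiring, not the authors' code: `uvid`, `cnn1`, and `cnn2` are placeholder callables, and the assumption that both CNNs output additive residuals (rather than the background itself) is ours.

```python
from typing import Callable

import numpy as np

Array = np.ndarray


def cnn1_target(background_gt: Array, background_uvid: Array) -> Array:
    """Training target for CNN1 under the assumed residual parametrization."""
    return background_gt - background_uvid


def cnn2_input(hologram: Array, background_1: Array) -> Array:
    """Stack the raw hologram and the first-stage background into an (H, W, 2) input."""
    return np.stack([hologram, background_1], axis=-1)


def dl_vhqpi_background(
    hologram: Array,
    uvid: Callable[[Array], Array],   # hypothetical uVID background estimator
    cnn1: Callable[[Array], Array],   # hypothetical trained CNN1 (residual output)
    cnn2: Callable[[Array], Array],   # hypothetical trained CNN2 (residual output)
) -> Array:
    """Two-stage residual compensation of the uVID background estimate."""
    a0 = uvid(hologram)                       # physical prior from uVID
    a1 = a0 + cnn1(hologram)                  # CNN1 adds the learned residual
    a2 = a1 + cnn2(cnn2_input(hologram, a1))  # CNN2 refines from the dual-channel input
    return a2
```

Subtracting the returned background from the hologram gives the zero-mean fringe that is then passed to the HST for phase demodulation.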
Moreover, both CNN1 and CNN2 consist of a convolutional layer (Conv), a group of four residual blocks, and two further convolutional layers. Each residual block comprises two stacked Convs. The network uses Batch Normalization^{60} and ReLU activation^{61} to accelerate model convergence. Each residual block also establishes a shortcut between its input and output, which mitigates the accuracy degradation that occurs as the network deepens and thereby eases training. The output of the Convs is a 3D tensor of shape
where
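A residual block of the kind described above can be sketched in plain NumPy. The text does not give kernel sizes, channel counts, or the layer order inside each block. This sketch therefore assumes the standard ResNet ordering (Conv-BN-ReLU-Conv-BN, identity shortcut, then ReLU), 3×3 "same" convolutions, and batch statistics computed over a single image with no learned scale or shift. The function names are hypothetical.

```python
import numpy as np


def conv3x3(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Zero-padded 'same' 3x3 convolution (cross-correlation, as in DL frameworks).

    x: (H, W, C_in); kernel: (3, 3, C_in, C_out) -> (H, W, C_out)
    """
    h, w, _ = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((h, w, kernel.shape[-1]))
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w, :] @ kernel[dy, dx]
    return out


def batch_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Per-channel normalization over spatial dims (batch of one, gamma=1, beta=0)."""
    mean = x.mean(axis=(0, 1), keepdims=True)
    var = x.var(axis=(0, 1), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)


def residual_block(x: np.ndarray, k1: np.ndarray, k2: np.ndarray) -> np.ndarray:
    """Conv-BN-ReLU-Conv-BN plus identity shortcut, then ReLU (assumed ordering).

    The identity shortcut requires C_out == C_in for both kernels.
    """
    y = relu(batch_norm(conv3x3(x, k1)))
    y = batch_norm(conv3x3(y, k2))
    return relu(y + x)
```

The shortcut means that a block whose convolutions contribute nothing reduces to ReLU(x), so stacking blocks does not force the network to relearn an identity mapping.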
4 Experiments and results
In this section, we compare the proposed DL-VHQPI method with conventional physics-driven low-carrier-frequency fringe demodulation techniques and with a pure DL approach without a physical model (DL-noPhy), through numerical simulation and a live-cell experiment. A rich set of paired training data is a prerequisite for network generalization in DL training. Acquiring reliable ground truth in a real-world DH system is challenging because of environment-induced instability and system-inherent speckle noise. We therefore simulated low-carrier-frequency holograms and the corresponding ground truth for training and quantitative analysis. We constructed the complex amplitude distributions of the object and reference waves separately. Each hologram is then the squared modulus of their sum, and the background (the ground truth needed for training) is the sum of their squared moduli. Further details are given in Section 1 of the Supplementary Information.
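The hologram and background construction described above reduces to I = |O + R|² and a = |O|² + |R|². Their difference is the interference term 2 Re(O R*). The sketch below shows this relation. The field shapes, amplitudes, and carrier are illustrative choices rather than the paper's simulation parameters, and `simulate_hologram` is a hypothetical name.

```python
import numpy as np


def simulate_hologram(obj: np.ndarray, ref: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Return the hologram |O + R|^2 and its background |O|^2 + |R|^2."""
    hologram = np.abs(obj + ref) ** 2
    background = np.abs(obj) ** 2 + np.abs(ref) ** 2
    return hologram, background


# Illustrative example (assumed parameters): a Gaussian phase object and a
# low-carrier tilted plane-wave reference on a 64 x 64 grid.
n = 64
y, x = np.mgrid[0:n, 0:n]
phase = np.exp(-((x - 32) ** 2 + (y - 32) ** 2) / (2 * 8 ** 2))
obj = 0.6 * np.exp(1j * phase)
ref = np.exp(1j * 2 * np.pi * (4 * x + 3 * y) / n)
hologram, background = simulate_hologram(obj, ref)
```

Removing `background` from `hologram` leaves exactly the fringe term 2|O||R|cos(φ_O - φ_R). This is why an accurate background estimate is all that is needed before demodulation.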
In the live-cell experiment, we used the digital holographic smart computational light microscope (DH-SCLM) developed by SCILab and adjusted it to a slightly off-axis configuration for hologram acquisition^{1}. In the DH-SCLM, the object wave transmitted through the objective lens (UPLanSAPO ×20/0.45NA, Olympus, Japan) interferes with the reference wave and is recorded by a camera (The Imaging Source DMK 23U274, 1600 × 1200 pixels, 4.4 μm pixel size). The central wavelength of the illumination is 532 nm. The sample consisted of HeLa (Henrietta Lacks) human cervical cancer cells cultured in DMEM medium with 10% fetal bovine serum under standard cell culture conditions (37.2 °C and 5% CO_{2} in a humidified incubator). To obtain the ground truth in this configuration, the intensity maps of the object and reference paths must each be captured separately while the holographic system is highly stable (see Section 2 of the Supplementary Information for details). The complete training process was implemented in the TensorFlow framework (Google) and run on a GTX Titan graphics card (NVIDIA). The Adam optimizer^{62} was used with a fixed learning rate of 0.0001.
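A back-of-envelope calculation shows why the optical parameters above leave little room for a large carrier. It assumes that the object-side pixel pitch is simply the camera pixel divided by the nominal ×20 magnification, with no extra relay magnification in the holographic path. The paper does not state this, so the numbers are indicative only. `sampling_summary` is a hypothetical helper.

```python
def sampling_summary(pixel_um=4.4, magnification=20.0, na=0.45,
                     wavelength_um=0.532, sensor_px=(1600, 1200)):
    """Indicative sampling numbers for the settings quoted in the text.

    Assumes object-side pixel = pixel_um / magnification (no relay
    magnification), which is an assumption, not a reported value.
    """
    eff_pixel = pixel_um / magnification                         # object-side pixel pitch (um)
    fov = (sensor_px[0] * eff_pixel, sensor_px[1] * eff_pixel)   # field of view (um)
    cutoff = na / wavelength_um                                  # coherent cutoff NA/lambda (1/um)
    nyquist = 1.0 / (2.0 * eff_pixel)                            # per-axis sampling limit (1/um)
    return {
        "eff_pixel_um": eff_pixel,
        "fov_um": fov,
        "cutoff_per_um": cutoff,
        "nyquist_per_um": nyquist,
        "nyquist_over_cutoff": nyquist / cutoff,
    }


print(sampling_summary())
```

Under these assumptions, the object half-bandwidth NA/λ is about 0.85 μm⁻¹ and the per-axis Nyquist limit is about 2.27 μm⁻¹, a ratio of roughly 2.7. In a simplified one-dimensional picture, full off-axis separation along one axis needs this ratio to be at least 4, whereas the slightly off-axis regime needs only 2.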
4.3 Simulation
Fig. 3. Results of the numerical simulation. (a) Phase recovered by the FT method. (b) Phase recovered by VHQPI. (c) Phase reconstructed by DL-VHQPI. (d) Ground truth. (e–g) Differences between the phase results of the three methods (FT, VHQPI, and DL-VHQPI) and the ground truth. (h) Quantitative error analysis of the three methods. (i) Cross-sections of the FT, VHQPI, DL-VHQPI, and ground-truth phase maps; (j1–j4) DIC views of partially enlarged regions of the corresponding phase maps.
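Difference maps such as those in Fig. 3(e–g) are commonly summarized by the root-mean-square error (RMSE) against the ground truth. The exact metrics used for Fig. 3(h) and Table 1 are not given in this section, so the sketch below is only one reasonable choice. Removing the global piston (constant offset) before comparison is also an assumption, because a constant phase offset carries no physical information. `phase_error_metrics` is a hypothetical name.

```python
import numpy as np


def phase_error_metrics(phase: np.ndarray, ground_truth: np.ndarray) -> dict:
    """RMSE and maximum absolute error of an unwrapped phase map vs. ground truth."""
    diff = phase - ground_truth
    diff = diff - diff.mean()  # drop the global piston term (assumption)
    return {
        "rmse": float(np.sqrt(np.mean(diff ** 2))),
        "max_abs": float(np.max(np.abs(diff))),
    }
```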
In addition, we designed a comparison experiment with DL-noPhy (the network is specified in Section 4 of the Supplementary Information) to demonstrate the efficiency and accuracy of the proposed method.
Table 1. Quantitative comparison of DL-VHQPI and DL-noPhy.

4.4 Live-cell experiment on HeLa cells
We performed holographic experiments on HeLa cells with a ×20/0.45NA objective to demonstrate the application of the method in biological research. The denoised interferogram presented in
Fig. 4. Results of holographic experiments on HeLa cells. (a) Low-carrier-frequency, high-contrast hologram collected by the slightly off-axis interferometric system. (b) Corresponding spatial frequency spectrum. (c) Phase recovered from the slightly off-axis hologram by the FT method under the ×20 objective. (d) Phase recovered by DL-VHQPI. (e1–e4) and (f1–f4) Locally magnified views of “Area1” and “Area2” for the two samples under the different phase recovery methods, where (e2, e4, f2, f4) are the corresponding DIC views. (g) and (h) DIC views of the magnified regions of the phase maps in the corresponding red boxes. (i) Cross-sectional values illustrating the detail-preservation capability of DL-VHQPI.
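The DIC views in Figs. 3 and 4 render phase maps as relief-like images. This section does not say how they were generated. A common approach, assumed here, is to display the directional derivative of the phase along a chosen shear direction. The default 45° shear angle and the name `pseudo_dic` are illustrative choices.

```python
import numpy as np


def pseudo_dic(phase: np.ndarray, shear_angle_deg: float = 45.0) -> np.ndarray:
    """Simulated DIC view: directional derivative of the phase along the shear direction.

    The shear angle is measured from the x axis (array axis 1) toward the
    y axis (array axis 0); both the method and the default are assumptions.
    """
    gy, gx = np.gradient(phase)  # derivatives along axis 0 (y) and axis 1 (x)
    theta = np.deg2rad(shear_angle_deg)
    return np.cos(theta) * gx + np.sin(theta) * gy
```

Because the derivative emphasizes local phase changes, residual fringe-like artifacts and fine cellular detail both stand out more clearly in such views than in the phase map itself.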
Reducing the size of the FT filter window can also alleviate artifacts, but it does not fundamentally solve the overlapped-spectrum problem and it blurs the phase image. A smaller window sacrifices the system's SBP, so the high-frequency information of the object can no longer be enclosed in the limited filter window. Section 3 of the Supplementary Information presents experimental results for living cells under different FT filter windows. To verify the generalization of DL-VHQPI, Supplementary Section 5 provides an additional set of live-cell results, including a comparison and discussion with VHQPI and the traditional FT method. These results show that DL-VHQPI still provides the best artifact suppression and generalization in this new biological application.
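The role of the filter window in the conventional FT method can be seen in a minimal sketch. The method isolates the +1 lobe with a circular window, shifts it to the spectrum center, and takes the angle of the inverse transform. It assumes that the carrier sits on an integer frequency bin that is known in advance. In the slightly off-axis regime, any window large enough to hold the object's high frequencies also captures part of the overlapping autocorrelation term. `ft_demodulate` is a hypothetical name.

```python
import numpy as np


def ft_demodulate(hologram: np.ndarray, carrier: tuple[int, int], radius: float) -> np.ndarray:
    """Fourier-transform (FT) method with a circular filter window.

    carrier -- (cx, cy) position of the +1 lobe in integer frequency bins (assumed known)
    radius  -- window radius in bins: smaller windows admit less zero-order
               leakage but discard high object frequencies (blur)
    Returns the wrapped phase of the demodulated complex field.
    """
    ny, nx = hologram.shape
    spectrum = np.fft.fft2(hologram)
    kx = np.fft.fftfreq(nx, d=1.0 / nx)[np.newaxis, :]  # integer bin indices along x
    ky = np.fft.fftfreq(ny, d=1.0 / ny)[:, np.newaxis]  # integer bin indices along y
    cx, cy = carrier
    window = (kx - cx) ** 2 + (ky - cy) ** 2 <= radius ** 2
    # Shift the selected lobe to DC, which removes the carrier phase ramp.
    lobe = np.roll(spectrum * window, shift=(-cy, -cx), axis=(0, 1))
    return np.angle(np.fft.ifft2(lobe))
```

Running the same hologram with progressively smaller `radius` values reproduces the qualitative trade-off discussed above: less leakage from the zero-order term, but a smoother, blurrier phase.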
5 Conclusions and discussions
In summary, we proposed a high-accuracy, artifact-free, single-frame low-carrier-frequency fringe demodulation approach for slightly off-axis DH systems, based on a DNN-assisted physical model. When the cross-correlation and autocorrelation terms are inevitably aliased in the SFD, phase reconstruction with the conventional FT method cannot eliminate the phase artifacts caused by the zero-order term^{6}. Although reducing the size of the FT filter window may alleviate imaging artifacts, the loss of high-frequency object information caused by the limited window blurs the image. The method based on the Kramers-Kronig relations was proposed on the basis of the “cepstrum” concept and homomorphic filtering^{31}. However, it depends on a restricted object-to-reference intensity ratio and requires separation of the high-order terms in the extended SFD^{32, 33}. VHQPI removes the background component of a single-frame hologram by extracting image frequency components, but, as a purely physical method, it inevitably suffers from insufficient background removal^{34}. In contrast, DL-VHQPI, a novel DL-assisted physical-model method, better suppresses phase artifacts while improving imaging accuracy. The simulation results quantitatively demonstrate that the phase recovery accuracy of DL-VHQPI is substantially higher than that of FT and VHQPI, and the live-cell experiment demonstrates that the method is applicable in biological research.
It is also noteworthy that classical end-to-end DNN models (without a physical model) require massive numbers of data pairs to reach high reconstruction precision. However, collecting such datasets and generating the corresponding ground truth can be prohibitively laborious and time-consuming for real-world DH systems. In contrast, the proposed DL-VHQPI achieves better same-domain generalization and image feature extraction without large datasets. Compared with the classical end-to-end DNN model (i.e., DL-noPhy), DL-VHQPI achieves higher reconstruction accuracy using only a small fraction of the data, because the physical model reduces the information entropy of the DL training targets. Fewer data also mean shorter training time and higher training efficiency.
The significance of our work lies in the many possible applications of the proposed DL-assisted physical-model idea to QPI. The idea can be applied in many scenarios where deep learning is used for QPI, e.g., a series of ill-posed inverse phase retrieval problems and holography-based high-throughput optical diffraction tomography (ODT)^{63–65}. In particular, the artifact-free low-carrier-frequency fringe demodulation capability of the proposed method may benefit ODT imaging of wide-bandwidth objects. It also has implications for high-throughput studies with highly robust common-path off-axis interferometers^{66, 67}. We envision that the idea presented here can be applied to a diverse range of future computational imaging techniques beyond those discussed in this work.
6 Acknowledgements
We are grateful for financial support from the National Natural Science Foundation of China (61905115, 62105151, 62175109, U21B2033, 62227818), Leading Technology of Jiangsu Basic Research Plan (BK20192003), Youth Foundation of Jiangsu Province (BK20190445, BK20210338), Biomedical Competition Foundation of Jiangsu Province (BE2022847), Key National Industrial Technology Cooperation Foundation of Jiangsu Province (BZ2022039), Fundamental Research Funds for the Central Universities (30920032101), Open Research Fund of Jiangsu Key Laboratory of Spectral Imaging & Intelligent Sense (JSGP202105, JSGP202201), and National Science Center, Poland (2020/37/B/ST7/03629). The authors thank F. Sun for her contribution to the language and grammar of this paper.
The authors declare no competing financial interests.
Supplementary information for this paper is available.
[1] Fan Y, Li JJ, Lu LP, Sun JS, Hu Y et al. Smart computational light microscopes (SCLMs) of smart computational imaging laboratory (SCILab). PhotoniX 2, 19 (2021). doi: 10.1186/s43074-021-00040-2
[20] Poon TC. Digital Holography and Three-Dimensional Display: Principles and Applications (Springer, New York, 2006).
[59] Cover TM. Elements of Information Theory (John Wiley & Sons, 1999).
[60] Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on Machine Learning (JMLR.org, 2015).
[61] Nair V, Hinton GE. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning 807–814 (Omnipress, 2010).
[63] Choi W, Fang-Yen C, Oh S, Lue N, Dasari RR et al. Tomographic phase microscopy: quantitative 3D mapping of refractive index in live cells. Imaging Microsc 10, 48–50 (2008).
Zhuoshi Li, Jiasong Sun, Yao Fan, Yanbo Jin, Qian Shen, Maciej Trusiak, Maria Cywińska, Peng Gao, Qian Chen, Chao Zuo. Deep learning assisted variational Hilbert quantitative phase imaging[J]. Opto-Electronic Science, 2023, 2(4): 220023.