Optical computing accelerators: Principle, application, and perspective

Peng Zou, Fangchen Hu, Yiheng Zhao, Ziqiang He, Bo Xu, Haiwen Cai, Wei Chu

Front. Phys., 2025, Vol. 20, Issue (3): 032302. DOI: 10.15302/frontphys.2025.032302

TOPICAL REVIEW

Abstract

The rapid rise of artificial intelligence (AI) has catalyzed advancements across various trades and professions. Developing large-scale AI models is now widely regarded as one of the most viable approaches to achieving general-purpose intelligent agents. This pressing demand has made the development of more advanced computing accelerators an enduring goal for the rapid realization of large-scale AI models. However, as transistor scaling approaches physical limits, traditional digital electronic accelerators based on the von Neumann architecture face significant bottlenecks in energy consumption and latency. Optical computing accelerators, leveraging the high bandwidth, low latency, low heat dissipation, and high parallelism of optical devices and transmission over waveguides or free space, offer promising potential to overcome these challenges. In this paper, inspired by the generic architectures of digital electronic accelerators, we conduct a bottom-up review of the principles and applications of optical computing accelerators based on the basic element of computing accelerators − the multiply-accumulate (MAC) unit. Then, we describe how to solve matrix multiplication by composing calculator arrays from different MAC units in diverse architectures, followed by a discussion on the two main applications where optical computing accelerators are reported to have advantages over electronic computing. Finally, the challenges of optical computing and our perspective on its future development are presented. Moreover, we also survey the current state of optical computing in the industry and provide insights into the future commercialization of optical computing.

Keywords

optical computing accelerators / artificial intelligence / scientific computing

Cite this article

Peng Zou, Fangchen Hu, Yiheng Zhao, Ziqiang He, Bo Xu, Haiwen Cai, Wei Chu. Optical computing accelerators: Principle, application, and perspective. Front. Phys., 2025, 20(3): 032302. DOI: 10.15302/frontphys.2025.032302

1 Introduction

Artificial intelligence (AI) has advanced rapidly in recent years, permeating nearly every aspect of daily life and reshaping fields such as autonomous driving, medical diagnosis, and information processing [1-3]. As AI enters the era of large-scale models, compute has become a key bottleneck in both the training of large-scale AI models and their deployment in AI products, increasing the demand for hardware computing accelerators. Taking GPT-4 as an instance, the compute used to train GPT-4 has reached 2.1 × 10^25 FLOP, requiring 90–100 days on a cluster of around 25 000 NVIDIA A100 graphics processing units (GPUs), even though the NVIDIA A100 was among the most powerful commercially available parallel computing accelerators [4].

At present, the mainstream hardware accelerators are electronic digital parallel accelerators based on the von Neumann architecture, represented by GPUs. Although electronic digital parallel accelerators are extremely versatile, they offer no clear advantage in energy consumption or latency for large-scale parallel computing [5]. As the semiconductor industry advances to process nodes below 3 nm, electronics also increasingly encounter inherent physical limitations, such as increased leakage current due to quantum tunneling and reduced carrier mobility from short-channel effects [6]. It is therefore increasingly difficult to obtain energy-efficiency gains by shrinking transistors. Moreover, the energy consumption of transistor-based circuits has a cubic relationship with the clock frequency, which means the latency of large-scale digital circuits is also hard to reduce further. In response, new paradigms of parallel computing accelerators are being actively sought by both industry and academia.

Among these paradigms, optical computing has emerged as a promising platform for parallel computing acceleration, benefitting from the large bandwidth available at optical frequencies and the low propagation loss of nanophotonic waveguides. First, photonics offers a bandwidth roughly 100 000 times larger than electronics, owing to its massive frequency-multiplexing parallelism and the very fast dynamics of photonic devices [7]. Although small analog and digital electronic circuits can reach bandwidths above 5 GHz [8, 9], the bandwidth of analog and digital electronics in computing systems tends to be limited by wire delays to speeds well below 5 GHz [10, 11]. Second, for large-scale computing arrays, the waveguide loss of on-chip photonic processors is much lower than that of on-chip electrical transmission over a comparable distance. For example, the transmission loss of light from 1600 to 1640 nm in a silicon nitride waveguide is only 0.06 dB/cm, with almost no Joule heating [12]. In contrast, electrical transmission suffers from Joule heating and signal crosstalk, and the required energy increases with wire length [13]. With these advantages, optical computing accelerators have the opportunity to address the energy-consumption and latency issues of large-scale AI models. In addition, large models have been shown to train and infer at 4-bit precision [14, 15], which allows photonic analog computing, which is not known for high precision, to exploit its low energy consumption and low latency while still completing such tasks.

To better promote the development of optical computing accelerators, in this paper we systematically review the principles, challenges, and advantageous applications of optical computing, and present our perspective on its outlook. Unlike most other reviews of optical computing, which are organized by architecture type, this article starts by introducing the basic functional unit of optical computing accelerators, the optical multiply-accumulate core, which can be scaled into most current optical computing array architectures to perform functions such as vector-matrix multiplication or convolution. The array can then be expanded into a network to complete more complex acceleration tasks such as AI inference and scientific computing. Spanning from digital computing accelerators to analog optical computing accelerators, this paper is structured as follows. Section 2 gives an overview of current electronic digital parallel accelerators. Section 3 focuses on the implementation of a basic optical multiply-accumulate core and how the core is scaled into different array architectures realizing vector-matrix multiplication. Section 4 classifies the applications of optical computing accelerators, including AI inference and scientific computing, where optical computing accelerators exhibit clear advantages over electronic computing accelerators. Section 5 introduces the commercialization of optical computing accelerators. Sections 6 and 7 discuss challenges and offer an outlook on optical computing accelerators, followed by a concluding summary.

2 Principle of electrical computing accelerators

Modern parallel computing acceleration is usually implemented on digital clock-based hardware accelerators, such as GPUs, application-specific integrated circuits (ASICs), or field-programmable gate arrays (FPGAs).

GPUs are highly reconfigurable processors that can perform a large number of parallel computations. According to NVIDIA GPU reports [16], GPUs can perform not only high-precision multiplications and additions (up to the 64-bit floating-point format, FP64), but also high-throughput half-precision matrix-multiply-accumulate (HMMA) operations. The A100 tensor cores support 16 × 16 × 16 HMMA per warp at 16-bit floating-point (FP16) precision. To date, the computation performance of the Nvidia H100 SXM5 [17], among the most powerful GPUs on sale, reaches 133.8 TFLOPS for the 16-bit brain floating-point format (BF16). Fig.1(a) gives a simple schematic of the Nvidia GPU architecture. To decrease the frequency of global-memory reads and writes, GPUs use a three-level cache hierarchy to reduce the communication latency and energy consumption induced by data movement. The data to be processed are stored in global memory, transferred to the L2 cache, and then distributed to multiple streaming multiprocessors (SMs) for parallel computing. Inside one SM, data loaded from the L2 cache are stored in the L1 cache or shared memory (SMEM). LD/ST modules fetch data from the L1 cache or SMEM to multiple streaming processors (SPs), which consist of several Tensor cores and CUDA cores, to compute in parallel. A tensor core can compute one HMMA per clock, with supported data precision up to the 32-bit TensorFloat format (TF32). A CUDA core can perform a single multiply-accumulate (MAC) up to FP64 per clock. The result of one HMMA or MAC must be stored back to registers and loaded again for the next iteration in the CUDA cores or Tensor cores. The SFU module is in charge of calculating special functions. The specific definitions of FP64, FP16, BF16, TF32, and other data formats for computing are given in the IEEE Standard for Floating-Point Arithmetic (IEEE 754) and Nvidia whitepapers [18, 19].
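
To make the tile-level dataflow concrete, the following minimal Python sketch shows one HMMA-style step: a 16 × 16 × 16 half-precision tile product accumulated into a higher-precision result. The sizes and dtypes are illustrative only and do not reproduce NVIDIA's exact datapath.

```python
import numpy as np

# Illustrative tile-level multiply-accumulate (HMMA-style) step:
# FP16 operand tiles, products accumulated into an FP32 accumulator.
M = K = N = 16
a = np.random.rand(M, K).astype(np.float16)   # operand tile A
b = np.random.rand(K, N).astype(np.float16)   # operand tile B
c = np.zeros((M, N), dtype=np.float32)        # accumulator kept in FP32

# One HMMA step: C += A @ B, with the accumulation done at higher precision.
c += a.astype(np.float32) @ b.astype(np.float32)
print(c.shape, c.dtype)
```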

FPGAs are another type of programmable, reconfigurable integrated circuit built from random-access memory (RAM) blocks and a large number of logic gates [20]. Compared with GPUs running at the same clock rate, FPGAs offer lower cost, lower energy consumption, and better latency for parallel acceleration, but lower peak computation performance. FPGAs are programmed with hardware description languages, which allows them to adapt to new applications without fabricating a new chip. Fig.1(b) presents a simple schematic of a typical FPGA. The control units are in charge of dispatching instructions to process data from the inputs, registers, and global memory in a pipeline. The results can be stored in global memory or output through the I/O interfaces.

ASICs are custom integrated circuits designed and manufactured for specific user requirements rather than general-purpose use [21]. Hence, ASICs have less flexibility than GPUs and FPGAs, but powerful task-specific computation performance. The most famous ASICs for AI acceleration are the tensor processing units (TPUs) developed by Google, serving large-scale AI inference and training. Featuring a 256 × 256 systolic-array architecture, the latest Google TPUv6 (Trillium) is reported to deliver 925.9 TFLOPS of compute performance at BF16 precision [22]. Fig.1(c) gives a simple schematic of the Google TPU architecture. The systolic array first loads data from global memory and then performs matrix-multiply-accumulate (MMA) operations. The outputs are then moved to special-function modules such as activation, normalization, or pooling. Finally, the results are stored back to global memory for the next round of MMA.
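
The toy sketch below illustrates the systolic dataflow in a simplified weight-stationary form: each processing element holds one weight, the input vector streams across the array, and partial sums accumulate along each row. It ignores cycle-level skewing and is an illustration only, not Google's exact design.

```python
import numpy as np

# Toy weight-stationary systolic-array dataflow computing y = W @ x.
m, k = 4, 4
rng = np.random.default_rng(0)
W = rng.standard_normal((m, k))   # each PE (i, j) holds weight W[i, j]
x = rng.standard_normal(k)

partial = np.zeros(m)
for j in range(k):                # step j: input element x[j] enters the array
    for i in range(m):            # every PE in column j fires one MAC
        partial[i] += W[i, j] * x[j]

print(np.allclose(partial, W @ x))   # True: rows accumulate the MVM result
```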

Although digital electronic hardware accelerators deliver satisfactory performance in operations per second thanks to the nanoscale size of electronic components, they still face challenges in speed, energy consumption, and latency when processing large-scale MMA, owing to the inherent characteristics of the underlying electronic components. Electronic components are fundamentally limited in both speed and energy by Joule heating, electromagnetic crosstalk, and capacitance, which bound the maximum operating clock of accelerators (mostly below 3 GHz for state-of-the-art GPUs) [23]. Taking the GPT-3 language model as an instance, the required computational intensity (computational performance per second divided by data moved from memory) of its transformer part at the token-prefill stage approaches 1536, but that of the Nvidia A100 SXM is only about 570, estimated from its memory bandwidth (2039 GB/s) and computational performance (1248 TOPS for INT8), which cannot meet the requirement of GPT-3 [24]. Moreover, the energy-consumption issue is also alarming. Considering the current scale, traffic, and token generation of GPT-3.5, the energy consumed to train GPT-3 on GPUs is about 1287 MWh, enough to supply an average U.S. household for 120 years [25]. The daily energy consumed to serve INT4-precision inference tasks corresponds roughly to the total power draw of 30 000 A100 GPU accelerator cards, and is expected to exceed the energy consumption of China's primary industry [26]. As a result, a new parallel-acceleration paradigm with high computation performance and low energy consumption is urgently needed.

3 Principle of optical computing accelerators

3.1 Principle of optical computing units

An optical signal can be written as

E(t) = A(t) e^{j[ω₀t + ϕ(t)]} x̂(t),

where A(t) is the amplitude at time t, ω₀ is the angular carrier frequency, ϕ(t) is the phase at time t, and x̂(t) is the polarization state at time t. The amplitude, phase, frequency, polarization, and other states of light can all carry signals, providing more dimensions for optical computing than are available in electrical computing. In this section, we briefly introduce the implementation of optical computing, from the underlying devices and detection principles to the principles of multiplication and addition.

As shown in Fig.2, there are mainly four ways to realize signal modulation:

 1) Signal modulation by changing light intensity: The light intensity can be changed by adjusting the driving signal of a laser, a voltage-controlled optical attenuator (VOA), or a saturable optical absorber (SOA). Signal modulation can therefore be realized according to the relationship between the driving signal and the output light intensity.

 2) Signal modulation by changing light phase: Phase modulation is one of the most common modulation formats in optical computing. By adjusting the device temperature or the injected voltage/current, linear modulation of the output light phase can be realized. The modulation function of common electro-optic modulators (EOMs), such as the micro-ring resonator (MRR) or the Mach-Zehnder modulator (MZM), is realized through phase modulation. An illustrative sketch of typical modulator transfer curves is given after this list.

 3) Signal modulation by optical interference: By interfering two input optical signals in a multi-mode interferometer (MMI) and tuning a phase shifter, the amplitude and phase of the output optical signal can be adjusted.

 4) Signal modulation by diffractive elements: By adjusting the amplitude or phase of the diffraction surfaces and the distance between them, the intensity and phase of the input light can be changed. In essence, diffractive elements follow the same principle as optical amplitude and phase modulation. When signal modulation is implemented with diffractive elements, the phase or amplitude change between diffraction elements is mainly determined by the diffraction surfaces and is also influenced by the distance between them. Diffraction surfaces are generally fabricated by 3D printing or nanolithography to achieve a specific optical signal output. Besides, a spatial light modulator (SLM) can also be used as a diffraction surface.
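
As a point of reference for the modulation schemes above, the sketch below gives two illustrative transfer curves: a raised-cosine (interferometric) intensity response for an MZM-like device and an exponential attenuation for a VOA-like device. The parameters are hypothetical and chosen only for illustration.

```python
import numpy as np

# Illustrative (not device-specific) modulator transfer curves.

def mzm_intensity(v, v_pi=4.0, bias=0.5 * np.pi):
    """MZM-like device: interferometric (raised-cosine) intensity vs. voltage."""
    return 0.5 * (1.0 + np.cos(np.pi * v / v_pi + bias))

def voa_attenuation(v, alpha=0.8):
    """VOA-like device: exponential power attenuation vs. control voltage."""
    return np.exp(-alpha * v)

drive = np.linspace(0.0, 8.0, 5)
print(mzm_intensity(drive))   # nonlinear mapping from drive voltage to intensity
print(voa_attenuation(drive))
```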

As shown in Fig.2, there are mainly four ways to realize optical signal detection:

 1) Single-end detector: The single-end detector extracts the power of the input optical signal and converts it into a photocurrent. As the output current is small, a transimpedance amplifier (TIA) is appended after the single-end detector to amplify the signal and convert it from a current into a voltage.

 2) Double-end detector: The principle of the double-end (balanced) detector is similar to that of the single-end detector. The difference is that the double-end detector can output a positive or negative current by subtracting the photocurrents of its two photodiodes (PDs).

 3) Coherent detector: The above schemes can only extract the amplitude or power of the input optical signal. By interfering two input optical signals, the coherent detector can extract the phase and amplitude information of the input signals simultaneously. An illustrative sketch of these point-detection schemes is given after this list.

 4) Array detectors: By arranging photodetectors into an array, multiple input signals can be detected simultaneously, enabling high-speed detection for matrix-matrix and tensor multiplications. Charge-coupled devices (CCDs) and cameras are typical array photodetectors.
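
The following minimal sketch summarizes the three point-detection schemes, assuming ideal, noiseless photodiodes with unit responsivity; optical fields are represented as complex amplitudes.

```python
import numpy as np

# Idealized point-detection schemes on complex optical field amplitudes.

def single_end(field):
    """Single-end detector: photocurrent proportional to optical power |E|^2."""
    return np.abs(field) ** 2

def balanced(field_a, field_b):
    """Double-end (balanced) detector: signed output from current subtraction."""
    return np.abs(field_a) ** 2 - np.abs(field_b) ** 2

def coherent(signal, local_osc):
    """Coherent detection: the beat term carries the signal amplitude and phase."""
    return 2.0 * np.real(signal * np.conj(local_osc))

sig = 0.8 * np.exp(1j * 0.3)   # hypothetical signal field
lo = 1.0 + 0.0j                # hypothetical local oscillator
print(single_end(sig), balanced(sig, lo), coherent(sig, lo))
```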

The multiplication units can be realized by combining the modulation units and the detection units. There are mainly three ways to realize multiplication units:

 1) Dot product by electro-optic modulators (EOMs): This is the most common scheme for optical computing. The voltage/current driving signals are converted into optical signals by two EOMs. If a single-end or double-end detector is used, the output is a current proportional to the input optical power, so a mapping between the input electrical signal and the optical power must be established; a look-up table or a nonlinear mapping circuit is needed to achieve the desired modulation. If coherent detection is used, the detected current is proportional to the amplitude or phase difference of the two input optical signals, so the dot product can be realized in the amplitude or phase dimension of the input optical signals.

 2) Diffraction multiplication: In a diffraction multiplication system, the phase and amplitude of the input light are first adjusted by the diffractive surfaces and then reach the receiver side through optical diffraction. The electric field of the output optical signal is the product of the electric field of the input signal and the transfer function introduced by the diffractive system. Since the diffraction multiplication system can adjust both amplitude and phase, multiplication in the complex domain is possible.

 3) MZI unitary matrix decomposition method: After singular value decomposition (SVD), the weight matrix W is decomposed into two unitary matrices U and V and a diagonal matrix S, such that W = USV^H. The minimum size for this scheme is 2 × 2, which differs from the above two schemes. To achieve the intended result, the phase of each MZI is calculated and set according to the unitary matrices U and V, and the diagonal EOMs are modulated according to the diagonal matrix S. A numerical sketch of this decomposition is given after this list.
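
The decomposition behind this scheme can be checked numerically; the sketch below (a minimal illustration, not a device model) factorizes a random complex weight matrix as W = U S V^H, where U and V^H would be realized by MZI meshes and S by a line of amplitude modulators.

```python
import numpy as np

# SVD mapping behind MZI mesh architectures: W = U S V^H.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

U, s, Vh = np.linalg.svd(W)              # U, Vh unitary; s holds singular values
W_reconstructed = U @ np.diag(s) @ Vh

print(np.allclose(W, W_reconstructed))   # True: the mesh settings reproduce W
```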

By summing multiple multiplication results, linear operators such as the convolution operator and the matrix-vector multiplication (MVM) operator can be realized. There are mainly four ways to sum the multiplication results:

 1) Wavelength-domain addition: When optical signals at different wavelengths are detected by the same single-end detector, the output current is the sum, over wavelengths, of the products of the detector responsivity at each wavelength and the corresponding optical power. Therefore, when the original input signals are modulated onto the optical power, the addition can be implemented in the wavelength domain.

 2) Optical-domain addition: When different input optical signals are combined into the same waveguide, the output electric field is the sum of their electric fields. If the input optical signals are coherent and share the same phase at the receiver side, the addition of their amplitudes can be implemented by coherent detection.

 3) Electrical-domain addition: When different optical signals are detected and converted into current or voltage signals, injecting all of the resulting electrical signals into the same wire yields an output equal to their sum.

 4) Time-domain addition: When the input electrical signals are distributed in the time domain, addition can be realized with an operational-amplifier integrator, whose output is the integral of the time-domain electrical signal over a preset duration, as sketched after this list.
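
As an illustration of time-domain accumulation, the sketch below (an idealized integrator; real op-amp circuits add gain and leakage) sums sequentially arriving products into a single dot product.

```python
import numpy as np

# Idealized time-domain accumulation: one product per clock period,
# summed by an integrator over the preset window.
weights = np.array([0.2, 0.7, 0.5, 0.9])
inputs = np.array([0.3, 0.1, 0.8, 0.4])

accumulator = 0.0
for w, x in zip(weights, inputs):   # sequential (TDM) product stream
    accumulator += w * x            # integrator adds each detected product

print(accumulator, np.dot(weights, inputs))   # both equal the dot product
```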

In this part, we have briefly reviewed the principles of several common modulation units, detection units, multiplication units, and addition units. These fundamental units can be utilized to realize different optical computing architectures and operators, such as large-scale MVM, convolution, and tensor operators.

3.2 Principle of optical computing architectures

According to the principles of the fundamental units above, the optical computing units can be expanded into large-scale computing operators that implement the matrix-vector multiplication (MVM) operator, the convolution operator, or tensor operators. As convolution and tensor operators can be converted into MVM, in this section we mainly focus on different realizations of optical MVM. In the following, MVM(m, k, l) denotes the MVM operation between a weight matrix of size (m, k) and an input of size (k, l).

As shown in Fig.3, the main architectures for large scale MVM can be summarized as follows:

 1) Time-division multiplexing (TDM): A vector of length k is modulated on one EOM over k clock periods, and each column of the weight matrix is modulated on m EOMs simultaneously. The output current signal is then obtained through detection, and the MVM result is accumulated after k clock periods.

 2) Wavelength-division multiplexing (WDM): The vector is modulated onto k optical carriers through k EOMs, and the weight matrix is modulated on m × k modulators. The output light field is detected by m single-end detectors to obtain the MVM result; a numerical sketch of this scheme is given after this list.

 3) Space-division multiplexing (SDM): Using the spatial dimension, the weight matrix W is modulated on m × k EOMs and the vector signals are modulated on k EOMs simultaneously. After electrical- or optical-domain addition, the MVM result is obtained through the detection units.

 4) Diffractive MVM: Following the diffraction principle in Fig.2, when optical signals are launched at different locations of the input plane, an MVM weighted by the plane-to-plane transmission coefficients is realized at the receiver plane.

 5) Unitary matrix MVM: As mentioned before, by extending the unitary matrix architecture of Fig.3(f), complex-valued MVM can be realized in the optical domain. The SVD of W is performed first, and the three factors U, S, and V are then mapped onto a unitary MZI mesh, the diagonal EOMs, and another unitary MZI mesh, respectively.

 6) Polarization-division multiplexing (PDM) and mode-division multiplexing (MDM): PDM and MDM are two important multiplexing approaches that can be combined with the other architectures to improve the compute power and parallelism of optical computing. For on-chip optical computing systems, the key to MDM implementation is the design of on-chip mode multiplexers/demultiplexers that ensure low crosstalk and low loss between modes. The key to PDM lies in the design of polarization beam splitters, polarization beam combiners, and polarization rotators that achieve good mode matching and low polarization loss.
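
As a numerical illustration of the WDM scheme in item 2), the sketch below assumes ideal modulators and detectors: each of the k wavelengths carries one element of the input vector, and each of the m output rows has its own detector summing its k weighted wavelength channels.

```python
import numpy as np

# Idealized WDM-style MVM: weights as modulator transmissions,
# inputs as per-wavelength optical powers, detectors as row-wise sums.
m, k = 3, 4
rng = np.random.default_rng(1)
W = rng.uniform(0.0, 1.0, size=(m, k))
x = rng.uniform(0.0, 1.0, size=k)

# Detector of row i integrates its k wavelength channels after weighting.
y = np.array([np.sum(W[i, :] * x) for i in range(m)])

print(np.allclose(y, W @ x))   # True: the detector bank computes W x
```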

In addition to the above architectures, the convolution of two vectors can also be achieved with an optical 4f system [1]. However, this scheme places high demands on output accuracy and the input dimension is limited, making large-scale deployment difficult. Special architectures such as reservoir computing can be used to compensate for nonlinear distortion in optical communication systems; however, reservoir computing is a dedicated architecture and struggles to meet general-purpose computing requirements.

3.3 Principle of optical nonlinear transformation units

Although optical computing accelerators offer large bandwidth and low energy consumption, most architectures focus mainly on linear operations such as MVM or convolution. For applications such as deep optical neural networks (ONNs) and reservoir computing (RC), nonlinear transformation of the MVM output is indispensable. Since optical-to-electrical and electrical-to-optical conversions introduce non-negligible delay and energy consumption, the development and optimization of optical nonlinear transformation units is vital for an effective optical computing system.

As shown in Fig.4, the mainstream photonic nonlinear cores can be summarized as follows:

 1) Photorefractive crystals: In 1990, Psaltis et al. [27] implemented a holographic optical neural network in which photorefractive crystals were used to store holographic images and implement the nonlinear functionality. This may be the first time a photonic nonlinear-activation core was realized in a neural network. In 2019, Yan et al. [28] used the ferroelectric thin film SBN:60 as the nonlinear layer in their proposed Fourier-space diffractive deep neural network (F-D2NN). The F-D2NN with the nonlinear layer improved the inference accuracy by about 5% compared with the network without it.

 2) Optoelectronic hybrid circuit: By combining electrical circuits with optical devices such as MZIs, MRRs, or electro-absorption modulators (EAMs), nonlinearity can be realized. In 2019, Fard et al. [29] implemented a photonic-electrical nonlinear circuit in an electrical-optical neural network: a small fraction of the input optical signal is converted into an electrical signal and then modulated onto the remaining optical signal, realizing a nonlinear input-output mapping without additional drive signals. A similar scheme was proposed in 2020 by Williamson et al. [30].

 3) Electro-optic modulators: Electro-optic modulators have an intrinsically nonlinear mapping between their input and output signals, and are therefore widely used as nonlinear units in reservoir computing (RC), spiking neural networks (SNNs), and ONNs. For example, the SOA can serve as the nonlinear unit in all-optical RC systems [31, 32] and all-fiber-optic neural networks [33]. The nonlinearity of SNNs can be implemented with Fabry-Perot lasers with a saturable absorber (FP-SA) [34, 35]. The PN junction in a PD can be treated as a ReLU function and used to implement an all-optical neural network [36]. Combined with an MRR or MZI, a germanium-silicon PD can also realize different nonlinear functions by adjusting the input signal [37, 38]. Furthermore, the MZI can perform different kinds of nonlinear activation functions by adjusting the bias voltage [39]; an illustrative activation curve is sketched after this list. Finally, when two signals are modulated on two separate modulators of a coherent system, coherent detection yields a nonlinear mapping between the output current and the amplitudes of the two input signals.

 4) Graphene saturable absorber: By integrating graphene layers on nanophotonic waveguides, a nonlinear mapping between the output and input optical signals can be achieved. Single-layer graphene [40], graphene saturable absorbers [41], and graphene/silicon heterojunctions [42] are several alternatives for implementing optical nonlinear mapping functions.

 5) Other nonlinear implementation methods: Zuo et al. [43] developed a two-dimensional magneto-optical trap (MOT) and introduced it into a spatial optical neural network as a nonlinear mapping layer. Besides, the anti-saturable absorption of buckyball (C60) membranes [44] and exciton-polariton condensates [45] are further alternatives for optical nonlinear units.
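
For illustration, the sketch below gives two activation-like curves of the kind the devices above can approximate: an interferometric (sine-squared) response, as produced by a biased MZM, and a saturable-absorber-like transmission. The functional forms and parameters are generic illustrations, not a specific published design.

```python
import numpy as np

# Generic electro-optic-style activation curves (illustrative only).

def mzm_activation(z, v_pi=1.0, bias=0.25 * np.pi):
    """Interferometric (sine-squared) activation of a biased MZM-like device."""
    return np.sin(np.pi * z / (2.0 * v_pi) + bias) ** 2

def saturable_absorber(power, p_sat=1.0, alpha0=0.9):
    """Saturable-absorber-like response: absorption bleaches at high power."""
    transmission = 1.0 - alpha0 / (1.0 + power / p_sat)
    return transmission * power

z = np.linspace(0.0, 2.0, 5)
print(mzm_activation(z))
print(saturable_absorber(z))
```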

As discussed above, there are many schemes for optical nonlinear transformation that can be integrated into optical NNs or scientific computing systems. However, optical nonlinear transformation is still far from large-scale deployment. For spatial optical computing systems, nonlinearity is hard to realize and reconfigure. Metasurfaces may be a future solution, offering advantages such as reconfigurability, nonlinearity, and low insertion loss [46]; however, metasurface-based optical computing is still immature, and achieving high-speed reconfigurable metasurfaces remains a significant challenge. For on-chip photonic computing systems, the implementation of nonlinear units such as SOAs and graphene requires complex processes such as heterogeneous integration, and the low efficiency and large insertion loss of the nonlinear units restrict the computing scale as well. It is therefore crucial to develop reconfigurable, low-loss optical nonlinear units. Heterogeneous integration of functional materials on silicon may be a future solution for large-scale ONNs or scientific computing.

3.4 Discussion

In this section, we have analyzed the implementation principles of photonic computing accelerators in terms of modulation, detection, the minimum multiplication unit, summation, matrix-vector multiplication architectures, and nonlinearity. Tab.1 then compares the different architectures in terms of the number of devices, the supported signs of the two operands, calibration difficulty, and reconfigurability.

Assuming the MVM size is set to (m, m, l), Tab.1 shows the following:

 1) The diffractive MVM architecture requires the fewest EOMs or lasers among the five architectures considered in this paper, but additional diffractive planes for the weight matrix are indispensable. Among the on-chip architectures, TDM uses the fewest EOMs, at the expense of extra delay.

 2) For architectures that rely on coherent detection, calibration difficulty is unavoidable. The path length of each branch should be matched as closely as possible so that the phase difference between branches is zero or a known constant; this factor also limits the scalability of these architectures. In general, coherent detection offers a higher signal-to-noise ratio (SNR), but at the cost of high calibration difficulty.

 3) All architectures need to preprocess the input data to compensate for the nonlinearity of the EOMs. In addition, the unitary MVM requires an extra SVD of the weight matrix.

 4) Reconfiguration of diffractive MVM is hard to realize. Metasurfaces may be a candidate solution, but they are far from deployment in practical optical computing systems.

Therefore, realizing a complete, efficient, and large-scale photonic computing accelerator requires effort in multiple directions, such as linear modulation technology and low-loss process development. Because fabricated devices deviate from their designs, the development and deployment of fast, efficient calibration algorithms must also be considered for large-scale commercial applications. Based on current progress, we believe that on-chip integrated photonic architectures are most likely to serve as co-processors alongside electronic chips in the future, providing low-energy and fast computing performance.

4 Applications of optical computing accelerators

As a new computing paradigm, optical computing has shown great advantages over traditional electronic chips in AI inference and training, combinatorial optimization, partial differential equation solving, optical channel distortion compensation, and cryptography. In this section, we classify and compare the published optical computing references in two fields: AI and scientific computing.

4.1 Application of optical AI computing

With the advent of large AI models such as ChatGPT and Sora, the demand for AI computing performance is growing dramatically. However, the huge energy consumption and high cost of GPUs concentrate computing resources in a few leading manufacturers and countries, which greatly limits the broader adoption of AI technology. The low energy consumption and potentially high compute density of optical computing for specific tasks, such as AI inference and combinatorial optimization, are expected to break this monopoly and further promote the development of AI technology. Based on the architectures described in the previous sections, this subsection reviews the current progress in optical AI inference and training.

As Fig.5(a1)–(a3) show, by loading data and performing calculations in the time dimension, the number of EOMs can be effectively reduced for a given compute-density requirement, thus reducing system complexity and cost. The TDM strategy is compatible with other strategies such as WDM and SDM.

In 2019, Hamerly et al. [47] proposed a TDM-based photonic accelerator that is scalable and has low energy consumption. Simulation results show that the proposed network can reach the "standard quantum limit" and realize efficient digit and image classification with an ultralow energy consumption of 50 zJ/MAC. In 2022, Sludds et al. [48] proposed a novel edge-computing architecture based on an optical processing network, Netcast, which enables delocalized analog processing with an ultralow optical energy of 40 attojoules per MAC. In 2023, Youngblood [49] proposed a hybrid optical-electronic computing architecture combining a crossbar array with homodyne detection to perform large-scale matrix-matrix multiplication. Simulation results show that, with experimentally demonstrated components, a 64 × 64 array at a 12 GHz modulation speed can reach a computation efficiency of 5.8 fJ/MAC and a computational speed of 98 TOPS, an energy consumption about 100 times lower than that of the Nvidia A100 GPU. In 2023, Chen et al. [50] proposed a hybrid TDM and SDM optical computing system based on the 3D connectivity and photonic integration of vertical-cavity surface-emitting lasers (VCSELs), CMOS drivers, and a phase mask. The MVM result is obtained by coherent detection and time-domain integration. The energy efficiency and compute density are 7 fJ/MAC and 6 TOPS/mm² respectively, representing 100-fold and 20-fold improvements over state-of-the-art digital processors. In 2024, Lin et al. [51] proposed a photonic tensor core consisting of two thin-film lithium niobate (TFLN) modulators, a III-V photodetector, and an electrical integrator. Thanks to the 60 GHz bandwidth of TFLN, the system can reach a dot-product vector dimension above 10^6 and a computing rate of 120 GOPS per computation core.

As Fig.5(b1)–(b3) show, SDM is one of the basic ways to realize high-speed parallelization in optical computing accelerators. Combined with other multiplexing strategies, optical computing accelerators can make full use of the advantages of optical computing to realize high computing density and low latency computing systems.

In 2021, Wu et al. [52] demonstrated a 6-bit multimode photonic computing core with an array of programmable mode converters based on on-waveguide metasurfaces of the phase-change material (PCM) Ge2Sb2Te5. In 2022, Ashtiani et al. [53] proposed an ADC-free on-chip photonic DNN (PDNN) that performed image classification with a latency under 570 ps, comparable to a single clock cycle of state-of-the-art digital platforms. In 2022, Mourgias-Alexandris et al. [54] proposed a noise-resilient optical computing chip with coherent detection. With the proposed noise-resilient training algorithm, 99% prediction accuracy is achieved on the MNIST dataset, 7% higher than with traditional training algorithms; moreover, the noise standard deviation of MNIST inference is decreased by 1.8 dB and 4.7 dB at 10 GMAC/s/axon and 5 GMAC/s/axon, respectively. In 2024, Moralis-Pegios et al. [55] proposed a 4 × 4 SiGe EAM-based crossbar architecture and a hardware-aware calibration algorithm, which achieved an experimental fidelity of 99.997% ± 0.002% over more than 10 000 linear transformations. In 2024, Li et al. [56] proposed a universal photonic integrated circuit that successfully simulated the interactions in reinforcement learning with a 56% improvement in efficiency.

As Fig.5(c1) and (c2) show, WDM is one of the most obvious features that distinguish light from electricity. Due to the large number of wavelength channels within the optical bandwidth, optical computing can load input nodes on multiple wavelength channels simultaneously, and each channel can achieve a signal bandwidth of more than 10 GHz, thus achieving a high bandwidth and large-scale photonic computing acceleration system.

In 2021, Xu et al. [57] demonstrated a convolutional optical computing accelerator that utilizes fiber dispersion and WDM to implement image processing and CNN classification tasks; a computing speed of 11.3 TOPS is achieved by processing 250 000 pixels simultaneously. In 2022, Xu et al. [58] proposed a photonic tensor flow processor (PTFP) based on MRRs and WDM. At a signal bandwidth of 20 Gbaud, the PTFP achieved a video action recognition accuracy of 97.9%. In 2022, Bai et al. [59] implemented time-wavelength two-dimensional convolution on a microcomb-driven chip-based photonic processing unit (PPU). A weight precision of 9 bits is achieved with the proposed dedicated control and operation protocol, and the PPU demonstrated a digit-classification accuracy of 96.6%. In 2023, Meng et al. [60] proposed a compact optical convolutional processing unit consisting of two multi-mode interferometers and four phase shifters, with the vector signals modulated on four wavelengths. Ten-class classification of handwritten digits from the MNIST database is demonstrated on this chip with a prediction accuracy of 92.17%. In 2024, Xu et al. [61] proposed a training approach that enables energy-efficient photonic computing without adding hardware complexity and deployed it on an MRR-based WDM photonic neural network chip; experimental results show that the approach reduces the energy consumption tenfold and improves the computing accuracy by 4 bits.

As Fig.5(d1)–(d3) show, unitary-matrix optical computing accelerators can realize complex-valued matrix-vector multiplication through optical interference and coherent detection. This was the mainstream architecture of early optical computing accelerators and remains one of the most studied architectures at present.

In 2017, Shen et al. [63] proposed a coherent nanophotonic circuit based on the MZI unitary-matrix architecture; the processor integrates 56 programmable MZIs and demonstrated its utility for vowel recognition. In 2019, Pai et al. [64] analyzed the impact of MZI device errors on the convergence speed of stochastic optimization algorithms and proposed a novel initialization method that greatly improves the convergence speed. In 2021, Zhang et al. [65] developed an optical neural chip (ONC) implementing complex-valued MVM. The ONC achieved an accuracy of up to 97.4% on the Iris classification task and 90.5% on a handwriting recognition task, an 8.5% improvement over the real-valued implementation. In 2023, Pai et al. [66] demonstrated the "in situ backpropagation" algorithm on a four-port MZI unitary-matrix silicon photonic chip, realizing on-chip training of a photonic neural network; the MNIST image recognition result of the proposed optical neural network is comparable to that of digital simulation, demonstrating the large potential for AI inference and training. In 2024, Xu et al. [67] designed Taichi, a large-scale photonic chip based on D2NN and the unitary-matrix architecture, whose energy efficiency of 160.82 TOPS/W and compute density of 878.9 TMACs/mm² are 1000-fold and 600-fold higher than those of the NVIDIA H100 GPU, respectively.

As Fig.5(e1)–(e3) show, D2NN is a spatial optical computing strategy that introduces phase delays and amplitude changes by means of spatial light modulators or 3D-printed diffraction surfaces. Since a D2NN consists almost entirely of passive diffractive planes, its computing density and energy consumption are superior to those of on-chip optical computing strategies and electronic chips.

In 2018, Lin et al. [68] proposed an all-optical D2NN operating in the terahertz band and fabricated by 3D printing. The experimental results of a five-layer D2NN demonstrated an accuracy of 91.75% on handwritten-digit classification and 81.13% on Fashion-MNIST classification, showing the feasibility and potential of D2NN for AI inference. In 2021, Zhou et al. [69] proposed a large-scale neuromorphic optoelectronic computing chip with a reconfigurable diffraction processing unit that can be programmed to build different types of neural network architectures. It is an optoelectronic fusion computing architecture based on optical diffraction, containing about 940 000 optoelectronic neurons and using a spatial light modulator to encode 8-bit input data on the phase of the input light. In 2022, Liu et al. [70] reported a programmable AI machine (PAIM) based on a multi-layer digital-coding metasurface array. Unlike a traditional D2NN, the PAIM changes its transfer function with a dynamic modulation range of 35 dB through integrated amplifier chips and an FPGA, providing a practical option for reconfigurable D2NN implementation. In 2023, Chen et al. [71] proposed an all-analog optical-electronic hybrid chip achieving an energy efficiency of 74.8 POPS/W and a computing speed of 4.6 POPS, more than three and one orders of magnitude higher, respectively, than state-of-the-art electronic chips. In 2023, Yuan et al. [72] proposed a dual-neuron optical artificial NN to solve the large-scale training problems of D2NN, which effectively improved the training efficiency and accuracy of D2NN and was demonstrated on an SLM-based D2NN system.

As Fig.5(f1) and (f2) show, PDM and MDM are multiplexing technologies which have been widely utilized in fiber communication systems to increase the transmission capacity. Compatible with the SDM, WDM architectures, PDM and MDM could provide more degree of freedom and potentially lead to great increase of computing power in the optical computing system.

In 2022, Lin et al. [73] proposed an all-optical implementation of linear transformations based on a polarization-encoded diffractive network. In this work, linear polarizers are placed between isotropic diffractive materials to realize different target complex-valued linear transformations, and the placement of the linear polarizers does not affect the transmission coefficients of the trained diffractive surfaces. In 2023, Yin et al. [74] demonstrated a WDM-compatible optical mode-division multiplexing accelerator that realizes real-valued MVM and convolution; the computational parallelism of the WDM approach is increased threefold. The proposed system demonstrated a computing density of 1.37 TOPS/mm² and a precision of 5 bits.

4.2 Application of optical scientific computing

Scientific computing uses computers to handle the mathematical problems arising in engineering tasks and scientific research. It mainly consists of three steps: i) establishing a mathematical model; ii) designing calculation methods; iii) deploying them on a computer. Many scientific computing problems are handled by iterative or analog algorithms, which do not require high precision but do require high processing speed. Optical computing, with its advantages of low energy consumption, low latency, and large bandwidth, can effectively accelerate the solution of such problems.

In 2019, Estakhri et al. [75] demonstrated a metamaterial platform enabled by inverse-design technology. As Fig.6(a) shows, this platform can efficiently solve generic integral equations at microwave frequencies, showing great potential for chip-scale, fast, and integrable optical computing elements. In 2020, Roques-Carmes et al. [76] presented the photonic recurrent Ising sampler (PRIS), designed for combinatorial optimization problems. The per-iteration latency of the PRIS algorithm deployed on a unitary-matrix optical computing accelerator is estimated to be 0.1−1 ns, about 50−100 times lower than on an FPGA. In 2021, Huang et al. [77] developed a silicon photonic-electronic chip for fiber nonlinearity compensation. As Fig.6(b) shows, the experiment was conducted on a 10080 km submarine fiber communication system and demonstrated a considerable Q-value improvement compared to a 32-bit digital signal processor performing the same fiber nonlinearity compensation. In 2023, Zhang et al. [78] proposed a WDM-based blind source separation (BSS) system to eliminate telecommunication crosstalk. As Fig.6(c) shows, the system operated at a bandwidth of 19.2 GHz and realized a high resolution of 9 bits, resulting in higher signal-to-interference ratios (SIRs) even for ill-conditioned mixtures. In 2023, Pai et al. [79] proposed an optical-computing-based cryptographic scheme, LightHash, as a new option for decentralized blockchains. As Fig.6(d) shows, LightHash implemented robust, low-bit-precision matrix multiplication, which could greatly reduce the energy consumption of the proof-of-work process in blockchains. In 2023, SeyedinNavadeh et al. [80] proposed a scheme for detecting optimal orthogonal communication channels using photonic processors and a scattering optical system. As Fig.6(f) shows, by autonomously performing SVD, crosstalk below −30 dB between optimized channels can be achieved, showing potential for applications in multimode optical communication systems. In 2023, Zhang et al. [81] designed an option-pricing photonic chip that greatly speeds up option pricing compared to classical Monte Carlo methods. In 2024, Feng et al. [82] proposed an integrated microwave photonic (MWP) chip based on a 4-inch wafer-scale thin-film lithium niobate platform. As Fig.6(e) shows, an ultrafast analog computing rate of 256 GSa/s is achieved, and three applications are demonstrated: solving ordinary differential equations, generating ultra-wideband signals, and detecting image edges.

4.3 Discussion

In this section, we have reviewed representative optical computing accelerators for AI and scientific computing; Tab.2 compares them. Among on-chip optical computing architectures, those based on PCM offer the highest compute density and energy efficiency. Spatial optical computing accelerators offer higher compute density and energy efficiency than on-chip accelerators, but their reconfigurability and scalability are inferior.

At the same time, the passive weights of spatial optical computing accelerators are difficult to change, and the currently most mature way of adjusting weights, using SLMs, has a low refresh rate. PCM-based MVM architectures offer high energy efficiency because of their non-volatile refractive index, but the number of write cycles of PCM is limited, which greatly restricts the service life of PCM-based optical chips. Besides, the accuracy of optical computing is generally much lower than that of general-purpose electronic chips such as the Nvidia A100 and H100, so optical computing is ill-suited to general-purpose calculation.

In conclusion, constructing an optical computing accelerator that beats conventional electronic processors in every metric is unrealistic. Nevertheless, the physics of optical computing promises that, if optical computing accelerators are carefully designed and work effectively alongside traditional electronic chips, optical computing can play an important role in many specific tasks, such as AI inference, portfolio optimization, and nonlinear distortion compensation in the optical domain.

5 Commercialization of optical computing accelerators

As a novel non-von Neumann computing paradigm, optical computing offers several advantages, such as low energy consumption and low latency, which have attracted the attention of many companies. As Fig.7(a) shows, the Israeli company Lenslet proposed an optical digital signal processor (EnLight) in 2004 [83]. EnLight can perform 8 trillion multiply-accumulate (MAC) operations per second, thousands of times the processing speed of ordinary digital signal processors; its target applications include high-precision radar, electronic warfare, airport baggage inspection, video compression, weather forecasting, and cellular base stations. As Fig.7(b) shows, Lightmatter (USA) presented an optical-electronic hybrid computing chip named Mars at Hot Chips 2020, used for MVM acceleration [84]. The vector modulation rate of this board reaches 1 GHz, the weight refresh rate reaches 0.26 MHz, and the calculation delay is only 200 ps. Mars is manufactured in a 90 nm standard silicon photonics process with a chip area of only 150 mm². In 2021, Lightmatter improved Mars, released a new chip, Envise, and built a 4-U optoelectronic hybrid computing server with three times the instruction execution rate of the Nvidia DGX-A100 [85]. As Fig.7(c) shows, the spatial optical computing company Optalysys proposed a computing acceleration solution combining spatial light with on-chip silicon photonic devices in 2020, and in 2023 applied it to fully homomorphic encryption (FHE), reducing FHE processing latency by about 90% [86]. As shown in Fig.7(d), LightOn (France) announced in 2021 that it had successfully integrated its "Appliance" optical processing unit (OPU) into the Jean Zay supercomputer in France [87], marking the first time an optoelectronic co-processor has been integrated into a high-performance computing (HPC) environment. As shown in Fig.7(e), Photoncounts (China) carried out preliminary verification of optical computing and launched the "photoelectric fusion AI acceleration computing card" Li-SEE R1 in 2020 [88]. The Li-SEE R1 can process 36 channels of 1080P video simultaneously with 70 W power consumption, and with mixed-precision computing (low-precision optical plus high-precision electrical computing) it can reach 20 TOPS. As shown in Fig.7(f), Lightelligence (China) released its first fully integrated optoelectronic hybrid computing platform, the Photonic Arithmetic Computing Engine (PACE), in 2021 [89]. PACE integrates 12 000 discrete silicon photonic devices with a 1 GHz vector update rate, 150 ps latency, and a 1 MHz weight update rate. Owing to its latency and compute-density advantages, PACE has demonstrated a hundredfold advantage over the Nvidia RTX 4090 on combinatorial optimization problems such as Max-Cut and Boolean satisfiability. As shown in Fig.7(g), Neurophos (USA) demonstrated its optical processing unit (OPU) in 2023 [90]. Combining metasurfaces with silicon photonic devices, the OPU is claimed to achieve 160 000 TOPS of computing capability and 300 TOPS/W of energy efficiency, 100 times faster and more efficient than leading GPUs.

In the past three years, more and more companies have invested in the development and commercialization of optical computing accelerators, and many prototypes have demonstrated compute-density and power-consumption advantages over traditional GPUs. However, optical computing is still some distance from commercialization, mainly for the following reasons: i) Optical computing is an analog technology, quite different from the current digital computing paradigm; to remain compatible with digital chips, it sacrifices much of its energy and latency advantage on digital-to-analog and analog-to-digital conversion. ii) Although optical computing can achieve latency advantages of hundreds or thousands of times over GPUs in some specialized computing fields, its accuracy is limited and it cannot fulfill many high-precision general computing tasks, which limits its application scenarios. iii) The computing principle of optical computing differs greatly from that of digital computing, so adaptive algorithms and software ecosystems for optical computing are necessary. This is an important step toward commercialization, yet the software ecosystem of optical computing is still a long way from being commercially mature.

Although optical computing still faces many commercialization issues, these problems can be solved as investment increases and optical computing becomes more widespread. With the exponential growth in the parameter scale of large AI models and of scientific computing problems, optical computing, with its advantages of low energy consumption and low latency, will become an important route to continued growth in computing performance.

6 Challenges and perspectives

Over the past five years, optical computing has flourished both academically and commercially, with many academic achievements and commercial prototypes reported. However, large-scale commercial deployment of optical computing accelerators is still restricted by the following factors:

 1) Preprocessing delay: The mapping from a current/voltage signal to an optical signal is nonlinear. For example, the relationship between the output optical amplitude and the input voltage of a Mach-Zehnder modulator is a sine/cosine function, and the relationship between the input voltage and the optical attenuation of a voltage-controlled optical attenuator is exponential. These nonlinear transfer functions require pre-equalization or precoding of the input electrical signal, which introduces extra preprocessing latency into the optical computing system; a pre-equalization sketch is given after this list.

 2) Large-scale calibration: When the scale of the optical computing chip is large, the difficulty of calibration increases dramatically. Moreover, fabrication errors influence the calibration, which degrades the system performance.

 3) Device size: Because of the large size of electro-optic modulators, the overall integration density (the number of devices per unit area) of optical computing is more than an order of magnitude lower than that of digital electronic integrated circuits. Assuming a chip side length of L = 5 cm (chip area 5 cm × 5 cm) and a wavelength of λ = 500 nm, the ideal maximum integration of a photoelectric hybrid computing chip is (L/λ)² = 10^10, an order of magnitude smaller than the transistor count of about 2.5 × 10^11 for electronic chips [7]. Therefore, the integration density of optics is difficult to bring up to that of electronics, and the large size of optical devices also makes scalability a major challenge for optical computing accelerators.

 4) Noise intensity increases linearly with bandwidth: Although the bandwidth of devices such as EOMs and PDs can reach tens of times that of traditional digital chips, the relative intensity noise of lasers, the shot noise of photodiodes, and the thermal noise all increase linearly with the sampling rate, which affects the accuracy of the photoelectric hybrid computing system.
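
To illustrate the pre-equalization mentioned in item 1), the sketch below pre-distorts the drive voltage of an idealized, null-biased MZM (power transfer sin²(πV/2Vπ), with a hypothetical Vπ) so that the transmitted optical power depends linearly on the intended value.

```python
import numpy as np

# Pre-equalization of an idealized MZM power transfer curve.
V_PI = 4.0   # hypothetical half-wave voltage

def mzm_power(v):
    """Null-biased MZM: transmitted power T = sin^2(pi * v / (2 * V_pi))."""
    return np.sin(np.pi * v / (2.0 * V_PI)) ** 2

def predistort(target):
    """Invert the transfer curve so the output power equals `target` in [0, 1]."""
    return (2.0 * V_PI / np.pi) * np.arcsin(np.sqrt(target))

targets = np.linspace(0.0, 1.0, 5)
drive = predistort(targets)
print(np.allclose(mzm_power(drive), targets))   # True: linearized mapping
```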

To overcome the above difficulties, effort can be made in the following aspects:

 1) New architecture design: Most existing optical computing architectures focus on improving the energy efficiency and compute density of optical computing chips but ignore the combination of multiple dimensions of the optical signal. Through architectural optimization, such as combining WDM with PCM, WDM with SCM, or even on-chip with spatial hybrid optical computing accelerators, multiple dimensions of the optical signal can be exploited to boost the overall system performance.

 2) Development of new devices and new materials: Due to the inherent micrometer-scale size of photoelectric devices, the integration density of optical computing is much lower than that of electronic computing systems. In photonic computing systems, an electrical signal is typically modulated onto light via an EOM to achieve gigahertz computation clocks. The electro-optic coefficient of the material employed in a modulator is a crucial determinant of its modulation efficiency: a larger electro-optic coefficient enables a modulator to reach a π phase shift over a shorter length under a fixed electric field (the underlying phase-shift relation is sketched after this list). Hence, small-footprint electro-optic modulators based on materials with high electro-optic coefficients, such as BaTiO3 [91] and plasmonic-organic hybrids (POH) [92], can shrink the optical multiplication unit composed of electro-optic modulators, thereby enhancing the overall integration density of photonic chips.

 3) Optical computing adaptive analog circuit design: Unlike traditional electrical computation, optical computing must account for the nonlinearity, noise, and calculation errors caused by system jitter between electro-optical and photoelectric conversion. The SNR and accuracy of the system output can be improved effectively by designing pre-coding circuits matched to the different modulation curves and a low-noise amplifier receiving circuit with adjustable gain. For devices requiring precise temperature control, such as MRRs, a low-power temperature feedback control circuit can be designed to stabilize the optical computing system and improve its output SNR. Analog circuit design tailored to optical computing is therefore an important part of improving system performance.

 4) Optical computing adaptive algorithm design: Since optical computation is analog and differs from the traditional digital computer architecture, corresponding optimization algorithms need to be designed for the specific applications, devices, and architectures. For example, forward- and back-propagation algorithms for neural networks, such as in-situ training algorithms, have been shown to be efficient in D2NN and unitary-matrix optical computing systems (a toy hardware-aware training sketch follows this list). Consequently, the development of algorithms adapted to optical computing systems is a field that the future of optical computing needs to focus on.
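
As a rough illustration of the electro-optic-coefficient argument in 2), the sketch below evaluates the textbook Pockels phase-shifter figure of merit Vπ·L ≈ λd/(n³rΓ) for two sets of illustrative, order-of-magnitude material parameters (not measured device data); a roughly 30× larger coefficient shortens the π-phase-shift length by roughly 30× at a fixed drive voltage.

```python
def v_pi_times_l(wavelength_m, gap_m, n, r_m_per_v, overlap=0.5):
    """Pockels phase-shifter figure of merit V_pi * L (V*m) for a simple model."""
    return wavelength_m * gap_m / (n ** 3 * r_m_per_v * overlap)

wavelength = 1.55e-6          # telecom wavelength (m)
gap = 5e-6                    # assumed electrode gap (m)

# Illustrative order-of-magnitude Pockels coefficients and refractive indices.
materials = {
    "LiNbO3 (r33 ~ 31 pm/V)": (2.2, 31e-12),
    "BaTiO3 thin film (r ~ 900 pm/V, as reported)": (2.3, 900e-12),
}

for name, (n, r) in materials.items():
    vpil = v_pi_times_l(wavelength, gap, n, r)
    # Length needed for a pi phase shift at an assumed 2 V drive, in this toy model.
    print(f"{name}: V_pi*L ~ {vpil*100:.2f} V*cm, L(pi) at 2 V ~ {vpil/2*100:.2f} cm")
```

Under these toy assumptions, the higher-coefficient material reaches a π shift in a device tens of times shorter, which is precisely the footprint reduction that raises the achievable density of optical multiplication units.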

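As a toy illustration of the algorithm co-design in 4), the sketch below trains a linear layer while the forward pass goes through a crude model of an analog optical matrix–vector multiplier (weight quantization plus Gaussian read-out noise) and the gradient is taken through an ideal digital model, in the spirit of the hardware-aware and in-situ training approaches cited above. All noise levels, bit widths, and dimensions are arbitrary placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_optical_matvec(W, x, bits=6, sigma=0.02):
    """Model an analog optical matrix-vector product: quantized weights + read-out noise."""
    levels = 2 ** bits - 1
    span = np.ptp(W) + 1e-12
    Wq = np.round((W - W.min()) / span * levels) / levels
    Wq = Wq * span + W.min()                 # de-normalize back to the original weight range
    y = Wq @ x
    return y + sigma * rng.standard_normal(y.shape)

# Toy regression task: learn W_true using only noisy analog forward passes.
W_true = rng.standard_normal((4, 8))
W = np.zeros_like(W_true)
lr = 0.05
for step in range(2000):
    x = rng.standard_normal(8)
    target = W_true @ x
    y = noisy_optical_matvec(W, x)           # "optical" forward pass (noisy, quantized)
    grad = np.outer(y - target, x)           # gradient of the ideal digital model, fed with the measured output
    W -= lr * grad
print("final relative error:", np.linalg.norm(W - W_true) / np.linalg.norm(W_true))
```

Training against the non-ideal forward model rather than an ideal one is what makes the deployed analog hardware tolerant of its own noise and quantization, and the same pattern extends to D2NN and MZI-mesh systems when the forward model is replaced by the physical chip itself.
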
The advantages of optical computing stem mainly from the large bandwidth and low energy consumption of optical devices. However, implementing and improving optical computing accelerators inevitably requires the collaborative design of driving circuits, algorithms, and system-level optimization. To fully unleash the potential of optical computing, co-optimization of architectures, devices, algorithms, and peripheral analog and digital circuits is necessary.

7 Conclusion

In this review, we have systematically analyzed the basic principles of electronic and optical computing accelerators, together with their most promising applications, and reviewed the current state of optical computing commercialization. We have also compared the energy efficiency and computing power density of different optical computing architectures: spatial optical computing accelerators offer high computational power density, while on-chip optical computing accelerators offer advantages such as reconfigurability and high integration. Although the accuracy and chip stability of optical computing accelerators still lag considerably behind those of traditional electronic accelerators, with the explosive growth of the AI industry and its increasing energy demand, optical computing, with its advantages of high compute density and low energy consumption, will hopefully play an important role in AI inference and other scenarios with high compute-density requirements.

References

[1]

Y. Ma, Z. Wang, H. Yang, and L. Yang, Artificial intelligence applications in the development of autonomous vehicles: A survey, IEEE/CAA Journal of Automatica Sinica 7, 315 (2020)

[2]

E. J. Topol, High-performance medicine: The convergence of human and artificial intelligence, Nat. Med. 25(1), 44 (2019)

[3]

H. Wang, T. Fu, Y. Du, W. Gao, K. Huang, Z. Liu, P. Chandak, S. Liu, P. Van Katwyk, A. Deac, A. Anandkumar, K. Bergen, C. P. Gomes, S. Ho, P. Kohli, J. Lasenby, J. Leskovec, T. Y. Liu, A. Manrai, D. Marks, B. Ramsundar, L. Song, J. Sun, J. Tang, P. Veličković, M. Welling, L. Zhang, C. W. Coley, Y. Bengio, and M. Zitnik, Scientific discovery in the age of artificial intelligence, Nature 620(7972), 47 (2023)

[4]

J. Achiam, S. Adler, S. Agarwal, et al., GPT-4 technical report, arXiv: 2303.08774 (2023)

[5]

D. Kimovski, N. Saurabh, M. Jansen, A. Aral, A. Al-Dulaimy, A. B. Bondi, A. Galletta, A. V. Papadopoulos, A. Iosup, and R. Prodan, Beyond von Neumann in the computing continuum: Architectures, applications, and future directions, IEEE Internet Comput. 28(3), 6 (2024)

[6]

N. G. Orji, M. Badaroglu, B. M. Barnes, C. Beitia, B. D. Bunday, U. Celano, R. J. Kline, M. Neisser, Y. Obeng, and A. E. Vladar, Metrology for the next generation of semiconductor devices, Nat. Electron. 1(10), 532 (2018)

[7]

P. L. McMahon, The physics of optical computing, Nat. Rev. Phys. 5, 717 (2023)

[8]

W. Deal, K. Leong, W. Yoshida, A. Zamora, and X. Mei, InP HEMT integrated circuits operating above 1,000 GHz, in: 2016 IEEE International Electron Devices Meeting (IEDM), 29-1, IEEE, 2016

[9]

F. Thome and A. Leuther, First demonstration of distributed amplifier MMICs with more than 300-GHz bandwidth, IEEE J. Solid-State Circuits 56(9), 2647 (2021)

[10]

R. Ho, K. W. Mai, and M. A. Horowitz, The future of wires, Proc. IEEE 89(4), 490 (2001)

[11]

R. K. Cavin and J. L. Hilbert, Design of integrated circuits: Directions and challenges, Proc. IEEE 78(2), 418 (1990)

[12]

N. F. Tyndall, et al., A low-loss, broadband, nitride-only photonic integrated circuit platform, in: Quantum 2.0, QTu4B-5, Optica Publishing Group, 2022

[13]

D. A. B. Miller, Attojoule optoelectronics for low-energy information processing and communications, J. Lightwave Technol. 35(3), 346 (2017)

[14]

H. Xi, C. Li, J. Chen, and J. Zhu, Training transformers with 4-bit integers, Adv. Neural Inf. Process. Syst. 36, 49146 (2023)

[15]

X. Sun, N. Wang, C. Y. Chen, et al., Ultra-low precision 4-bit training of deep neural networks, Adv. Neural Inf. Process. Syst. 33, 1796 (2020)

[16]

[17]

[18]

D. Zuras, M. Cowlishaw, A. Aiken, et al., IEEE standard for floating-point arithmetic, IEEE Std 754-2008, 1 (2008)

[19]

[20]

A. Shahid and M. Mushtaq, A survey comparing specialized hardware and evolution in TPUs for neural networks, in: 2020 IEEE 23rd International Multitopic Conference (INMIC), Bahawalpur, Pakistan, 2020, pp 1–6

[21]

R. Vuduc, A. Chandramowlishwaran, J. Choi, M. Guney, and A. Shringarpure, On the limits of GPU acceleration, in: Proceedings of the 2nd USENIX Conference on Hot Topics in Parallelism (Vol. 13), 2010

[22]

[23]

P. Hill, et al., DeftNN: Addressing bottlenecks for DNN execution on GPUs via synapse vector elimination and near-compute data fission, in: 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp 786–799, IEEE, 2017

[24]

J. You, J. W. Chung, and M. Chowdhury, Zeus: Understanding and optimizing GPU energy consumption of DNN training, in: 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23), 2023, pp 119–139

[25]

[26]

Z. Hu, S. Li, R. L. T. Schwartz, M. Solyanik-Gorgone, M. Miscuglio, P. Gupta, and V. J. Sorger, High-throughput multichannel parallelized diffraction convolutional neural network accelerator, Laser Photonics Rev. 16(12), 2200213 (2022)

[27]

D. Psaltis, D. Brady, X. G. Gu, and S. Lin, Holography in artificial neural networks, Nature 343(6256), 325 (1990)

[28]

T. Yan, J. Wu, T. Zhou, H. Xie, F. Xu, J. Fan, L. Fang, X. Lin, and Q. Dai, Fourier-space diffractive deep neural network, Phys. Rev. Lett. 123(2), 023901 (2019)

[29]

M. M. P. Fard, I. A. D. Williamson, M. Edwards, K. Liu, S. Pai, B. Bartlett, M. Minkov, T. W. Hughes, S. Fan, and T. A. Nguyen, Experimental realization of arbitrary activation functions for optical neural networks, Opt. Express 28(8), 12138 (2020)

[30]

I. A. D. Williamson, T. W. Hughes, M. Minkov, B. Bartlett, S. Pai, and S. Fan, Reprogrammable electro-optic nonlinear activation functions for optical neural networks, IEEE J. Sel. Top. Quantum Electron. 26(1), 1 (2020)

[31]

F. Duport, B. Schneider, A. Smerieri, M. Haelterman, and S. Massar, All-optical reservoir computing, Opt. Express 20(20), 22783 (2012)

[32]

A. Dejonckheere, F. Duport, A. Smerieri, L. Fang, J. L. Oudar, M. Haelterman, and S. Massar, All-optical reservoir computer based on saturation of absorption, Opt. Express 22(9), 10868 (2014)

[33]

M. T. Hill, E. E. E. Frietman, H. de Waardt, Giok-djan Khoe, and H. J. S. Dorren, All fiber-optic neural network using coupled SOA based ring lasers, IEEE Trans. Neural Netw. 13(6), 1504 (2002)

[34]

S. Xiang, Y. Shi, X. Guo, Y. Zhang, H. Wang, D. Zheng, Z. Song, Y. Han, S. Gao, S. Zhao, B. Gu, H. Wang, X. Zhu, L. Hou, X. Chen, W. Zheng, X. Ma, and Y. Hao, Hardware-algorithm collaborative computing with photonic spiking neuron chip based on an integrated Fabry–Perot laser with a saturable absorber, Optica 10(2), 162 (2023)

[35]

D. Zheng, S. Xiang, X. Guo, Y. Zhang, B. Gu, H. Wang, Z. Xu, X. Zhu, Y. Shi, and Y. Hao, Experimental demonstration of coherent photonic neural computing based on a Fabry–Perot laser with a saturable absorber, Photon. Res. 11(1), 65 (2023)

[36]

F. Ashtiani, A. J. Geers, and F. Aflatouni, An on-chip photonic deep neural network for image classification, Nature 606(7914), 501 (2022)

[37]

Y. Huang, W. Wang, L. Qiao, X. Hu, and T. Chu, Programmable low-threshold optical nonlinear activation functions for photonic neural networks, Opt. Lett. 47(7), 1810 (2022)

[38]

H. Li, B. Wu, W. Tong, J. Dong, and X. Zhang, All-optical nonlinear activation function based on germanium silicon hybrid asymmetric coupler, IEEE J. Sel. Top. Quantum Electron. 29(2), 1 (2023)

[39]

Q. Li, S. Liu, Y. Zhao, W. Wang, Y. Tian, J. Feng, and J. Guo, Optical nonlinear activation functions based on MZI-structure for optical neural networks, in: 2020 Asia Communications and Photonics Conference (ACP) and International Conference on Information Photonics and Optical Communications (IPOC), Beijing, China, Oct. 24−27, 2020, ACP/IPOC, 2020, pp 1–3

[40]

K. Liao, C. Li, T. Dai, C. Zhong, H. Lin, X. Hu, and Q. Gong, Matrix eigenvalue solver based on reconfigurable photonic neural network, Nanophotonics 11(17), 4089 (2022)

[41]

Z. Cheng, H. K. Tsang, X. Wang, K. Xu, and J. B. Xu, In-plane optical absorption and free carrier absorption in graphene-on-silicon waveguides, IEEE J. Sel. Top. Quantum Electron. 20(1), 43 (2014)

[42]

C. Zhong, K. Liao, T. Dai, M. Wei, H. Ma, J. Wu, Z. Zhang, Y. Ye, Y. Luo, Z. Chen, J. Jian, C. Sun, B. Tang, P. Zhang, R. Liu, J. Li, J. Yang, L. Li, K. Liu, X. Hu, and H. Lin, Graphene/silicon heterojunction for reconfigurable phase-relevant activation function in coherent optical neural networks, Nat. Commun. 14(1), 6939 (2023)

[43]

Y. Zuo, B. Li, Y. Zhao, Y. Jiang, Y. C. Chen, P. Chen, G. B. Jo, J. Liu, and S. Du, All-optical neural network with nonlinear activation functions, Optica 6(9), 1132 (2019)

[44]

M. Miscuglio, A. Mehrabian, Z. Hu, S. I. Azzam, J. George, A. V. Kildishev, M. Pelton, and V. J. Sorger, All-optical nonlinear activation function for photonic neural networks, Opt. Mater. Express 8(12), 3851 (2018)

[45]

D. Ballarini, A. Gianfrate, R. Panico, A. Opala, S. Ghosh, L. Dominici, V. Ardizzone, M. De Giorgi, G. Lerario, G. Gigli, T. C. H. Liew, M. Matuszewski, and D. Sanvitto, Polaritonic neuromorphic computing outperforms linear classifiers, Nano Lett. 20(5), 3506 (2020)

[46]

C. Liu, Q. Ma, Z. J. Luo, Q. R. Hong, Q. Xiao, H. C. Zhang, L. Miao, W. M. Yu, Q. Cheng, L. Li, and T. J. Cui, A programmable diffractive deep neural network based on a digital-coding metasurface array, Nat. Electron. 5(2), 113 (2022)

[47]

R. Hamerly, L. Bernstein, A. Sludds, M. Soljačić, and D. Englund, Large-scale optical neural networks based on photoelectric multiplication, Phys. Rev. X 9(2), 021032 (2019)

[48]

A. Sludds, S. Bandyopadhyay, Z. Chen, Z. Zhong, J. Cochrane, L. Bernstein, D. Bunandar, P. B. Dixon, S. A. Hamilton, M. Streshinsky, A. Novack, T. Baehr-Jones, M. Hochberg, M. Ghobadi, R. Hamerly, and D. Englund, Delocalized photonic deep learning on the internet’s edge, Science 378(6617), 270 (2022)

[49]

N. Youngblood, Coherent photonic crossbar arrays for large-scale matrix–matrix multiplication, IEEE J. Sel. Top. Quantum Electron. 29(2), 1 (2023)

[50]

Z. Chen, A. Sludds, R. Davis III, I. Christen, L. Bernstein, L. Ateshian, T. Heuser, N. Heermeier, J. A. Lott, S. Reitzenstein, R. Hamerly, and D. Englund, Deep learning with coherent VCSEL neural networks, Nat. Photonics 17(8), 723 (2023)

[51]

Z. Lin, B. J. Shastri, S. Yu, J. Song, Y. Zhu, A. Safarnejadian, W. Cai, Y. Lin, W. Ke, M. Hammood, T. Wang, M. Xu, Z. Zheng, M. Al-Qadasi, O. Esmaeeli, M. Rahim, G. Pakulski, J. Schmid, P. Barrios, W. Jiang, H. Morison, M. Mitchell, X. Guan, N. A. F. Jaeger, L. A. Rusch, S. Shekhar, W. Shi, S. Yu, X. Cai, and L. Chrostowski, 120 GOPS photonic tensor core in thin-film lithium niobate for inference and in situ training, Nat. Commun. 15(1), 9081 (2024)

[52]

C. Wu, H. Yu, S. Lee, R. Peng, I. Takeuchi, and M. Li, Programmable phase-change metasurfaces on waveguides for multimode photonic convolutional neural network, Nat. Commun. 12(1), 96 (2021)

[53]

F. Ashtiani, A. J. Geers, and F. Aflatouni, An on-chip photonic deep neural network for image classification, Nature 606(7914), 501 (2022)

[54]

G. Mourgias-Alexandris, M. Moralis-Pegios, A. Tsakyridis, S. Simos, G. Dabos, A. Totovic, N. Passalis, M. Kirtas, T. Rutirawut, F. Y. Gardes, A. Tefas, and N. Pleros, Noise-resilient and high-speed deep learning with coherent silicon photonics, Nat. Commun. 13(1), 5572 (2022)

[55]

M. Moralis-Pegios, G. Giamougiannis, A. Tsakyridis, D. Lazovsky, and N. Pleros, Perfect linear optics using silicon photonics, Nat. Commun. 15(1), 5468 (2024)

[56]

X. K. Li, J. X. Ma, X. Y. Li, J. J. Hu, C. Y. Ding, F. K. Han, X. M. Guo, X. Tan, and X. M. Jin, High-efficiency reinforcement learning with hybrid architecture photonic integrated circuit, Nat. Commun. 15(1), 1044 (2024)

[57]

X. Xu, M. Tan, B. Corcoran, J. Wu, A. Boes, T. G. Nguyen, S. T. Chu, B. E. Little, D. G. Hicks, R. Morandotti, A. Mitchell, and D. J. Moss, 11 TOPS photonic convolutional accelerator for optical neural networks, Nature 589(7840), 44 (2021)

[58]

S. Xu, J. Wang, S. Yi, and W. Zou, High-order tensor flow processing using integrated photonic circuits, Nat. Commun. 13(1), 7970 (2022)

[59]

B. Bai, Q. Yang, H. Shu, L. Chang, F. Yang, B. Shen, Z. Tao, J. Wang, S. Xu, W. Xie, W. Zou, W. Hu, J. E. Bowers, and X. Wang, Microcomb-based integrated photonic processing unit, Nat. Commun. 14(1), 66 (2023)

[60]

X. Meng, G. Zhang, N. Shi, G. Li, J. Azaña, J. Capmany, J. Yao, Y. Shen, W. Li, N. Zhu, and M. Li, Compact optical convolution processing unit based on multimode interference, Nat. Commun. 14(1), 3000 (2023)

[61]

T. Xu, W. Zhang, J. Zhang, Z. Luo, Q. Xiao, B. Wang, M. Luo, X. Xu, B. J. Shastri, P. R. Prucnal, and C. Huang, Control-free and efficient integrated photonic neural networks via hardware-aware training and pruning, Optica 11(8), 1039 (2024)

[62]

J. Feldmann, N. Youngblood, M. Karpov, H. Gehring, X. Li, M. Stappers, M. Le Gallo, X. Fu, A. Lukashchuk, A. S. Raja, J. Liu, C. D. Wright, A. Sebastian, T. J. Kippenberg, W. H. P. Pernice, and H. Bhaskaran, Parallel convolutional processing using an integrated photonic tensor core, Nature 589(7840), 52 (2021)

[63]

Y. Shen, N. C. Harris, S. Skirlo, M. Prabhu, T. Baehr-Jones, M. Hochberg, X. Sun, S. Zhao, H. Larochelle, D. Englund, and M. Soljačić, Deep learning with coherent nanophotonic circuits, Nat. Photonics 11(7), 441 (2017)

[64]

S. Pai, B. Bartlett, O. Solgaard, and D. A. B. Miller, Matrix optimization on universal unitary photonic devices, Phys. Rev. Appl. 11(6), 064044 (2019)

[65]

H. Zhang, M. Gu, X. D. Jiang, J. Thompson, H. Cai, S. Paesani, R. Santagati, A. Laing, Y. Zhang, M. H. Yung, Y. Z. Shi, F. K. Muhammad, G. Q. Lo, X. S. Luo, B. Dong, D. L. Kwong, L. C. Kwek, and A. Q. Liu, An optical neural chip for implementing complex-valued neural network, Nat. Commun. 12(1), 457 (2021)

[66]

S. Pai, Z. Sun, T. W. Hughes, T. Park, B. Bartlett, I. A. D. Williamson, M. Minkov, M. Milanizadeh, N. Abebe, F. Morichetti, A. Melloni, S. Fan, O. Solgaard, and D. A. B. Miller, Experimentally realized in situ backpropagation for deep learning in photonic neural networks, Science 380(6643), 398 (2023)

[67]

Z. Xu, T. Zhou, M. Ma, C. C. Deng, Q. Dai, and L. Fang, Large-scale photonic chiplet Taichi empowers 160-TOPS/W artificial general intelligence, Science 384(6692), 202 (2024)

[68]

X. Lin, Y. Rivenson, N. T. Yardimci, M. Veli, Y. Luo, M. Jarrahi, and A. Ozcan, All-optical machine learning using diffractive deep neural networks, Science 361(6406), 1004 (2018)

[69]

T. Zhou, X. Lin, J. Wu, Y. Chen, H. Xie, Y. Li, J. Fan, H. Wu, L. Fang, and Q. Dai, Large-scale neuromorphic optoelectronic computing with a reconfigurable diffractive processing unit, Nat. Photonics 15(5), 367 (2021)

[70]

C. Liu, Q. Ma, Z. J. Luo, Q. R. Hong, Q. Xiao, H. C. Zhang, L. Miao, W. M. Yu, Q. Cheng, L. Li, and T. J. Cui, A programmable diffractive deep neural network based on a digital-coding metasurface array, Nat. Electron. 5(2), 113 (2022)

[71]

Y. Chen, M. Nazhamaiti, H. Xu, Y. Meng, T. Zhou, G. Li, J. Fan, Q. Wei, J. Wu, F. Qiao, L. Fang, and Q. Dai, All-analog photoelectronic chip for high-speed vision tasks, Nature 623(7985), 48 (2023)

[72]

X. Yuan, Y. Wang, Z. Xu, T. Zhou, and L. Fang, Training large-scale optoelectronic neural networks with dual-neuron optical-artificial learning, Nat. Commun. 14(1), 7110 (2023)

[73]

J. Li, Y. C. Hung, O. Kulce, D. Mengu, and A. Ozcan, Polarization multiplexed diffractive computing: All-optical implementation of a group of linear transformations through a polarization-encoded diffractive network, Light Sci. Appl. 11(1), 153 (2022)

[74]

R. Yin, H. Xiao, Y. Jiang, X. Han, P. Zhang, L. Chen, X. Zhou, M. Yuan, G. Ren, A. Mitchell, and Y. Tian, Integrated WDM-compatible optical mode division multiplexing neural network accelerator, Optica 10(12), 1709 (2023)

[75]

N. M. Estakhri, B. Edwards, and N. Engheta, Inverse-designed metastructures that solve equations, Science 363(6433), 1333 (2019)

[76]

C. Roques-Carmes, Y. Shen, C. Zanoci, M. Prabhu, F. Atieh, L. Jing, T. Dubček, C. Mao, M. R. Johnson, V. Čeperić, J. D. Joannopoulos, D. Englund, and M. Soljačić, Heuristic recurrent algorithms for photonic Ising machines, Nat. Commun. 11(1), 249 (2020)

[77]

C. Huang, S. Fujisawa, T. F. de Lima, A. N. Tait, E. C. Blow, Y. Tian, S. Bilodeau, A. Jha, F. Yaman, H. T. Peng, H. G. Batshon, B. J. Shastri, Y. Inada, T. Wang, and P. R. Prucnal, A silicon photonic–electronic neural network for fibre nonlinearity compensation, Nat. Electron. 4(11), 837 (2021)

[78]

W. Zhang, A. Tait, C. Huang, T. Ferreira de Lima, S. Bilodeau, E. C. Blow, A. Jha, B. J. Shastri, and P. Prucnal, Broadband physical layer cognitive radio with an integrated photonic processor for blind source separation, Nat. Commun. 14(1), 1107 (2023)

[79]

S. Pai, T. Park, M. Ball, B. Penkovsky, M. Dubrovsky, N. Abebe, M. Milanizadeh, F. Morichetti, A. Melloni, S. Fan, O. Solgaard, and D. A. B. Miller, Experimental evaluation of digitally verifiable photonic computing for blockchain and cryptocurrency, Optica 10(5), 552 (2023)

[80]

S. M. SeyedinNavadeh, M. Milanizadeh, F. Zanetto, G. Ferrari, M. Sampietro, M. Sorel, D. A. B. Miller, A. Melloni, and F. Morichetti, Determining the optimal communication channels of arbitrary optical systems using integrated photonic processors, Nat. Photonics 18(2), 149 (2024)

[81]

H. Zhang, L. Wan, S. Ramos-Calderer, Y. Zhan, W. K. Mok, H. Cai, F. Gao, X. Luo, G. Q. Lo, L. C. Kwek, J. I. Latorre, and A. Q. Liu, Efficient option pricing with a unary-based photonic computing chip and generative adversarial learning, Photon. Res. 11(10), 1703 (2023)

[82]

H. Feng, T. Ge, X. Guo, B. Wang, Y. Zhang, Z. Chen, S. Zhu, K. Zhang, W. Sun, C. Huang, Y. Yuan, and C. Wang, Integrated lithium niobate microwave photonic processing engine, Nature 627(8002), 80 (2024)

[83]

[84]

[85]

[86]

[87]

[88]

[89]

[90]

[91]

C. Xiong, W. H. P. Pernice, J. H. Ngai, J. W. Reiner, D. Kumah, F. J. Walker, C. H. Ahn, and H. X. Tang, Active silicon integrated nanophotonics: Ferroelectric BaTiO3 devices, Nano Lett. 14(3), 1419 (2014)

[92]

A. Melikyan, K. Koehnle, M. Lauermann, R. Palmer, S. Koeber, S. Muehlbrandt, P. C. Schindler, D. L. Elder, S. Wolf, W. Heni, C. Haffner, Y. Fedoryshyn, D. Hillerkuss, M. Sommer, L. R. Dalton, D. Van Thourhout, W. Freude, M. Kohl, J. Leuthold, and C. Koos, Plasmonic-organic hybrid (POH) modulators for OOK and BPSK signaling at 40 Gbit/s, Opt. Express 23(8), 9938 (2015)

[93]
