Introduction
Radiation therapy (RT) is used to treat over 60% of cancer patients in the US and 30% of cancer patients in China. The oncological use of radiation to treat malignancies started immediately after the discovery of radioactive isotopes. In the first half-century of RT technological development, most of the research and development effort was allocated to making high-energy and highly penetrating radiation sources available for the treatment of deep tumors. In the past 40 years, engineering has played an important role in RT technologies. The change has been largely driven by the availability of three-dimensional (3D) images for treatment planning. 3D images from modalities such as computed tomography (CT) or magnetic resonance (MR) provide quantitative delineation of the tumor target and organs at risk (OARs). Such images also support accurate 3D dose calculation, which in combination with the organ delineation provides a wealth of statistical information to correlate with the tumor control probability and normal organ toxicity; this knowledge propelled modern RT into a quantitative science. With decades of technological evolution, the modern RT workflow can be simplified into a flowchart (Fig. 1). The 3D images are first acquired for an RT patient. Then, the gross tumor volume (GTV) and OARs are delineated on the 3D images. The GTV describes the visible tumor based on medical images and is subsequently expanded into clinical and planning target volumes to account for microscopic tumor infiltration and geometrical uncertainties. A prescription reflecting the oncologist’s intent and prevailing dose constraints is used to guide treatment planning, which can be forward or inverse and uses a combination of beams and dose modulation devices to approach the prescription. A typical dose modulation device is a multileaf collimator (MLC) that defines both the aperture shape and the relative fluences within the beam aperture.
Once the treatment plan is created, approved, and validated, the patient is scheduled for treatment. During treatment in modern RT settings, additional images are obtained to localize the target for registration and patient positioning. The treatment is then delivered. A number of key technological advances in the process need a brief definition to facilitate the main topic of this review article, which is artificial intelligence (AI) in RT.
Image-guided RT
Although 3D imaging has been used to define the target and OARs, unless the patient is precisely positioned and the treatment plan is followed, the treatment cannot be accurately delivered to achieve the intended effects. 3D imaging was not incorporated into RT treatment until less than 20 years ago, when a flat panel imager and a kV X-ray source were installed on the clinical linac gantry for volumetric image acquisition [
1]. The result is 3D cone-beam CT (CBCT) images that provide considerably superior internal anatomy localization compared with the 2D radiographs that predated CBCT. MR-guided RT (MRgRT) [
2] has overcome the deficiencies of CBCT-guided RT. Compared with CBCT, MR has superior soft-tissue image contrast, is free of ionizing radiation, and is flexible in the imaging plane orientation. CBCT-guided RT and MRgRT have the potential to not only assist patient positioning but also support the adaptation of a treatment plan in the case of non-rigid anatomical and physiological changes.
Intensity-modulated RT (IMRT)
IMRT is another important technical breakthrough in RT. This technique was originally invented to compensate for the dose drop off toward the edges of the target when uniform conformal beams are used [
3–
5]. A mathematical tool known as inverse optimization, in combination with the abovementioned MLC, can be applied to RT planning to create complex or concave dose distributions that match the tumor shape and avoid the surrounding critical organs [
6]. This breakthrough enabled dose carving for concave and complex geometries that are unattainable with 3D conformal RT.
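As a minimal illustration of the inverse optimization idea (a toy sketch, not the cited clinical algorithms; the dose-influence matrix, prescription, and iteration counts are made-up values), beamlet weights can be found by projected gradient descent on a least-squares fidelity term under a non-negativity constraint:

```python
import numpy as np

# Toy inverse-planning sketch (illustrative values only): find
# non-negative beamlet weights w so that the dose D @ w approaches a
# prescription d, via projected gradient descent on 0.5*||D w - d||^2.
rng = np.random.default_rng(0)
n_voxels, n_beamlets = 40, 10
D = rng.random((n_voxels, n_beamlets))    # toy dose-influence matrix
w_true = rng.random(n_beamlets)
d = D @ w_true                            # an achievable "prescription"

w = np.zeros(n_beamlets)
step = 1.0 / np.linalg.norm(D, 2) ** 2    # safe step size for this objective
for _ in range(5000):
    grad = D.T @ (D @ w - d)              # gradient of the fidelity term
    w = np.maximum(w - step * grad, 0.0)  # project onto the constraint w >= 0

residual = np.linalg.norm(D @ w - d) / np.linalg.norm(d)
```

Clinical inverse planning adds many competing OAR and target objectives and MLC deliverability constraints, but the core structure of iterating between a dose-fidelity gradient and feasibility projection is the same.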
Robotic RT platform
On the hardware end, since the linear accelerator became a clinical utility, the architecture of RT machines has remained largely unchanged despite the expanding imaging and targeting functionalities. The X-ray source revolves around a fixed axis to provide coplanar RT. The degrees of freedom of the patient couch and a robotic gantry have been used to accommodate the need for delivering non-coplanar beams [
7–
Mathematical algorithms were developed to effectively utilize the enormous delivery space for superior dosimetry [
12,
14–
16].
Regardless of the specific delivery platform, modern RT follows a specific workflow (Fig. 1). With the increased complexity of treatment and the goal of achieving more effective RT, machine learning and AI have played increasingly important roles. This review provides an overview of the opportunities for AI in each step of RT (Table 1).
AI for image acquisition
Low-dose CT acquisition
The signal-to-noise ratio (SNR) is proportional to the CT imaging dose, which should be kept as low as reasonably achievable. Reducing the imaging dose is important for repeated daily image guidance and screening [
17]. Conventional filtered back projection is susceptible to insufficient photon counts and produces severe artifacts and noise when reconstructing from low-dose CT projections. An alternative method using iterative reconstruction with a fidelity term was developed to minimize the difference between the measured projection data and the forward projection of the current CT estimate. A regularization term is added to the optimization problem to mitigate the noise and artifacts of the ill-conditioned low-dose CT problem and to exploit known anatomical characteristics. A typical term is total variation, which exploits the piecewise smoothness of anatomical structures. Iterative CT reconstruction with regularization terms has achieved remarkable success in various applications [
18–
29]. However, the regularization term not only introduces a statistical bias but also compromises the CT resolution in exchange for noise and artifact suppression [
30]. Recently, deep-learning methods have been used to reconstruct low-dose CT images with remarkable success [
18,
20,
21,
31–
37]. Deep learning was developed from artificial neural networks (ANNs), which mimic the information transfer and processing of biological systems. However, deep-learning neural networks differ from ANNs in terms of the depths of network layers and the capability to learn high-level features on their own. A representative deep-learning neural network is a convolutional neural network (CNN) that has input and output layers with hidden layers in between. Each hidden layer is connected with its adjacent layers by convolutional operations for hierarchical feature extraction. Deep learning has unprecedented versatility to learn high-level features in complex systems and perform tasks, including classification and prediction of new cases. Chen
et al. demonstrated a combined autoencoder and CNN approach for low-dose CT reconstruction [
36]. In this method, patches of a fixed size are extracted from paired low- and normal-dose CT images. The patches are transferred to the feature space by fully connected convolutional layers with the rectified linear unit activation function; in this process, image noise is suppressed. In the decoder step, deconvolutional layers are used to recover image details from the extracted features, and residual compensation is used to enhance the details. Consequently, effective noise suppression, structure preservation, and lesion detection are reported using the deep-learning method. A challenge of the deep-learning method is that sufficient training data may not be readily available, and training on a data set with substantially different imaging characteristics can result in undesired distortions in the reconstructed images. One solution that combines the strengths of traditional iterative methods and the deep-learning method is the plug-and-play alternating direction method of multipliers, where the regularization term is replaced with an off-the-shelf denoiser, such as block-matching and 3D filtering [
38,
39] or a pretrained deep-learning neural network [
20].
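The fidelity-plus-regularization idea can be sketched in one dimension (a toy example with made-up noise level, weights, and a smoothed total-variation term; not any of the cited algorithms): minimizing a data-fidelity term plus a total-variation penalty recovers a piecewise-constant signal from noisy measurements.

```python
import numpy as np

# Minimal 1D sketch of regularized reconstruction (illustrative only):
# minimize 0.5*||x - b||^2 + lam * TV(x), where the smoothed
# total-variation term favors piecewise-constant structures.
rng = np.random.default_rng(1)
n = 64
x_true = np.zeros(n)
x_true[20:40] = 1.0                          # piecewise-constant "anatomy"
b = x_true + 0.2 * rng.standard_normal(n)    # noisy "low-dose" measurement

lam, eps, step = 0.5, 1e-2, 0.02
x = b.copy()
for _ in range(3000):
    dx = np.diff(x)
    g = dx / np.sqrt(dx ** 2 + eps)          # gradient of smoothed |dx|
    tv_grad = np.zeros(n)
    tv_grad[:-1] -= g
    tv_grad[1:] += g
    x -= step * ((x - b) + lam * tv_grad)    # fidelity + regularization step

err_noisy = np.linalg.norm(b - x_true)
err_denoised = np.linalg.norm(x - x_true)
```

In a full reconstruction problem the fidelity term compares forward projections with measured sinogram data rather than images directly, but the trade-off between data fidelity and the regularizer is the same.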
CBCT artifact correction
Compared with the fan-beam CT used for diagnostic and simulation image acquisition, the CBCT used for image-guided RT poses additional image quality challenges. Given the severe X-ray scatter, photon starvation artifacts, and motion artifacts from slow acquisition, CBCT image quality is substantially inferior to that of fan-beam CT, showing poorer contrast and inaccurate electron density for dose calculation. The distortion of CT numbers is referred to as the shading artifact. In addition to the anti-scatter grid for reducing scattered X-rays [
40–
47], hardware blockers and computational methods have been utilized to estimate the scatter photon components with varying levels of success [
42–
46,
48–
84]. Blocker methods require modification of the existing CBCT systems and may be impractical. In the computational approach, the scatter photons at the detector are estimated using Monte Carlo forward projection [
52]. Alternatively, the scatter component can be estimated using analytical methods [
85,
86]. Despite software and hardware acceleration, the amount of computation needed to estimate the scatter component is prohibitive for online CBCT reconstruction. The emergence of deep-learning neural networks provides a potential solution to this problem and is effective in improving CBCT quality [
87–
91]. Fig. 2 shows a deep residual convolutional neural network (ResNet) that learns the shading compensation map for scatter correction. ResNet, with residual blocks that skip layers, effectively mitigates the vanishing gradient problem commonly observed when training a CNN with many hidden layers. Additional shortcut connections between the input and output layers facilitate the backpropagation of the gradient. Paired CT images with and without shading artifacts are used to train the network, which then corrects images reconstructed without scatter correction. Compared with the iterative reconstruction method, trained deep-learning neural networks are more efficient to use. The efficiency gains are essential for online image reconstruction and interventional applications such as adaptive RT.
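The residual idea can be shown with a toy numpy sketch (random illustrative weights, 1D stand-ins for convolutional layers; not the cited network): the block outputs its input plus a learned correction, so the identity shortcut keeps gradients flowing through deep stacks and lets the network model only the compensation map.

```python
import numpy as np

# Toy sketch of a residual block (random illustrative weights):
# output = input + learned correction, via an identity shortcut.
rng = np.random.default_rng(2)

def conv1d(x, k):
    return np.convolve(x, k, mode="same")   # stand-in for a conv layer

def residual_block(x, k1, k2):
    h = np.maximum(conv1d(x, k1), 0.0)      # convolution + ReLU
    return x + conv1d(h, k2)                # identity shortcut

x = rng.standard_normal(32)
k1 = rng.standard_normal(3) * 0.1
k2 = rng.standard_normal(3) * 0.1
y = residual_block(x, k1, k2)
```

Note that when the learned kernels are zero the block reduces exactly to the identity, which is what makes very deep residual stacks easy to train.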
Rapid MR acquisition
A primary motivation to overcome the substantial technical challenges in combining MR with a linac for MRgRT is to provide dynamic images for tumor tracking. A combination of fast MR sequences and parallel imaging techniques [
93] has been developed to provide 2D dynamic images with sub-second temporal resolution, which is acceptable for motions as rapid as respiration. For instance, the steady-state free precession sequence [
94] is suited for fast dynamic imaging because of its high SNR and robust performance in low-field MR imaging (MRI) systems. However, 2D dynamic MRI is insufficient to resolve complex anatomies, such as the pancreas, whose convoluted structures require 3D images for adequate description. 4DMRI has been developed to retrospectively sort or prospectively gate the 2D image or k-space data into an assembled 4D data set [
95–
102]. Nonetheless, 4DMRI in its current form only reflects a sparsely sampled average of the moving anatomy. Real-time intervention decisions, such as gating or motion-tracking RT, cannot be made on the basis of 4DMRI alone. Existing MR techniques cannot achieve sufficiently high image quality, spatial and temporal resolution, and reconstruction speed for 3D real-time anatomical imaging. Current research in this area is focused on compressed sensing, where the k-space sampling is markedly reduced to shorten the signal acquisition time [
103]. In the reconstruction step, similar to the iterative CT reconstruction, a regularized optimization problem in space and time domains is solved [
104–
107]. A major limitation of the iterative reconstruction is the long computational time that prevents it from being available as a real-time technique, regardless of the potentially achievable k-space downsampling ratio. By contrast, deep-learning-based MR reconstruction of undersampled k-space data is suited for online RT applications [
108–
112]. In these studies, similar to the CBCT reconstruction problem, a deep-learning neural network is trained on paired fully sampled and undersampled MR images. The trained network then uses the undersampled k-space data as input to predict fully sampled images. In addition to producing reconstruction results superior to those of iterative methods, deep learning offloads most of the computational burden, including the training of the neural networks, to the offline stage. The online reconstruction can be performed in near real time.
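What "undersampled k-space" means can be sketched with a synthetic phantom and a made-up sampling mask (illustrative only; real sequences use more sophisticated variable-density patterns): skipping phase-encode lines accelerates acquisition, and the naive zero-filled reconstruction then suffers aliasing that an iterative or learned method must remove.

```python
import numpy as np

# Toy sketch of k-space undersampling and zero-filled reconstruction
# (synthetic phantom, made-up mask).
n = 64
img = np.zeros((n, n))
img[16:48, 24:40] = 1.0                      # rectangular phantom
kspace = np.fft.fft2(img)

mask = np.zeros(n, dtype=bool)
mask[::4] = True                             # keep every 4th phase-encode line
mask[:8] = mask[-8:] = True                  # densely sample low frequencies
under = kspace * mask[:, None]               # discard unsampled lines

zero_filled = np.real(np.fft.ifft2(under))   # naive reconstruction
err = np.linalg.norm(zero_filled - img) / np.linalg.norm(img)
```

A deep-learning reconstructor would take `zero_filled` (or `under`) as input and be trained to output the fully sampled image.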
AI for image synthesis, registration, and segmentation
Image synthesis
The available combination of images should be an optimal match for the need of a specific RT task. However, ideal images may not always be available. For example, an MR-only RT planning workflow is created to utilize the superior MR soft-tissue contrast, eliminate unnecessary imaging dose to the patient, and avoid the image registration problem [
113]. Without perfectly matched CT images, one challenge of this workflow is that MR fails to provide the electron density needed for radiation dose calculation. An intuitive method to solve the problem is to segment the MR images into tissue subtypes and then assign known CT densities to these tissues. The accuracy of the bulk density assignment method [
114] depends on the segmentation accuracy, the complexity of anatomy, and the homogeneity of CT density within one tissue type. The need for manual segmentation inevitably increases the processing time of a patient plan. Synthetic CT (CTsynth) images based on multiparametric MR have been studied to improve the tissue mapping accuracy [
115]. Compared with MR from a single sequence, different types of tissues and structures are better quantified using multiparametric MRI. In particular, the dark cortical bones can be differentiated from air cavities in ultra-short echo time MR images. Despite their potential for accurate CT density assignment, methods based on multiparametric MR suffer from the inconsistency between images acquired at different times because of unavoidable anatomical motions and remain cumbersome in practice.
Recently, CTsynth images using deep learning have gained wide popularity [
116–
123]. Compared with conventional CTsynth methods, the deep-learning method can be fully automated and is more efficient and robust. Its versatility is further improved by the development of the cycle-consistent generative adversarial network (GAN), which allows training of the network based on unpaired MR-CT images [
124]. GAN is a deep-learning architecture that uses two networks: the generator network attempts to generate realistic images, whereas the discriminator network attempts to distinguish between real images and those created by the generator. When training is successfully completed, the generator can create an image that cannot be differentiated from the training set. A cycle-consistent GAN (CycleGAN) for image synthesis utilizes two GANs: one attempts to generate a realistic CTsynth slice given a real MR slice, and the other attempts to generate a realistic synthetic MR slice given a real CT slice. The generators are then switched and applied to the synthetic outputs to translate the synthetic MR back into a CT slice, and vice versa. The original CT or MR slice should be recovered; hence, this network architecture should show cycle consistency. The loss function for CycleGAN has an adversarial loss term for generating realistic CT images, an adversarial loss term for generating realistic MR images, and a cycle-consistency loss term to prevent the network from mapping an input to an arbitrary realistic-looking image in the other domain. In published reports, deep-learning-generated CTsynth images are adequately accurate for dose calculation [
125–
127]. The same method can be extended to other types of image synthesis. For example, virtual 4DMR images are synthesized from 4DCT for good visualization of the liver tumor in image-guided RT [
128].
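The cycle-consistency term can be illustrated with a deliberately trivial example (invertible affine maps standing in for the MR-to-CT and CT-to-MR generator networks; purely illustrative, not the cited models): translating a slice to the other domain and back should recover the original, and the L1 difference between the round trip and the input is the cycle loss.

```python
import numpy as np

# Toy sketch of the CycleGAN cycle-consistency term, with affine maps
# standing in for the two generator networks (illustrative only).
def g_mr_to_ct(x):
    return 2.0 * x + 1.0              # stand-in "generator" MR -> CT

def g_ct_to_mr(y):
    return (y - 1.0) / 2.0            # stand-in "generator" CT -> MR

mr = np.linspace(0.0, 1.0, 5)             # toy MR "slice"
cycle = g_ct_to_mr(g_mr_to_ct(mr))        # MR -> CT -> MR round trip
cycle_loss = np.abs(cycle - mr).mean()    # L1 cycle-consistency term
```

In the real CycleGAN, this term is added to the two adversarial losses and all four networks are trained jointly; the cycle term is what allows training on unpaired MR-CT data.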
Image registration
In RT, images from different patients, times, or modalities often need to be registered to synthesize their corresponding information in a common coordinate system. For example, the CT acquired at the time of positron emission tomography (PET) needs to be registered with the planning CT to overlay the PET information for target delineation. Multimodal image registration is often needed to allow organs or tissues with better conspicuity in one image modality, e.g., MR, to aid the delineation of the target and OARs on the planning CT. The corresponding CT images also need to be registered to correctly accumulate the radiation dose delivered at different times. Different from rigid phantoms, voxel-level deformation occurs between the image pairs to be registered, creating a deformable image registration (DIR) problem, whose solution depends on establishing voxel correspondence between the image pairs. Conventionally, image-based and biomechanical methods have been developed to tackle the image registration problem. In the image-based method, a deformation engine is used to iteratively morph the original image toward a desirable match with the target image. Common deformation engines include “demons” [
129], freeform [
130], and B-spline [
131]. Image-based DIR has achieved remarkable success in selected applications where ample landmarks and good image contrast are available. However, DIR is less reliable in low-contrast regions and for multimodal registration. Moreover, the registration results can be sensitive to parameter tuning, making the process subjective and tedious.
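The flavor of such a deformation engine can be conveyed with a 1D demons-style sketch (heavily simplified from the demons idea; the images, iteration count, and smoothing are all toy choices): each voxel's displacement is pushed along the fixed image's gradient in proportion to the local intensity mismatch, with smoothing as regularization.

```python
import numpy as np

# 1D sketch of a demons-style deformable registration update
# (simplified, toy parameters): align a shifted bump to a fixed bump.
n = 100
xs = np.arange(n, dtype=float)
fixed = np.exp(-((xs - 50.0) ** 2) / 50.0)    # fixed image: bump at 50
moving = np.exp(-((xs - 55.0) ** 2) / 50.0)   # moving image: bump at 55

u = np.zeros(n)                                # displacement field
grad_f = np.gradient(fixed)
for _ in range(200):
    warped = np.interp(xs + u, xs, moving)     # warp moving by current field
    diff = warped - fixed
    # demons-style force; the small constant guards against division by zero
    u -= diff * grad_f / (grad_f ** 2 + diff ** 2 + 1e-9)
    u = np.convolve(u, np.ones(5) / 5.0, mode="same")  # smoothing regularizer

mismatch_before = np.abs(moving - fixed).mean()
mismatch_after = np.abs(np.interp(xs + u, xs, moving) - fixed).mean()
```

The sensitivity noted above shows up directly here: the smoothing width and iteration count must be tuned per case, which is part of what makes conventional DIR subjective.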
In the biomechanical method, images are first segmented into organs and assigned with known elasticity coefficients. Tissue deformation is then driven by boundary conditions derived from the origin and target images [
132]. In theory, the biomechanical method takes advantage of intrinsic tissue mechanical properties and is thus resilient to the lack of image contrast and the variation in image characteristics in the multimodal registration problem. However, in practice, accurate tissue mechanical properties and boundary conditions are difficult, if not impossible, to obtain for individual patients. The actual mechanical problems are highly nonlinear, making an accurate solution unattainable in most cases. Thus, biomechanical DIR is rarely used in RT.
Recent efforts on using deep learning have been focused on the improvement of the quality and efficiency of DIR. Notably, VoxelMorph proposed by Balakrishnan
et al. [
133] uses U-net to perform unsupervised and supervised learning for brain image registration. The method alleviates the need for manual parameter tuning and is versatile enough to incorporate additional manual labeling for improved registration accuracy. Although the training of a registration model can be time consuming, registration of new image pairs is much faster with VoxelMorph than with conventional registration methods. In another unsupervised DIR study [
134], a deep convolutional inverse graphics network was used to perform DIR between CT and CBCT with results superior to those of conventional registration methods, regardless of the intrinsic CT number inaccuracy in the CBCT images. For the MR-CT DIR problem, synthetic bridge images are created using the aforementioned CycleGAN to ease the challenge of matching images with markedly different characteristics [
135]. Remarkably improved DIR accuracy compared with direct registration is observed when using this method to register head and neck (H&N) MR and CT images (Fig. 3).
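The unsupervised training objective used by VoxelMorph-style methods can be written down in a toy 1D form (made-up images and weight; not the published loss in full): an image-similarity term on the warped moving image plus a smoothness penalty on the displacement field.

```python
import numpy as np

# Toy 1D sketch of an unsupervised registration loss: similarity of the
# warped moving image to the fixed image, plus displacement smoothness.
def registration_loss(fixed, moving, u, lam=0.1):
    xs = np.arange(len(fixed), dtype=float)
    warped = np.interp(xs + u, xs, moving)
    similarity = np.mean((warped - fixed) ** 2)   # MSE similarity term
    smoothness = np.mean(np.diff(u) ** 2)         # penalize rough fields
    return similarity + lam * smoothness

xs = np.arange(64, dtype=float)
fixed = np.sin(xs / 8.0)
moving = np.sin((xs - 3.0) / 8.0)                 # copy shifted by 3 voxels

loss_zero = registration_loss(fixed, moving, np.zeros(64))
loss_true = registration_loss(fixed, moving, np.full(64, 3.0))
```

In VoxelMorph the displacement field `u` is the output of a U-net, and this loss is minimized over the network weights across many image pairs, so no ground-truth deformations are needed.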
Image segmentation
For RT, IMRT has replaced 3D conformal and 2D planning because of its superior dose conformity and OAR sparing [
136]. Inverse optimization in IMRT requires the delineation of OARs. Conventionally, this delineation has been performed manually by oncologists and dosimetrists. Manual segmentation is one of the most time-consuming processes in RT. Furthermore, given the non-uniform training and time available for planning, manual segmentation has been shown to possess substantial intra- and inter-observer variabilities [
137]. The lengthy process required for segmentation is incompatible with adaptive RT, where a new IMRT plan needs to be rapidly created [
138,
139] on the basis of the newly acquired CBCT or better quality fan-beam simulation CTs. The advent of MRgRT has provided superior soft-tissue contrast and motivated frequent online adaptive RT [
140], where automated segmentation is important for efficient treatment planning.
A common strategy used by commercial platforms, including MRgRT systems, is the registration of the manually labeled planning image to the online images. Given the unavoidable non-rigid anatomical motion of the patient between image acquisitions, DIR is needed to establish a voxel-to-voxel correspondence between two medical images reflecting two different anatomical instances. The resulting deformation vector field is then used to propagate the contours to the online images. Alternatively, without individual segmentation on the planning images, an atlas can be generated on the basis of an average patient [
141,
142]. In practice, these methods highly depend on the accuracy of deformable registration, which can be erroneous when the deformation is large or the image contrast is low. Shape or appearance models [
143] have been used to regularize the surface formation to achieve anatomical plausibility [
144], thus preventing large contouring errors. Nevertheless, atlas methods have not found wide adoption in RT practice because of the lack of robustness and slow performance.
Deep-learning neural networks have shown potential for medical image segmentation, target detection, registration, and other tasks [
145–
152]. For RT, CNNs are used to segment H&N CT images [
153]. The resultant contours are then refined using the Markov random field algorithm. To eliminate the post-processing step, Tong
et al. developed a novel automated H&N OAR segmentation method that combines the fully convolutional residual network (FC-ResNet) with a shape-constrained (SC) model. The SC network is trained to capture the 3D OAR shape features that are used to constrain the FC-ResNet. Tong
et al. showed that segmentation performance superior to that of state-of-the-art methods could be achieved with dual neural nets [
154]. A challenge with training the segmentation network is that the amount of curated data with manual labeling is typically small, resulting in an overfitting problem and deteriorated independent validation results. By adding a GAN, the robustness of segmentation is improved for small training data sets [
155]. Fig. 4 and Table 2 show the H&N segmentation results. The segmentation artifacts, including organ islands and incorrect boundaries, are remarkably reduced with the inclusion of SC and GAN. Using manual segmentation as the ground truth, the volume and surface agreements are remarkably improved compared with conventional auto-segmentation methods and vanilla deep-learning neural network methods.
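Volume agreement in such comparisons is typically quantified with the Dice coefficient; a minimal sketch (with a toy 2D mask pair; the real evaluations are 3D) is:

```python
import numpy as np

# Sketch of the Dice coefficient, the standard volume-overlap metric
# for scoring auto-segmentation against manual ground truth.
def dice(a, b):
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

truth = np.zeros((32, 32), dtype=bool)
truth[8:24, 8:24] = True                  # 16 x 16 "organ"
pred = np.zeros((32, 32), dtype=bool)
pred[10:26, 8:24] = True                  # prediction shifted by 2 voxels

score = dice(truth, pred)                 # -> 0.875 for this toy pair
```

Surface agreement is reported separately (e.g., with Hausdorff or mean surface distance) because Dice is insensitive to boundary errors on large organs.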
AI for treatment planning
Dose prediction
RT treatment planning is a labor-intensive procedure. Despite consistent planning goals, the planning results differ because of inter-patient anatomical variability and cannot be predicted at the beginning of the planning process. During planning, the dosimetrist tunes a large number of optimization parameters without knowing the endpoint. Inconsistent and suboptimal plan dosimetry is common among different institutions and individual planners [
159–
161]. Knowledge-based planning (KBP) and automated planning techniques have been developed to address these challenges [
162–
164].
KBP is motivated by the observation that the achievable patient dose is highly correlated to the anatomy. For instance, the closer a critical organ is to the tumor, the higher the dose is to this organ. To learn the correlation between patient anatomies and planning dose, Wu
et al. [
165] introduced the concept of the overlap volume histogram and established its relationship with the dose–volume histogram (DVH). Zhu
et al. [
166] and later Yuan
et al. [
160] used machine-learning methods, such as support vector regression, to predict the dose. Principal component analysis (PCA) is performed on the spatial and volumetric input features to identify the most important anatomical features and avoid overfitting with limited training samples. PCA uses matrix operations to reduce the number of potentially correlated variables to a small number of uncorrelated ones. The accuracies of various dose prediction methods were previously compared [
167]. In addition to these direct regressional learning methods, ANNs are used to predict dose distributions [
162], showing similar performance for simple cases, including brain and prostate cancers. However, the prediction performance deteriorates for large regions of interest and complex cases. Analogous to segmentation tasks, the 3D dose cloud is likely correlated with the underlying anatomy and should deform with the anatomy of a new patient. In an atlas-based dose prediction study [
163], the CT features in the training set were mapped to the testing set. Given that multiple atlases exist in the training set, this mapping results in probabilistic dose estimates, where the most likely voxel dose is determined using a conditional random field, a method that predicts voxel association based on contextual information.
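The PCA step described above can be sketched via the singular value decomposition (a synthetic feature matrix of 50 "patients" by 10 features driven by 2 underlying degrees of freedom; illustrative only):

```python
import numpy as np

# Sketch of PCA via SVD, as used to compress correlated anatomical
# features before dose regression. The feature matrix is synthetic.
rng = np.random.default_rng(3)
latent = rng.standard_normal((50, 2))           # 2 true degrees of freedom
mixing = rng.standard_normal((2, 10))
X = latent @ mixing + 0.01 * rng.standard_normal((50, 10))

Xc = X - X.mean(axis=0)                         # center the features
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = s ** 2 / np.sum(s ** 2)             # explained-variance ratios
scores = Xc @ Vt[:2].T                          # project onto top 2 components
```

The low-dimensional `scores` would then feed a regressor (e.g., support vector regression) in place of the raw correlated features, reducing overfitting when training samples are limited.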
The common drawbacks of these conventional methods include sensitivity to parameter tuning, low accuracy in complex cases, increasing training data requirements as more features are included, and slow performance. This unique RT problem is an ideal inquiry for deep learning, which learns implicit anatomical, imaging, and dosimetric features with a relatively straightforward training process. Deep learning for dose prediction in various clinical cases, including H&N and prostate cancer, and different treatment modalities, including IMRT, volumetric modulated arc therapy, and helical TomoTherapy, has been reported [
116,
168–
173]. Fig. 5 shows the predicted dose for an H&N cancer patient using various neural nets [
116]. The predicted dose using a hyperdense U-net showed the highest similarity to the actual dose. U-net is a type of CNN originally developed to solve the image segmentation problem. In addition to the contracting layers in a CNN, the pooling operation is replaced by upsampling operators in the expansive path back to the original image resolution. The symmetric contracting and expansive paths in the network form a U-shape; hence the name U-net. Paired CT and planning dose images are used to train the network, which is then applied to predict the dose for a new CT.
Treatment planning can be guided by the predicted dose semi- or fully automatically. DVH constraint points can be extracted from the predicted dose and used in commercial planning systems [
174–
184], or the 3D voxel doses can be used to drive the optimization [
185,
186]. Particularly for the robotic platform, Landers
et al. showed that dose prediction can be accurate because of the isotropic beam distribution [
187], and the combined beam orientation and fluence map optimization can be performed fully automatically [
186].
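Extracting DVH constraint points from a predicted dose is straightforward; a minimal sketch (with synthetic voxel doses; real use would pull the voxels belonging to one structure from the predicted 3D dose grid) is:

```python
import numpy as np

# Sketch of extracting a cumulative DVH from a (predicted) dose array:
# for each dose level, the fraction of the structure's volume receiving
# at least that dose. The voxel doses here are synthetic.
rng = np.random.default_rng(4)
organ_dose = rng.uniform(0.0, 60.0, size=5000)   # toy voxel doses in Gy

levels = np.arange(0.0, 71.0, 1.0)               # 1 Gy dose levels
dvh = np.array([(organ_dose >= lv).mean() for lv in levels])

v20 = dvh[20]                 # volume fraction receiving >= 20 Gy
mean_dose = organ_dose.mean()
```

Points such as `v20` (or a dose at a fixed volume fraction) can be entered directly as optimization constraints in a commercial planning system, which is the semi-automated route described above.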
AI for patient workflow management and QA
Quantitative accuracy is important to the reproducibility and quality of RT. QA is a broad topic involving geometrical accuracy, machine output, energy, process consistency, and end-to-end treatment plan validation. Although physical measurement of the machine output, energy, and profile will always be needed, the manual consistency check is a tedious, labor-intensive, and error-prone process. RT planning and delivery involve numerous parameters, such as prescription, tumor location, plan monitor units, beam arrangement, and dose modifiers. A single error in the process can result in devastating consequences for the patient. Therefore, automated QA of the process is desired. A straightforward approach to this problem is to design a checklist including all relevant parameters and compare the created treatment plan with expected values [
188]. For the checklist to be effective, the input needs to be structured in the electronic medical record system. However, this condition is not always the case. Often, pertinent diagnosis and prescription information are embedded in clinical notes written in a natural (human) language. However, the natural language is unstructured and can be difficult to search for specific treatment information. A natural language processing tool [
189] converts the unstructured natural language into a structured form from which key information can be readily extracted for quality checks. The checklist method may be effective in simple cases where all variables can be enumerated. However, for complex cases, such as IMRT, the possible variables exceed the checklist capacity. In such cases, for patient-specific IMRT QA, the complex plan is delivered to a phantom, and the measured dose is compared with the expected dose. Predicting and understanding the QA results are difficult tasks. Deep-learning methods have recently been applied to tackle this problem [
190–
192]. In these studies, the electronic portal images (EPIs) of the dose delivered with a given MLC configuration, with or without introduced errors, were used to train a vanilla CNN, which was then used to classify unseen EPIs. Deep learning could predict the QA passing rate for a patient, and CNNs were used to classify the presence or absence of introduced RT treatment delivery errors from patient-specific gamma images. Fig. 6 shows an example of using the deep-learning method for error classification; this method offers superior discriminability compared with the handcrafted-feature approach [
191].
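The gamma analysis underlying these patient-specific QA comparisons can be illustrated in a simplified 1D form (toy profiles and the common 3%/3 mm criteria; a clinical implementation works on 2D/3D dose grids with interpolation and dose thresholds):

```python
import numpy as np

# Simplified 1D gamma-index sketch (toy profiles, 3%/3 mm criteria).
# A measured point passes (gamma <= 1) if some nearby reference point
# agrees within the combined dose-difference/distance tolerance.
xs = np.arange(0.0, 50.0, 0.5)                    # positions in mm
reference = np.exp(-((xs - 25.0) ** 2) / 100.0)   # planned dose profile
measured = np.exp(-((xs - 27.0) ** 2) / 100.0)    # delivery shifted by 2 mm

dose_tol, dist_tol = 0.03, 3.0                    # 3% of max dose / 3 mm
gamma = np.empty_like(xs)
for i, (xm, dm) in enumerate(zip(xs, measured)):
    dd = (reference - dm) / dose_tol              # dose-difference term
    dr = (xs - xm) / dist_tol                     # distance-to-agreement term
    gamma[i] = np.sqrt(dd ** 2 + dr ** 2).min()   # best agreement nearby

passing_rate = (gamma <= 1.0).mean()
```

Here the 2 mm shift stays within the 3 mm distance tolerance, so every point passes; a shift beyond 3 mm would push points on the steep gradient above gamma = 1 and lower the passing rate, which is the behavior the error-classification networks learn to recognize.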
At the time of plan delivery, online images are first obtained for registration and patient positioning. Rigid registration is straightforward and still applicable for well-immobilized patients and anatomies showing minimal relative motion to the bony anatomy. In complicated cases where non-rigid motion is substantial, deformable registration and adaptive planning are needed. Both procedures can similarly benefit from the AI techniques as previously described. For anatomies showing substantial intrafractional motion, gated RT is performed where the treatment beam is turned on only if the tumor is within the predefined window to minimize the treatment volume without compromising tumor coverage. Evidently, online images showing the tumor location would be desirable for such purposes. AI also facilitates the acquisition and reconstruction of high-quality tumor tracking images or images with high dimensions as described in Section “AI for image acquisition.” In addition to the benefits in image quality, pre-trained AI models can be applied in near real time, making them particularly well suited for online image guidance procedures.
AI for RT outcome prediction
Although statistical prognoses exist for patient cohorts, predicting the outcome of an individual RT patient is important. The prediction can be used to personalize the treatment to maximize local tumor control or minimize normal tissue toxicity [
193,
194]. The prediction can be made using imaging features [
195], genomic features [
196], or a combination of different types of features [
197]. Classical machine-learning methods, including least absolute shrinkage and selection operator (LASSO) and support vector machine (SVM), have been used to associate the imaging features with the outcomes [
191,
198–
211]. LASSO is a regression method whose sparsity-inducing (L1) regularization shrinks model coefficients toward zero, encouraging models with few parameters that are more robust than models using more parameters. SVM is another machine-learning method for regression and classification analyses. SVM attempts to find the hyperplane in the high-dimensional data space that maximally separates data belonging to distinct classes. Although these studies showed the potential of outcome prediction and in certain cases identified tumor subvolumes that could benefit from a selective radiation boost, conventional machine-learning methods rely on handcrafted features that are not robust across patient cohorts, thus limiting their generalizability. Deep learning is well suited to establish the correlation between the images and the outcome with improved accuracy [
169,
212–
216]. Using a deep-learning architecture, the authors stratified stage III lung cancer patients into high- and low-risk groups based on their pretreatment and follow-up images [
217]. The network used a base ResNet CNN pre-trained on natural images. Patient CT images from individual time points were fed into separate CNNs, whose outputs were then fed into a recurrent neural network (RNN). Prediction power increased as additional follow-up images were incorporated.
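To make the two classical methods concrete, the following sketch applies scikit-learn's `Lasso` and `SVC` to synthetic "radiomic" features; the cohort size, feature count, outcome model, and regularization strength are illustrative assumptions, not values from the cited studies:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.svm import SVC

# Hypothetical data: 60 patients x 20 handcrafted imaging features,
# with a synthetic binary outcome driven by only 2 of the features --
# the sparse setting that LASSO is designed for.
rng = np.random.default_rng(42)
X = rng.normal(size=(60, 20))
y = (X[:, 0] + 0.5 * X[:, 1] + 0.3 * rng.normal(size=60) > 0).astype(int)

# LASSO: L1 regularization shrinks coefficients toward zero, so only a
# few features retain nonzero weights in the fitted model.
lasso = Lasso(alpha=0.1).fit(X, y)
selected = np.flatnonzero(lasso.coef_)
print("Features retained by LASSO:", selected.tolist())

# SVM: finds the maximum-margin hyperplane separating the two classes.
svm = SVC(kernel="linear").fit(X, y)
acc = svm.score(X, y)
print(f"Linear-SVM training accuracy: {acc:.2f}")
```

In a radiomics study, `X` would hold handcrafted texture and shape features and `y` the clinical endpoint; the sparsity of the LASSO solution is what lends such models robustness on small cohorts.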
In addition to predicting tumor control, deep learning has been used to predict toxicity. Zhen
et al. [
122] used a pre-trained CNN to predict rectum toxicity from RT. Given that the radiation dose to the rectal wall is more relevant than the volumetric dose, rectum surface meshing and deformable registration were used to map the 3D planning dose onto the unfolded 2D rectal wall. The model was trained and tested on 42 cervical cancer patients and achieved areas under the curve (AUC) of 0.7 and 0.89 under 10-fold and leave-one-out cross-validation, respectively. Ibragimov
et al. [
120] used the CT images, planning dose, and patient clinical information of 125 liver stereotactic body RT patients to train a CNN for toxicity prediction. Compared with conventional machine-learning methods, the CNN achieved higher AUC performance. Deep-learning methods can also produce false predictions. Notably, in many published studies, relatively small patient cohorts and intra-data cross-validation artificially inflate the reported performance of machine-learning methods. Rigorous tests on independent data sets, and preferably on prospectively accrued patients, will be essential to demonstrate the value of AI in outcome prediction.
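The validation pitfall mentioned above can be demonstrated directly: if features are selected on the full data set before cross-validation, label information leaks into the selected features, and the cross-validated AUC becomes optimistic even on pure noise. A sketch with entirely synthetic data (cohort size and feature count are illustrative assumptions):

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic "small cohort": 40 patients, 500 random features, and an
# outcome that is independent of every feature (true AUC = 0.5).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 500))
y = np.array([0, 1] * 20)

# WRONG: selecting the "best" features on the full data set before
# cross-validation leaks the labels into the features, so the
# cross-validated AUC overstates performance on pure noise.
X_leaky = SelectKBest(f_classif, k=10).fit_transform(X, y)
auc_leaky = cross_val_score(
    LogisticRegression(max_iter=1000), X_leaky, y, cv=10, scoring="roc_auc"
).mean()
print(f"Cross-validated AUC with selection leakage: {auc_leaky:.2f}")
```

The remedy is to nest the feature selection inside each cross-validation fold (e.g., via a `Pipeline`) or, better, to evaluate on an independent data set, as the text argues.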
Discussion
This technical review focuses heavily on deep learning, which is far from the entire extent of AI, because the recent rapid development of deep learning has substantially accelerated AI research and its application in RT. Numerous publications are generated on a daily basis, and applications are further fueled by the open-source culture of the AI community.
The many roles of AI in RT surveyed in this review can be broadly classified into three categories. The first category includes AI automation of mature tasks that have long been performed with traditional methods, including segmentation, treatment planning, and QA. In this category, AI replaces tedious, repetitive, and error-prone manual tasks with automation. AI also alleviates the burden on human operators in well-resourced clinics and enables advanced treatments in resource-limited ones. In this category, AI adoption is rapid: various products, including AI-assisted segmentation and treatment planning, are already in the pipeline for clinical release. The second category improves existing functions, such as image reconstruction for low-dose CBCT, fast MRI, and image synthesis. Different from the first category, the AI in this category augments the results of traditional algorithms and tools. Non-machine-learning methods for image reconstruction and synthesis existed before the recent wave of AI but were limited in one or more aspects. The third category includes roles that provide functions minimally available in the clinic, such as radiomics for individualized outcome prediction. Currently, outcome prediction is largely performed at the population level based on patient clinical information and genetic data. AI prediction using individual patients' image biomarkers is a new and exciting opportunity to bridge these unmet needs. However, clinical adoption of AI for the new functions can be slow for the following reasons.
A major criticism of AI, particularly deep learning, is that deep neural networks are opaque: gaining insight into the features that contribute to the results is difficult. For tasks in the first category with clearly verifiable endpoints, this criticism may not be a major issue. For instance, segmentation results can be intuitively examined and validated. Understanding the networks may not be as important as in the second and third categories, where validation is less intuitive or not readily available, particularly for prospective patient cohorts. In image reconstruction for a new patient, deep-learning-based reconstruction algorithms could introduce image distortions and artifacts that are indistinguishable from actual anatomical features. For outcome prediction, understanding the features that contribute to the outcome is important for adapting treatments accordingly. Interpretable networks [
218] and the biological basis of imaging features [
219] need to be further researched for this purpose. Another major bottleneck for AI applications in RT is the data available for training and testing. Different from tasks such as segmentation, image synthesis, or dose prediction, where a robust network can be trained on fewer than 100 patients, which are easily obtainable in most RT clinics, outcome prediction requires data from a large number of patients, which are difficult to obtain. In the AI research community, the time and effort to procure data are often notably greater than those for model building and training. Without patient data of sufficient quality and quantity, and given the opacity of deep neural networks, the robustness and generalizability of many outcome prediction models cannot be further tested and improved. The best way to overcome this challenge is by contributing to public databases, where the data burden is shared by a community and the impact of the data is multiplied. The best examples of public data sharing are The Cancer Genome Atlas and The Cancer Imaging Archive [
220–
222], which make omics and medical imaging data available for researchers to test, develop, and compare their hypotheses and methods. These well-curated data have resulted in thousands of publications, with many researchers directly using the data in their respective studies.
This review focused on machine-learning algorithms; notably, however, AI cannot be narrowly viewed as machine-learning algorithms alone. One aspect is that automated treatment delivery cannot be achieved with algorithms by themselves: medical robotic hardware, beyond what was mentioned in the introduction, needs to be developed for automated patient set-up and treatment delivery, and hardware development can be slow. Another important omission from this technological review is informatics and machine-learning research based on cell biology, such as radiosensitivity related to gene expression, which can be appreciated in the review by Pavlopoulou
et al. [
223].
Apart from AI, fundamental physics, biology, and computational approaches will remain critical. AI will not replace research on the biological effects of heavy ions and ultra-high-dose-rate radiation sources, or deterministic computations such as dose calculation. The efficiency gained from AI will free up time and effort for investment in basic research while maintaining consistent patient care. This is a less evident but equally important benefit of AI in RT.
Conclusions
This article reviews pertinent machine-learning and AI research and clinical applications for RT. The review follows the typical workflow of RT, including image acquisition and processing, target and OAR delineation, plan creation and delivery, and RT outcome prediction. In addition to clinical applications, representative AI methods are introduced. The article can serve as an introduction for readers interested in learning more about modern RT and AI research.