1 Introduction
In recent years, significant progress has been made in understanding the molecular and cellular mechanisms underlying tumor progression. However, several major difficulties hamper further progress. First, traditional imaging techniques such as magnetic resonance imaging (MRI) [
1], computed tomography (CT) [
2], and mammography [
3], have long been used for cancer screening. Obtaining clinical answers from the data generated by these techniques is problematic, as the data require curation by trained professionals, which can be time-consuming. Second, cancer can be linked to changes in genes, and these changes can in turn serve as clinically relevant diagnostic [
4], prognostic [
5], and predictive biomarkers [
6]. Unfortunately, most potential biomarker candidates have not been translated into clinical practice owing to variations in the presence of metastasis, differing treatment response rates among patients, and acquired resistance. Third, although new therapeutic strategies such as targeted and immune-based therapies have emerged as effective options for combating cancer [
7], the heterogeneity of cancer causes variation in the response rate of patients to anticancer drugs.
New artificial intelligence (AI) based methods promise solutions to some of these challenges. In recent years, AI models have been extensively utilized in drug development [
8] and in cancer prediction and diagnosis [
9]. Efforts of cancer researchers have resulted in several repositories containing cancer-related data that can be analyzed and integrated with AI approaches for different applications [
10]. Among these, AI methods have become increasingly important in the evaluation of the diverse and complex data generated by next-generation sequencing. AI-based algorithms can identify genetic mutations [
11] or gene signatures [
12] that can aid in the early detection of cancer and the development of targeted cancer therapies. Handling and processing these large data volumes requires expanded cloud computing and storage capacity, on which AI systems can be deployed to achieve state-of-the-art performance. AI may also directly assist oncologists at the bedside by providing estimates of clinical outcomes. Developing accurate AI models and implementing them in clinical settings remains challenging, primarily because of heterogeneous data sets, biases in outcomes, and data privacy concerns [
13]. Furthermore, ethical, legal, and social considerations also play a role. Nevertheless, AI methods have demonstrated robustness, leading to improved clinical decision-making. Overall, in the past few decades, several AI methods have been proposed and utilized for different applications in cancer research (Fig.1).
Notably, there is no single method called “AI.” AI is a collection of methods and techniques that can be used to manipulate and interpret data. A major weakness of current oncology research is the sparse reporting of the actual methods used, which prevents robust and reproducible research. In this review, we highlight in detail the application of different AI methods used in cancer research, including their advantages and limitations. The overall usage of the methods discussed in this review over the last ten years is provided in Tab.1. We also explore available guidelines on how AI models should be incorporated into clinical settings and how emerging pre-trained language models can boost the personalization of cancer care strategies.
2 AI methods
The terms “Artificial Intelligence” and “Machine Learning” were coined in the 1950s [
14]. Machine learning (ML) includes two major arms, unsupervised and supervised learning. In unsupervised learning, we look for the inherent structure of the data; typical tasks include dimension reduction (e.g., principal component analysis) and clustering. In supervised learning, we assign samples in the training set to classes and teach the model to recognize these classes from the input data. Supervised learning includes regression and classification—the latter involves a broad set of methods. Traditional ML models such as Bayesian networks, support vector machines, and random forests learn from training data to produce an outcome. A major set of ML methods is based on neural network algorithms, which allow machines to mimic aspects of how the human brain processes information. A neural network technique with multiple layers that is gaining popularity in cancer research is deep learning. Deep learning uses multiple hidden layers, which increase the capacity to capture more complex patterns in the data. Another AI field gaining prominence is natural language processing, which targets narrative text and extracts useful information that can assist in decision-making.
AI models in cancer research typically utilize inputs such as multi-omics and clinical information obtained from different sources, including imaging, laboratory, clinical, and pathological data. The most common task of ML models is classification, and the standard approach to validating and assessing these models is receiver operating characteristic analysis, which supports computing the area under the curve (AUC), sensitivity, specificity, and precision [15]. Most ML classifiers used for these tasks are supervised and, in many cases, base their decisions on conditional class probabilities.
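To make this concrete, the following is a minimal, illustrative sketch of such a validation workflow in Python with scikit-learn; the synthetic data and the logistic regression model are placeholders, not drawn from any cited study.

```python
# Hedged sketch of ROC-based model validation; data and model are illustrative.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = clf.predict_proba(X_test)[:, 1]  # predicted probability of the positive class
pred = clf.predict(X_test)

print("AUC:        ", roc_auc_score(y_test, proba))
print("Sensitivity:", recall_score(y_test, pred))               # recall of the positive class
print("Specificity:", recall_score(y_test, pred, pos_label=0))  # recall of the negative class
print("Precision:  ", precision_score(y_test, pred))
fpr, tpr, thresholds = roc_curve(y_test, proba)                 # points along the ROC curve
```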
Below, we will first discuss traditional machine-learning tools, including decision tree-based methods, support vector machines, Bayesian networks, and K-nearest neighbors, and then turn to neural networks and large language models for natural language processing tasks. An overview of the described methods is provided in Fig.2.
2.1 Decision tree based AI methods
Decision trees are supervised learning methods used in ML and data analysis. They are tree-like models used for decision-making or predicting the classification of data sets [
16]. They are represented as structured graphs, with nodes and branches indicating decisions and consequences, respectively. They learn by taking labeled training data and recursively splitting them until a decision is reached. Decision trees are recognized as one of the most prominent ML algorithms because they are simple, easy to interpret, and quick to learn from the data [
17].
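As a minimal, hypothetical illustration of how such a tree is trained and its rules inspected (using scikit-learn and its bundled Wisconsin breast cancer data purely for demonstration):

```python
# Illustrative decision tree; depth is limited to keep the rules interpretable.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
print("Accuracy:", tree.score(X_test, y_test))
# Print the learned splits as human-readable decision rules.
print(export_text(tree, feature_names=list(X.columns)))
```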
Decision trees have shown potential in prognostic decision-making. A novel decision tree molecular classifier identified molecular subgroups based on the presence or absence of specific mutations or proteins in patients with endometrial cancer [18]. The authors found that patients with polymerase-ϵ exonuclease domain mutations had the most favorable prognosis, while patients with p53 null/missense mutations had the worst.
Researchers in the past have tested and compared the accuracy of conventional methods like MRI, positron emission tomography-CT, and sentinel node biopsy in detecting lymph node status, an important characteristic for selecting appropriate treatment for patients [
19]. However, with improvements in image analysis using radiomics [
20], a model of clinical factors combined with a decision tree model achieved the best diagnostic performance with an AUC of 0.84 in the validation cohort for predicting lymph node metastasis in patients with cervical cancer [
21]. Radiomics generates massive amounts of high-dimensional data, and ML models can reduce this dimensionality and identify relevant features. Accordingly, researchers have linked radiomics with genomic features of tumors to predict mutations in lung cancer [
22] and copy number variations in glioblastoma [
23]. These studies show the potential of image-based biomarkers to improve diagnostic accuracy and treatment selection. In contrast, when a decision tree model was combined with multiparametric MRI features, it demonstrated low performance in predicting pathological complete response, disease-specific survival, and recurrence-free survival in patients with breast cancer [
24].
Efforts have been made by researchers to improve the accuracy and interpretation of diagnostic models. For example, a stacking-based decision tree ensemble learning method was proposed for detecting prostate cancer [
25]. The learning process of this method involved base-level learning using regression trees, model selection, and stacking, as well as extracting decision rules from regression trees. However, the authors observed longer training times compared to single classifiers and other ensemble methods. In another study, the performance of an ensemble method called Extra Trees was evaluated [
26]. The model achieved high diagnostic accuracy in classifying breast cancer types from the Wisconsin Breast Cancer Database. However, it was reported that Extra Trees are black box models and may be inefficient when using a large number of trees. Moreover, decision trees may also overfit the data, decreasing the performance of the algorithm [
27].
Despite these limitations, decision tree-based ensemble methods, such as random forest and gradient boosting, have been developed to further enhance predictive accuracy.
2.1.1 Random forest
Growing ensembles of trees has gained significant attention in cancer research due to the resulting improvements in classification accuracy. The random forest algorithm combines many individual decision trees through ensemble learning to produce a single output [
28]. The random forest classifier repeatedly selects random subsets of samples and features to train many individual decision trees. Finally, the class selected by the majority of trees is taken as the output.
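A minimal sketch of this bagging-and-voting idea, again with scikit-learn on placeholder data:

```python
# Hedged random forest sketch: many trees on random feature subsets, majority vote.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
forest = RandomForestClassifier(
    n_estimators=500,     # number of individual decision trees
    max_features="sqrt",  # random feature subset considered at each split
    random_state=0,
)
print("Mean cross-validated AUC:",
      cross_val_score(forest, X, y, scoring="roc_auc", cv=5).mean())
```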
Random forest models have been employed for the analysis of cell line data to predict drug resistance [
29] and for early cancer detection in patients using blood-based assays. For instance, a multi-analyte blood-based test called CancerSEEK uses assessments of genetic alterations and abundance of protein biomarkers to identify early malignant lesions [
30]. Identifying stratification factors that define patient characteristics enabling early detection is crucial. In this regard, ML models could be valuable in recognizing complex patterns of different biomarkers to improve diagnostic accuracy. For instance, a random forest model utilizing DNA methylation biomarkers achieved an AUC of 0.95 for discriminating between patients with less or more aggressive prostate cancer and proved to be an independent predictor of recurrence free survival [
31]. A different study, focusing on multiple peripheral blood biomarkers and patients’ clinicopathological features, developed a random forest model that achieved an AUC of 0.96 for distinguishing epithelial ovarian cancer from benign ovarian tumors, surpassing other ML models [
32].
Extending to prognostic studies, random survival forest models were developed to predict overall survival in colorectal cancer [
33] and cancer specific survival in pancreatic cancer [
34]. A random forest model based on The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus data sets of ovarian cancer patients identified 17 metabolic pathways associated with prognosis [
35]. Also, a random forest model integrating radiomics features demonstrated potential in predicting prognostic factors associated with breast cancer [
36]. Further focus has also been given to treatment outcome prediction. A random forest classifier was trained on data from rectal cancer patients to discriminate treatment outcomes and reached an AUC of 0.93 [
37]. Another random forest classifier outperformed several ML algorithms in discriminating (chemo)radiotherapy outcomes in a multi-cancer patient cohort [
38].
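For readers who wish to experiment, a hedged sketch of a random survival forest follows, assuming the scikit-survival package and entirely synthetic follow-up data (all names and values are illustrative):

```python
# Hedged sketch of a random survival forest, assuming the scikit-survival
# package; the synthetic features and follow-up data below are placeholders.
import numpy as np
from sksurv.ensemble import RandomSurvivalForest
from sksurv.util import Surv

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                     # e.g., clinical or omics features
time = rng.exponential(scale=24.0, size=200)       # follow-up time (months)
event = rng.integers(0, 2, size=200).astype(bool)  # True if the event was observed
y = Surv.from_arrays(event=event, time=time)       # structured survival outcome

rsf = RandomSurvivalForest(n_estimators=200, random_state=0).fit(X, y)
print(rsf.predict(X[:5]))  # higher scores indicate higher predicted risk
```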
Improved versions of random forest models have also been explored. A tumor feature selection strategy using a random forest based on a genetic algorithm showed good performance in predicting the clinical outcomes of patients with esophageal cancer [
39]. The model demonstrated high classification accuracy for prediction and prognosis tasks, with AUC values of 0.82 and 0.80, respectively. For the diagnosis of breast cancer, an interpretable random forest model based on rule extraction methods was proposed for making better patient-centric decisions [
40]. The performance metrics demonstrated that the proposed method outperformed other black box models and rule extraction methods in terms of accuracy.
Earlier, a quantitative structure-activity relationship-based random forest prediction model was developed to assess the structure-activity relationships of a large set of compounds and discriminate inhibitors of the epidermal growth factor receptor (EGFR), an important drug target in cancer [
41]. In more recent studies, random forest models focused on synergistic drug combinations for accelerating drug discovery processes. In this regard, drug target, gene expression profiles [
42], and large-scale phenotypic drug combinations data sets [
43] were useful for building random forest models to predict synergistic drug combinations.
Based on these applications, it can be noted that random forest models have been widely employed for prediction, show satisfactory performance, and achieve high accuracy and AUC in classification tasks. They are best suited to analyzing many variables in medium-sized data sets and can also address the challenge of limited sample size [44]. Random forest models are alternatives to conventional statistical methods, which are often unsuited to extracting information from multiple input variables and identifying factors that characterize cancer patients. However, their applicability in the early detection of cancer may still be limited because of the challenges associated with interpreting the model when making clinical decisions [
45].
2.1.2 Gradient boosting
Boosting methods are based on a constructive strategy in which new weak learners are added to the ensemble sequentially [46]. Among them, the gradient boosting algorithm produces a strong prediction model by combining an ensemble of weak models, typically shallow decision trees [
47]. One of the commonly used gradient boosting frameworks is Extreme Gradient Boosting (XGBoost), which uses decision trees as weak learners and outputs the sum of the predictions of all the individual trees [
48]. The XGBoost algorithm also employs regularization and random subsampling of training samples and features to improve generalization.
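A brief, illustrative sketch of XGBoost through its scikit-learn interface follows; the parameter values are generic defaults rather than settings from any study discussed here. Note that XGBoost handles missing feature values (np.nan) natively, without imputation.

```python
# Hedged XGBoost sketch: an additive ensemble of shallow trees whose
# predictions are summed; parameters are illustrative, not tuned values.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = XGBClassifier(
    n_estimators=300,      # sequentially added weak learners
    max_depth=3,           # shallow trees serve as the weak learners
    learning_rate=0.1,     # shrinks each tree's contribution
    subsample=0.8,         # row subsampling for regularization
    colsample_bytree=0.8,  # feature subsampling per tree
    eval_metric="auc",
).fit(X_train, y_train)
print("Accuracy:", model.score(X_test, y_test))
```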
In oncology, the Cox proportional hazards model has been the standard prognostic model for survival outcomes, but it does not capture nonlinear feature effects and can be inadequate for high-dimensional data [
49]. ML models like gradient boosting have demonstrated significant potential in predicting survival outcomes among cancer patients. For example, XGBoost successfully predicted five-year survival in non-metastatic breast cancer patient data obtained from the Netherlands Cancer Registry and showed comparable performance with the classical Cox proportional hazards regression model [
50]. It was also noted that, unlike the Cox model, the XGBoost model effectively captured complex interactions between features and modeled nonlinearities. Additionally, studies on breast cancer [
24] and clear cell renal cell carcinoma [
51] patients found that the XGBoost models demonstrated better performance and higher accuracy compared to other ML methods when predicting survival outcomes.
The XGBoost algorithm can be a cost-effective solution for finding clinically relevant biomarkers for targeted therapies by detecting key mutations in clinical images. Clinically relevant biomarkers such as
EGFR and
KRAS are frequently mutated or altered in cancer, especially in patients with non-small-cell lung cancer [
52]. In a recent study, an XGBoost model using radiomics features showed robust performance in detecting
EGFR and
KRAS mutations with 83% and 86% accuracy respectively from the Cancer Imaging Archive data of non-small-cell lung cancer patients [
53]. Additionally, with the aim of boosting survival chances in breast cancer, prediction of metastasis and recurrence has been explored. For instance, one study showed that an XGBoost model identified six gene signatures, among which
SQSTM1 was found to regulate metastasis in breast cancer [
54], while another study showed that color features play an important role in detecting breast cancer metastasis and recurrence using XGBoost [
55].
New developments in cancer treatment have led to improved clinical outcomes; for example, publicly available data from the TCGA cohort have been used to predict the treatment response of breast cancer patients to paclitaxel [
56] and immune checkpoint inhibitors [
57] using an XGBoost model. Furthermore, XGBoost models could predict chemoradiation response for patients with esophageal cancer [
58] as well as patient-reported outcomes at the 1-year follow-up of surgery for patients with breast cancer [
59]. Another XGBoost model showed significantly better performance in distinguishing between cancer types such as early and late stages of renal clear cell carcinoma, renal papillary cell carcinoma, lung squamous cell carcinoma, and head and neck squamous cell carcinoma with DNA methylation data [
60]. Hence, XGBoost models could be implemented for patient-centric decisions and promote targeted and effective treatment.
While gradient boosting algorithms such as XGBoost are significantly advancing cancer research, one of their main advantages is the ability to handle data with missing or incomplete values natively, without imputation [58]. This ability to classify and utilize data without imputation makes XGBoost a valuable tool for researchers. Moreover, how a gradient boosting model makes predictions is relatively easy to decipher, as it uses decision trees as base learners [
46].
2.2 Other supervised learning AI methods
2.2.1 Support vector machines
The concept of the support vector machine (SVM) differs from decision tree-based methods. SVM is a supervised ML method that uses a hyperplane as the decision boundary for classification problems [
61]. The algorithm relies on kernel functions, which keep the computation tractable by implicitly mapping the data into higher-dimensional spaces; the choice of kernel therefore strongly influences the performance of the model [62]. This kernel trick also makes SVMs well suited to nonlinear classification.
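A minimal, illustrative SVM with an RBF kernel follows (scikit-learn); feature standardization is included because SVMs are sensitive to feature scales, and the parameters are placeholders rather than tuned values.

```python
# Hedged SVM sketch with an RBF kernel on placeholder data.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

svm = make_pipeline(
    StandardScaler(),
    SVC(kernel="rbf", C=1.0, gamma="scale"),  # the kernel choice drives performance
).fit(X_train, y_train)
print("Accuracy:", svm.score(X_test, y_test))
```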
SVM-based classifiers have been widely used in cancer research since the advent of high-throughput microarray gene expression. A very early application of SVMs in this area focused on the classification of cancerous ovarian tissue, normal ovarian tissue, and normal non-ovarian tissues [
63]. In later years, SVM applications expanded to include gene expression and copy number variation features for predicting breast cancer patients’ response to chemotherapeutic agents like paclitaxel and gemcitabine using an online platform [
64].
More recent applications of SVM include integrating radiomics features into SVM models. Predictive or prognostic modeling in radiomics can be useful to improve decision support in oncology. For example, a study used kernel SVM classifier with MRI radiomics features to predict local and distant failure in patients with advanced nasopharyngeal carcinoma which could be useful for making decisions regarding treatment plans [
65]. Another study constructed a radiomics signature consisting of 30 selected features using linear kernel SVM to distinguish whether patients with rectal cancer received a pathological complete response to neoadjuvant chemotherapy [
66]. Additionally, an interesting study on colorectal cancer revealed that CT radiomics signature is highly correlated with
KRAS/NRAS/BRAF mutation status [
67]. These applications suggest that SVM holds high predictive or prognostic potential which could enhance the applications of non-invasive and cost-effective techniques like radiomics.
Identifying biomarkers is a crucial approach in cancer diagnosis. These biomarkers can serve as features for ML models to classify healthy and diseased samples [
68]. For instance, an SVM classifier using integrated extracellular vesicle long RNA markers demonstrated high sensitivity and specificity in classifying hepatocellular carcinoma patients and healthy controls [
69], while DNA methylation-based biomarkers were associated with recurrence in early-stage hepatocellular carcinoma [
70]. In gastric cancer research, SVM models have been utilized to predict survival of patients. By incorporating immunomarkers and clinicopathologic features, a prognostic SVM classifier was developed to predict overall survival and disease-free survival in gastric cancer patients and identify the benefits of postoperative adjuvant chemotherapy in stage II and stage III patients [
71]. Additionally, an SVM model based on a 32-gene signature specific to gastric cancer generated risk scores that were prognostic of overall survival and treatment response [72]. The stability of biomarker selection during model development is necessary for reproducible classification, so that the prediction model shows similar performance when classifying new samples.
Another important factor in diagnostic applications involves cancer staging and grading systems. For example, the Fuhrman nuclear grading system is used to assess the tumor aggressiveness in renal cell carcinoma, impacting clinical treatment selection [
73]. Likewise, for cervical cancer, clinical staging is recommended by the International Federation of Gynecology and Obstetrics based on imaging and pathological findings [
74]. In addition to the staging and grading systems, ML methods could play a vital role in rendering clinical decisions. An SVM classifier has demonstrated promising results in predicting high and low Fuhrman nuclear grades from CT texture features in clear cell renal cell carcinoma [
75]. The model’s performance was comparable to percutaneous biopsy—an invasive method available for Fuhrman nuclear grading. For prognosis prediction, high-risk surgical-pathological factors, including the International Federation of Gynecology and Obstetrics staging, were used to investigate the accuracy of the SVM model in early-stage cervical cancer patients after surgery [
76], showing that these factors could predict recurrence with an accuracy of 69%. Another study utilized gene signatures to distinguish colon cancer patients at high risk of recurrence from those at low risk [
77].
It is worth noting from these applications that SVM models show versatile performance when using a wide variety of biological data, such as multi-omics, imaging, and clinical data, for cancer diagnosis and prediction. SVM models have also paved their way toward drug discovery by outperforming other ML methods in predicting the inhibition of breast cancer resistance protein [
78]. Nonetheless, challenges remain in building a good SVM classification model, and several researchers have proposed approaches to improve classification accuracy. For example, feature clustering combined with recursive feature elimination [79] could be a suitable approach to reduce computational complexity and redundancy among genes and to increase classification accuracy. Likewise, weighted AUC ensemble learning based on SVM [80] could significantly increase the accuracy of breast cancer diagnosis. Finally, the performance of SVM relies heavily on the choice of kernel function and the expertise of the user.
2.2.2 Bayesian network
Another useful family of ML models for classification is Bayesian network classifiers, which produce probabilistic estimates of the variables [81]. They are represented as directed acyclic graphs whose edges capture the dependencies among the random variables. Among Bayesian networks, the Naïve Bayes classifier is the most effective [
82].
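As a compact, hypothetical illustration, a Gaussian Naïve Bayes classifier can be evaluated in a few lines on placeholder data:

```python
# Hedged Gaussian Naive Bayes sketch: per-feature class-conditional likelihoods
# are combined under the conditional independence assumption.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)
print("Mean cross-validated AUC:",
      cross_val_score(GaussianNB(), X, y, scoring="roc_auc", cv=5).mean())
```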
Naïve Bayes has been successful in lung cancer [
83] and rectal cancer [
84] prognosis, which involved predicting patient survival. It was also used to diagnose diffuse large B cell lymphoma genetic subtypes based on mutation, copy number variation, and BCL2 or BCL6 rearrangement data, providing the likelihood that a patient’s lymphoma belongs to one of six defined genetic subtypes [
85].
Hospitals generate a vast amount of data, and conducting research on these data can be challenging due to ethical, legal, and administrative issues. To address this issue, Jochems
et al. used a distributed learning approach to train a Bayesian Network model on clinical data of patients with lung cancer treated with chemoradiation or radiotherapy at five hospitals [
86]. The model underwent external validation with hospital data not included in the training set and achieved AUCs ranging from 0.59 to 0.71. Targeted therapy, especially for breast cancer, is a crucial focus in oncology, and the ability to predict pathological complete response to neoadjuvant chemotherapy is a significant advancement for improving patient outcomes. The development of a Naïve Bayes model based on radiomic features represented a sophisticated approach for predicting pathological complete response to neoadjuvant therapy [
87]. The model demonstrated high performance, achieving an AUC of 0.93 in triple-negative and human epidermal growth factor receptor 2 (HER2)-positive patients. Similarly, another Naïve Bayes prediction model exhibited a significant positive correlation with pathological complete response to neoadjuvant chemotherapy [
88]. Hence, Bayesian Network models could enable targeted administration of neoadjuvant therapy and prevent delay of the clinically effective treatment for breast cancer patients.
Furthermore, to evaluate the performance, studies have compared Bayesian network models with other ML models. In one study, the accuracy of a Naïve Bayes classifier was assessed against other ML classifiers for classifying benign and malignant breast tumors [
89]. In another study, a Bernoulli Naïve Bayes algorithm was compared with traditional ML methods for predicting the binding of estrogen receptors [
90]. However, in these studies, Naïve Bayes models did not show satisfactory performance in terms of accuracy and AUC compared to other ML methods.
A small body of literature suggests that modifying Naïve Bayes classifiers could improve classification accuracy for breast cancer detection. For instance, a weighted Naïve Bayes classifier, proposed for the detection of breast cancer, achieved an accuracy of 98.5% when trained and tested on attributes from the Wisconsin Breast Cancer Database [
91]. In another study, the authors proposed a two-layer ensemble hybrid classifier for detecting malignant and benign tumors, indicating the potential to enhance the classification accuracy of the traditional Naïve Bayes classifier in breast cancer detection [
92]. These studies are limited to breast cancer data but could be expanded to other cancer types.
Naïve Bayes classifiers assume conditional independence of the features, ignoring their interrelationships, which can lead to poor performance when this assumption is violated [
82]. In this regard, developing a Naïve Bayes classifier by selecting a subset of attributes may improve accuracy [
93,
94].
2.2.3 K-nearest neighbors
One of the simplest yet most popular ML algorithms, K-nearest neighbors (kNN), relies on distances between data points and assumes that similar points lie close to each other [95]. It assigns the class most frequent among the ‘k’ training points closest to the test sample. This classification method does not require any prior knowledge about the distribution of the data [
96].
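A minimal kNN sketch on placeholder data follows; scaling the features keeps the distance metric meaningful, and the value of ‘k’ is illustrative.

```python
# Hedged kNN sketch: each test sample receives the majority class among its
# 'k' nearest training points in the standardized feature space.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
knn.fit(X_train, y_train)
print("Accuracy:", knn.score(X_test, y_test))
```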
Several notable studies of kNN have used imaging data from mammograms [
97] and breast ultrasound image segments [
98] for breast cancer classification. Based on the texture features of the lesions obtained from MRI, a kNN model classified breast cancer subtype images with a ROC AUC value of 0.81 [
99]. In a similar study [
100], radiomics features were extracted from contrast-enhanced MRI for classifying breast cancer receptor status and molecular subtypes. In a study by García-Laencina
et al., four ML methods, including kNN, were used to predict the five-year survival of breast cancer patients with incomplete clinical data, achieving accuracies above 81% and AUCs above 0.78 without any imputation [
101].
Because some kNN implementations use a random seed (e.g., for breaking ties between neighbors), repeated runs can produce different outcomes. This is probably the major limitation and could be the reason why kNN models demonstrated lower accuracy than other ML techniques in applications to different cancer types, such as lung cancer [
102] and brain tumors [
103]. To address this issue and improve performance, researchers combined a wrapper-based feature selection method with a kNN classifier, which could be suitable for microarray or RNA-Seq data with thousands of features [
104]. However, common drawbacks of wrapper-based methods for feature selection are that they are prone to overfitting and can be computationally intensive [
105]. To address this, other researchers [
106] followed a different approach by proposing a combination of particle swarm optimization methods along with adaptive kNN for gene selection from microarray data. A study by Zhang
et al. also proposed three methods for finding optimal ‘k’ values for efficient classification of test or new data [
107]. Furthermore, as building classical kNN models could be time-consuming when dealing with large data sets, some fast versions of the kNN algorithm [
108] have been developed by researchers for disease prediction.
2.3 Neural network algorithms
A subset of ML methods, called neural networks, has generated a great deal of excitement in the scientific community. The concept of the artificial neural network (ANN) emerged from the way neurons in the brain work [109]. ANNs use hidden layers of connected nodes, or artificial neurons, to generate outputs from the input variables. These hidden layers are arranged hierarchically, resembling the organization of neurons in the brain. The strengths of the connections in an ANN are adjusted by an optimization technique called back-propagation. ANN models may overfit the data and show poor generalization when too many neurons are allowed in the hidden layers.
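For illustration, a small one-hidden-layer ANN trained by back-propagation can be sketched with scikit-learn's MLPClassifier; the data and layer size are placeholders, and the deliberately small hidden layer reflects the overfitting caution noted above.

```python
# Hedged feed-forward ANN sketch on placeholder data.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ann = make_pipeline(
    StandardScaler(),
    # A single small hidden layer helps limit overfitting.
    MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0),
).fit(X_train, y_train)
print("Accuracy:", ann.score(X_test, y_test))
```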
In the early 2000s, ANNs found widespread use in diagnostics [
110,
111] and prognostic outcome prediction [
112,
113] applications. In the following years, evolved neural network approaches like particle swarm-optimized wavelet neural networks [
114] and genetically optimized neural networks [
115] were proposed for the detection and diagnosis of breast cancer. ANNs have also been employed with one or two hidden layers for diagnosis and prediction of breast cancer [
116] and pancreatic cancer [
117]. Over the years, many important cancer-related applications which attracted attention have been based on the concept of deep neural networks which is discussed in the next section.
2.3.1 Deep learning
Neural networks consisting of more than one hidden layer are termed “deep.” The fundamental architecture of deep learning (DL) is based on deep neural networks consisting of multilayered interconnected nodes or artificial neurons for categorization [
118]. This neural network architecture uses nonlinear activation functions, such as the rectified linear unit (ReLU), to pass the result of the weighted sum of inputs from the previous layer to the next layer.
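A hedged PyTorch sketch of such an architecture follows, with two hidden layers each applying ReLU to a weighted sum of its inputs; the layer sizes are arbitrary illustrations.

```python
# Hedged sketch of a "deep" feed-forward network: multiple hidden layers,
# each passing a ReLU-transformed weighted sum to the next layer.
import torch
import torch.nn as nn

deep_net = nn.Sequential(
    nn.Linear(100, 64), nn.ReLU(),  # hidden layer 1
    nn.Linear(64, 32), nn.ReLU(),   # hidden layer 2
    nn.Linear(32, 2),               # output logits for two classes
)
x = torch.randn(8, 100)             # a batch of 8 samples with 100 features each
print(deep_net(x).shape)            # torch.Size([8, 2])
```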
2.3.2 Convolutional neural network
Convolutional neural network (CNN) [
119], a type of deep feedforward neural network that has become dominant in various computer vision tasks, is attracting interest across a variety of domains, including radiology. CNN is designed to learn spatial hierarchies of features automatically and adaptively through backpropagation by using multiple building blocks, such as convolution layers, pooling layers, and fully connected layers. A review article by Yamashita
et al. offers a perspective on the basic concepts of CNN and its application to various radiological tasks [
120]. The review also highlights that two challenges in applying CNNs to radiological tasks are small data sets and overfitting, which can be mitigated by training on more data. Familiarity with the concepts, advantages, and limitations of CNNs is essential to leverage their potential in diagnostic radiology, to augment the performance of radiologists, and to improve patient care.
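To make these building blocks concrete, a minimal, hypothetical CNN in PyTorch is sketched below; the input size and layer configuration are illustrative, not taken from any study discussed here.

```python
# Hedged CNN sketch with the building blocks named above (convolution,
# pooling, fully connected layers), sized for 64x64 grayscale patches.
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # 64x64 -> 32x32
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # 32x32 -> 16x16
        )
        self.classifier = nn.Linear(32 * 16 * 16, n_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))  # fully connected output layer

print(TinyCNN()(torch.randn(4, 1, 64, 64)).shape)  # torch.Size([4, 2])
```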
Deep CNNs are capable of excellent performance in supervised image classification [
121]. Recent studies have shown that DL models have revolutionized the analysis of medical images in oncology by learning representative features from raw input like tumor tissue images. A study reported that a DL-based classifier could extract more prognostic information, like survival, from the tumor tissue of colorectal cancer than experienced pathologists [
122]. CNN models based on hematoxylin and eosin-stained tumor sections of colorectal cancer patients were used for survival outcome prediction [
123,
124] and identification of molecular subtypes associated with prognosis [
125]. Routine histological images, when analyzed with DL models, can provide useful information for directly identifying genetic mutations, tumor molecular subtypes, gene expression signatures, and pathological biomarkers of potential clinical relevance [
126]. Furthermore, a DL model based on histological images could reduce the time-consuming diagnostic process of predicting recurrence and metastasis in HER2-positive breast cancer patients [
127]. Besides histological images, CNNs based on time-series CT images were used to address the challenge of capturing the evolving phenotype of tumors and predicted several clinical endpoints in patients with locally advanced non-small cell lung cancer to improve clinical outcomes [
128].
Computer-aided detection and diagnosis systems have long been used to help radiologists analyze mammogram screenings [
129]. However, since the inception of AI technologies, DL models can be used to reduce human bias by learning directly from training data. A CNN outperformed a conventional computer-aided detection and diagnosis system in detecting solid and malignant lesions and showed a best AUC of 0.90 on the validation set [
130]. A computer aided diagnosis system based on Faster R-CNN [
131] was used to detect and classify malignant or benign lesions on mammograms. The system achieved the highest AUC of 0.95 on the INbreast [
132] data sets. Using similar INbreast data sets, another CNN model achieved an AUC of 0.95 per image [
133].
Furthermore, CNN can be a powerful algorithm that may surpass human experts when applied to classification tasks. For instance, a CNN trained on histopathological images of melanoma and nevi showed good concordance with the pathologists [
134]. In a lung cancer application, a CNN model outperformed six radiologists in predicting the risk of lung cancer from CT imaging data [
135]. A DL model even surpassed radiologists in identifying breast cancer from cancer screening program mammograms in the USA and the UK and demonstrated improvement in absolute specificity (1.2%–5.7%) and absolute sensitivity (2.7%–9.4%) [
136]. However, these studies aimed to illustrate the potential of DL models rather than replace human experts. Instead, using a blend of DL methods and radiologists could facilitate better interpretation in mammogram screening [
137,
138].
In cancer subtype classification, renal cell carcinoma subtypes were distinguished using a CNN from histopathological images [
139]. The authors showed that a CNN trained on whole-slide images of renal cell carcinoma achieved very high accuracy in distinguishing tumors from normal tissue. Another CNN model discriminated four molecular subtypes (canonical luminal, immunogenic, proliferative, and receptor tyrosine kinase-driven) in hormone receptor-positive/HER2-negative breast cancer patients, based on which precise treatments were proposed [
140]. A more recent CNN-based study followed an integrative approach, using gene expression and methylation data of glioma patients to classify subtypes into low-grade gliomas and glioblastoma multiforme [
141]. These developments have the potential to improve treatment selection by enabling more tailored and effective approaches based on the specific molecular subtypes identified.
Focusing on predicting gene mutations, a CNN model identified several significantly mutated genes related to prognosis from histopathology images of hepatocellular carcinoma, with AUC values between 0.71 and 0.89 [
142]. The study also reported that the performance of DL classifiers was nearly comparable to that of a 5-year experienced pathologist for tumor classification and differentiation. In another analysis, mutations of six key genes, including
STK11, EGFR, FAT1, SETBP1, KRAS, and
TP53, were predicted from pathology images of lung adenocarcinoma, with AUCs ranging from 0.73 to 0.86 [
143]. These findings support the idea that DL models will effectively assist pathologists in the detection of cancer mutations.
One of the major challenges in cancer treatment is investigating the effect of potential therapeutic agents. Several studies show that DL is an important approach to consider for predicting drug responses in cancer patients. For example, a multi-omics late integration method based on deep neural networks was developed to predict drug response using somatic mutation, copy number aberration, and gene expression data as input [
144]. In another study, a deep neural network was trained on gene expression and drug response data of cancer cell lines from the Genomics of Drug Sensitivity in Cancer database to predict drug responses [
145]. The model was tested on multiple unseen clinical cohorts, where it outperformed other ML algorithms. Recognizing the challenge of interpreting these models, Kuenzi
et al. developed an interpretable DL model that could be used in clinical settings for predicting drug response and identifying synergistic drug combinations [
146]. In the process of drug discovery, identifying a large number of drug combinations from pharmacogenomics databases can be time-consuming, and investigating medium- or large-scale data of this kind in real-life settings can be challenging. To address these challenges, an improved DL model, Deep-Resp-Forest, based on a deep forest architecture, was developed that can adapt to different data scales by automatically learning the depth of the forest cascade [
147]. Also, the development of publicly available online platforms like DeepSynergy, based on a feed-forward neural network model, has shown advantages in prioritizing and screening anti-cancer drug combination data sets [
148].
Recent developments in DL have unlocked valuable insights from electronic health records, addressing the challenge of analyzing these messy data. The incorporation of free-text clinical records from electronic health records has had an outstanding impact on the performance of state-of-the-art DL models in predicting clinical problems and outcomes [
149]. In cancer research, DL models have incorporated electronic health records in tasks like predicting the risk of breast cancer [
150] and the onset of pancreatic cancer [
151]. DeepPatient, a novel unsupervised deep learning method, showed great potential in creating a general-purpose set of patient features from the raw electronic health records of various cancer patients that may be used for building predictive clinical models [
152]. However, despite these retrospective studies, extensive experiments and prospective trials may be required to demonstrate the accurate predictive ability of these models.
Based on the studies described above, DL models can be highly efficient in analyzing cancer imaging data and have shown robust performance in prediction, detection, and classification tasks, initiating a surge of interest among pathologists and radiologists. The success of DL models on imaging data stems from their sensitivity to minuscule details and intricate structures, learned via the backpropagation algorithm [118]. Hence, these models can be useful for extracting large numbers of features missed by humans to discover underlying disease characteristics and patterns in cancer patients. While the success of DL models holds promise for overcoming the challenge of translating drug response research to actual patients, it is important to note that, because of their complexity, they function as black box models, making it difficult to interpret which features are used for learning and how decisions are made [
153]. Moreover, a lack of retraining on large patient cohorts may hinder the performance of DL diagnostic models [
126]. Therefore, an interpretable and retrained DL model may show better performance in clinical settings.
2.3.3 Natural language processing
In recent years, extensive research has been conducted on natural language processing (NLP), a family of computational methods for processing human text, speech, and language [154]. The performance of an NLP system depends on the task, and such systems show better results on the data sets on which they were built [
155]. AI techniques like neural networks, SVM, and decision tree-based methods can also be applied to NLP tasks.
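As a toy illustration of such a pipeline, free-text reports can be converted to bag-of-words features that feed a supervised classifier; the two "reports" and their labels below are invented purely for demonstration.

```python
# Hedged sketch of a classical NLP pipeline: TF-IDF features from free-text
# reports feed a supervised classifier. All text and labels are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

reports = [
    "spiculated mass in the upper outer quadrant, suspicious for malignancy",
    "scattered fibroglandular densities, no suspicious mass or calcification",
]
labels = [1, 0]  # 1 = suspicious finding, 0 = negative report (toy labels)

nlp_clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
nlp_clf.fit(reports, labels)
print(nlp_clf.predict(["no suspicious mass identified"]))
```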
In the field of cancer research, manual analysis of clinical data is time-consuming and error-prone. The need to decrease manual abstraction of information from clinical charts and reports has shifted the focus of researchers to automatic extraction. In this context, early NLP models were proposed as an alternative, efficient strategy for extracting information from structured and unstructured clinical texts related to breast cancer patients. For instance, an automated NLP system was developed to extract Breast Imaging Reporting and Data System categories from breast radiology reports [
156] and to automatically extract unstructured text from mammography and pathology reports [
157]. Furthermore, an NLP system identified 92% of breast cancer recurrences from electronic health records with high sensitivity and specificity [
158]. Another study focusing on drug repurposing used NLP to extract drug exposure information from unstructured clinical data of cancer patients [
159]. DeepPhe software, based on an NLP model, was developed to perform automatic and detailed extraction of phenotypes from the electronic health records of cancer patients [
160]. This software produces a summary of the characteristics of cancer-related phenotypes, which is useful for further clinical investigations.
From these applications, it is evident that NLP works best with data containing structured or unstructured text, such as electronic health records or clinical notes. NLP models facilitate rapid analysis of unstructured data and reduce human error in clinical settings. In addition, they can support the development of oncology databases, which still require manual annotation of free-text clinical data [
161]. However, on real-world clinical reports, where parameters such as lymph node status may be recorded with uncertainty, NLP models might display low performance [
162]. Therefore, to create a robust model, careful planning and multiple iterations are required, which could be invaluable for extracting important information from medical records.
2.3.4 Large language models for NLP task
In recent years, large language models based on transformers [163] have been explored. These models are AI systems that use neural network architectures to generate content after training on massive text data sets [
164]. Large pre-trained language models have shown benefits for NLP tasks [
165]. Hence, to improve NLP tasks, models such as the Generative Pre-trained Transformer (GPT) [
166] have been explored. GPT models have proliferated and attracted widespread interest, especially since OpenAI launched the ChatGPT chatbot in 2022.
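For orientation, a hedged sketch of querying a GPT-style model through the OpenAI Python client (v1-style interface) is shown below; the model name and prompt are illustrative placeholders, and any generated content must be verified by a clinician.

```python
# Hedged sketch of a chat-completion query; model name and prompt are
# placeholders, and outputs require expert verification.
from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable
response = client.chat.completions.create(
    model="gpt-4",  # placeholder model name
    messages=[
        {"role": "system",
         "content": "You answer questions about the oncology literature."},
        {"role": "user",
         "content": "Summarize commonly reported screening options for hepatocellular carcinoma."},
    ],
)
print(response.choices[0].message.content)
```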
In 2023, several researchers explored the performance of ChatGPT on cancer-specific queries. A ChatGPT model was evaluated based on its responses to questions on hepatocellular carcinoma [
167]. The model gave comprehensive and correct responses about basic knowledge, treatment, lifestyle, and diagnosis. However, it incorrectly answered questions related to hepatocellular carcinoma screening. In a different setting, ChatGPT responded to questions related to the diagnosis, prognosis, and treatment of lung cancer and pancreatic cancer with quality similar to that of Google’s featured snippets [
168]. For breast tumor management, ChatGPT was evaluated as a clinical decision support tool for 10 patients [
169]. Surprisingly, ChatGPT’s recommendations for surgery were similar to the tumor board’s decision in 7 out of 10 cases.
Although ChatGPT has shown remarkable capabilities, its limitations in the field of cancer have been demonstrated in several studies. The accuracy of ChatGPT on cancer myths and misconceptions was compared with that of National Cancer Institute information [
170]. Across 13 questions, the overall accuracy was 100% for the National Cancer Institute answers and 96.9% for ChatGPT. Another interesting study showed that ChatGPT mixed incorrect treatment recommendations with correct ones for breast, prostate, and lung cancer [
171]. Hence, the accuracy of its cancer-related information is still not reliable. A similar study on prostate cancer showed that the accuracy and precision of ChatGPT content were low and that the information was not always consistent with a reference source [
172]. Moreover, ChatGPT often fails to provide references, generates inconsistent answers across repeated queries, and sometimes cites incorrect references, which remain major limitations. Therefore, clinicians must exercise caution when handling ChatGPT’s responses or introducing them into clinical settings.
The more recent GPT-4 [
173], which can process both text and images as input, has shown some potential. In comparisons of ChatGPT with GPT-4 for lung cancer applications, GPT-4 performed better at translating chest CT reports into plain language [
174] and extracting phenotypes from free-text CT reports [
175]. Hence, GPT-4 can be a promising tool in radiology and may also provide another possible avenue for decision-making and treatment recommendations based on cancer imaging data.
3 AI guidelines
In recent years, there has been growing concern about the risks associated with the use of AI. To ensure unbiased, well-formulated, and well-delivered AI-based studies in medical research, experts have developed or suggested several reporting guidelines (Tab.2). These guidelines are based on recommendations from the EQUATOR (Enhancing the Quality and Transparency of Health Research) network, which promotes and develops guidelines to improve the quality of health research. The first reporting guidelines, such as SPIRIT-AI (Standard Protocol Items: Recommendations for Interventional Trials-Artificial Intelligence) and CONSORT-AI (Consolidated Standards of Reporting Trials-Artificial Intelligence), were developed for AI interventions in clinical trials [
176]. SPIRIT-AI guidelines are centered on the use of AI in clinical trial protocols [
177], whereas CONSORT-AI guidelines are centered on the use of AI in clinical trial reports [
178]. Both guidelines were developed simultaneously and show similarities in terms of reports and protocols. In addition to these guidelines, establishing how the models should be developed and tested is also important for transparency. The MI-CLAIM (Minimum Information about Clinical Artificial Intelligence Modeling) checklist was suggested for reporting AI algorithms in the field of medicine [
179]. MI-CLAIM also overlaps with MINIMAR (Minimum Information for Medical AI Reporting) [
180], which focuses on guidelines for developing AI algorithms.
Furthermore, to report research that uses AI for diagnostic test accuracy, the STARD-AI (Standards for Reporting of Diagnostic Accuracy Studies-AI) was published [
181], addressing the limitations of the previous STARD 2015 [
185] for utilizing AI models. Similarly, TRIPOD-AI (Transparent Reporting of a multivariable prediction model of Individual Prognosis Or Diagnosis-AI) and PROBAST-AI (Prediction model Risk Of Bias Assessment Tool-AI) were published to improve the reporting and risk-of-bias assessment of diagnostic and prognostic ML prediction models [
182]. A risk-of-bias tool called QUADAS-AI (Quality Assessment of Diagnostic Accuracy Studies-AI) was developed to evaluate the risk of bias and applicability in AI-centered diagnostic accuracy studies [
183]. The guidelines are not limited to building models but have also been extended to decision-making systems. AI technologies have been developed to assist decision-making in healthcare, but only a few have been successful and benefited patient care. The DECIDE-AI (Developmental and Exploratory Clinical Investigations of Decision Support Systems driven by Artificial Intelligence) reporting guideline, based on a checklist of 17 AI-specific and ten generic reporting items, was developed to improve the reporting of early-stage clinical evaluations of AI-based decision support systems [
184].
The major focus of these reporting guidelines is on building AI models with complete transparency regarding their algorithms, architecture, accuracy, performance, and study design. They were established to serve as standard documentation for developers, investigators, data scientists, and clinicians. Apart from the reporting guidelines, several researchers have proposed how AI prediction models should be implemented in healthcare settings. Smith
et al. proposed a “plan, do, study, adjust” approach while deploying AI for patient care [
186]. Wiens
et al. proposed testing the AI system in real time and checking it against country-specific government regulations before deploying the model to the market [
187]. Larson
et al. proposed strategies that can address the shortcomings of developing and evaluating diagnostic AI algorithms before implementing them in healthcare [
188]. They also suggested continuous monitoring and evaluation of the AI system throughout its life cycle. Despite these recommendations, more transparency and guidance are required in terms of software scrutiny, cost-effectiveness, retraining on new data sets, and a checklist of the conditions required to use an AI system for a particular task. To date, no specific guideline is available for the use of AI in study design in the cancer domain.
4 Future directions
In the rapidly evolving world of technology, AI holds great potential in cancer research. There is a growing need for AI technologies that not only provide good accuracy but are also trustworthy and understandable. In this regard, explainable AI [
189] has been an emerging trend. Explainable AI helps users understand how a model functions and interpret the predictions it generates, addressing the limitations of black box AI systems. This ensures transparency and leaves room for improvement. In addition, experts would have the means to understand the whole decision-making process of the AI, which would ultimately lower the barriers to implementing AI in clinical settings.
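As one concrete, hypothetical example of this trend, the SHAP package can attribute a tree-ensemble prediction to individual input features; the data and model below are placeholders, not part of any cited study.

```python
# Hedged sketch of post-hoc explanation with SHAP: per-feature contributions
# to a tree-ensemble prediction, one concrete form of explainable AI.
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X.iloc[:50])  # one row of contributions per sample
shap.summary_plot(shap_values, X.iloc[:50])       # global feature-importance view
```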
In the long term, ML algorithms will play a major role in personalized and targeted therapies. With the ongoing data explosion, every facet of multi-omics data, such as transcriptomics, genomics, epigenomics, metabolomics, and proteomics, for individual patients will soon be stored in databases that can be used for therapy selection. Early adoption of matching drugs to patients based on multi-omics features will advance the personalized medicine paradigm. Open-source AI platforms like MatchMiner [
190] can assist clinicians in matching candidates to precision medicine trials based on their genomic profiles. Furthermore, data-driven AI tools will be useful for accelerating clinical trials by linking individual patients to trials, overcoming the current challenge of labor-intensive manual matching.
While there has been a tremendous amount of research on common cancer types and tissues, like breast and lung, obtaining data from rare cancer types or tissues will be a major necessity for future research. Recently, a group of researchers developed a large language model-based prediction model, CancerGPT [
191], which successfully predicted drug pair synergy in rare cancer tissues with limited data. Their model could help researchers promptly identify potential targets and biomarkers.
Non-invasive AI tools with high accuracy represent the future for early cancer detection and diagnosis. For instance, DermaSensor [
192], an FDA-approved device, uses an AI algorithm to analyze spectral data of skin lesions for the detection of skin cancer. However, the development and commercialization of such software and devices will take a long time due to regulatory limitations and lengthy clinical trials. Regardless, with continued research efforts and innovation, AI technologies will revolutionize cancer detection and improve patient outcomes.
5 Conclusions
Building AI prediction models has been a crucial area of focus in cancer research. In this review, we discussed and summarized how different AI methods have shown remarkable progress in cancer-related applications. Traditional ML methods, particularly supervised learning algorithms, have outperformed conventional statistical tests in classification tasks. These algorithms have been used widely with multi-omics and clinical data for cancer classification and diagnosis, for predicting patient survival, and for predicting treatment response.
Moreover, DL models have opened new possibilities for better accuracy than traditional ML models in prediction tasks. They constitute a more recent approach and have been widely used in several cancer-related applications, specifically with imaging data. The ability of CNN models to provide clinician-level interpretation has shown the potential of DL in oncology. Furthermore, recent studies comparing validation metrics of AI methods for feature selection and classification have shown promising results.
Focusing on an emerging AI technology, we also highlighted that pre-trained language models (GPT) could provide useful solutions when prompted with cancer-related queries. These large language models have the power to extract and analyze crucial insights from massive data sets and may have extensive utility in cancer research by extracting data to look for correlations between patients, identifying drug candidates, and assisting in personalized treatment options. Hence, their rapid advancements show that virtual assistants and specialized AI chatbots for oncology will soon become important in clinical settings.
While AI models are making significant advances in cancer research, human judgement remains crucial in areas such as patient-centric decision-making, validation of predicted drug targets, better interpretation of imaging data, ethical challenges, and the use of tools like ChatGPT. Ultimately, a blend of AI and human expertise may lead to improved diagnosis, prognosis, and treatment outcomes in clinical settings.
Finally, we must note a few major limitations of AI applications. First, choosing the appropriate algorithm can be intricate and depends on various factors, such as the type and complexity of the data. Second, to integrate AI into clinical settings, detailed documentation, explanation, and transparency of the algorithms must be achieved. Third, monitoring the quality of AI tools for robust performance will be important. A detailed discussion will be necessary to establish which AI models and algorithms are acceptable and can provide valuable outcomes for cancer patients. Overall, AI has already significantly impacted cancer research, and addressing these challenges and validating AI-generated results can lead the future of oncology research.