1 INTRODUCTION
Coronavirus disease 2019 (COVID-19) is caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and spreads easily from person to person [1]. On January 30, 2020, the World Health Organization (WHO) reported a total of 7,818 confirmed cases worldwide. The WHO declared COVID-19 a pandemic on March 11, 2020. Owing to the extremely contagious nature of the virus, the sharp increase in infections and deaths led to a rapid slowdown of the world’s socio-economic framework. As of April 27, 2021, COVID-19 had spread to 223 countries/regions, with 147,539,302 confirmed cases and 3,116,444 deaths [2]. These circumstances make it essential to diagnose COVID-19 and isolate infected people; otherwise, the death toll will continue to increase. Although RT-PCR is widely accepted as the standard diagnostic method, it has several shortcomings. It requires adequate expertise to collect viral RNA extracted from a nasopharyngeal swab of the patient. Testing with RT-PCR is also time-consuming because stringent laboratory conditions must be maintained throughout the testing process. Therefore, researchers all over the world are seeking fast, safe, and automated methods to diagnose COVID-19. In this regard, researchers have considered two medical imaging techniques, X-rays and CT scans, to build fast, accurate, and automated systems for COVID-19 detection, since both imaging techniques have shown their strength for lung inflammation-related diseases such as COVID-19 [3, 4]. Ai et al. [3] demonstrated that CT scans have a higher sensitivity than RT-PCR tests for COVID-19 diagnosis. This research was conducted on 1,014 patients in Wuhan, China, who underwent both chest CT and RT-PCR tests. The authors concluded that chest CT might serve as a primary screening tool for detecting COVID-19. Moreover, it has been reported that patients show abnormalities in chest X-ray images, which is customary for people infected with COVID-19 [5, 6]. A semantic representation of the classification of lung diseases is shown in Fig. 1, where we see that an abnormal lung can manifest a COVID-19 infection.
Since last year, many studies have investigated machine learning (ML)-based architectures for the diagnosis, treatment, and follow-up of COVID-19 [7, 8]. Because of the high risk of infection, medical professionals are particularly exposed to the virus, so medical imaging that limits direct contact could be a top priority for COVID-19 detection systems. Although an accurate system would be ideal, diagnosing COVID-19 from medical imaging remains challenging, and high detection accuracy with decisive findings is the top requirement. Since deep learning (DL)-based architectures in particular are characterized by impressive recognition performance in medical image classification, they have become attractive candidates for detecting COVID-19 from chest CT scans and X-rays. DL-based architectures are thus becoming key to improving global health risk prevention by reducing epidemiologic risks. The purpose of this survey is to review the DL-based architectures proposed by researchers to set up automated diagnostic systems for COVID-19 using CT scans and X-rays. The motivation for focusing on DL-based architectures is that they can reveal key findings from medical imaging while delivering high accuracy, which is the goal in detecting COVID-19 from CT scans and X-rays.
In this regard, we have explored several survey papers, including one of the early surveys conducted by Albahri et al. [9]. In that survey, the authors looked at automated artificial intelligence (AI) applications based on data mining (DM) and ML algorithms to detect and diagnose Middle East respiratory syndrome (MERS)-CoV and severe acute respiratory syndrome (SARS)-CoV. The authors also discussed the main features of the coronavirus, the benefits of using machine learning techniques in healthcare, and the limitations of using DM and ML algorithms. Our survey differs from that work in many ways: it focused on MERS-CoV and SARS-CoV datasets rather than COVID-19 CT scans and X-rays, and on traditional ML algorithms for classification rather than DL architectures. In another survey, Albahri et al. [10] presented a systematic overview of techniques for detecting and classifying COVID-19 from chest CT scans and X-rays for assessment and benchmarking. The authors surveyed 11 AI-driven (i.e., traditional ML and DL) research studies that detect and classify COVID-19 using various case studies. That survey paid limited attention to dataset sources and distribution, dataset preprocessing techniques, direct comparison of the proposed models, and the explainability of the models through activation maps; our survey addresses these shortcomings [10]. Next, Shi et al. [11] addressed AI-empowered image acquisition, segmentation, and diagnosis of COVID-19 using X-rays and CT scans. The difference between that survey and ours is that we consider not only the detection performance of the models but also their explainability through visualization, which has always been a key issue in applying ML to healthcare. In another survey, Shoeibi et al. [12] reviewed applications of DL to COVID-19 diagnosis and automatic lung segmentation, with a focus on work using X-rays and CT scans; that article also covered the use of DL architectures to predict the prevalence of coronaviruses around the world. In addition, Dong et al. [13] reviewed the imaging characteristics and the computational models that have been used for COVID-19 management. For the detection, treatment, and follow-up of COVID-19, several imaging techniques have been explored, including magnetic resonance imaging (MRI), lung ultrasound, CT, and positron emission tomography-CT (PET/CT); the quantitative analysis of imaging data using AI was also discussed. Although [12, 13] attempted to incorporate image acquisition, image preprocessing, feature extraction, and classification paradigms, a comparison between the performance of ML-based models and that of radiologists is still lacking.
Our approach differs in many ways from previous survey papers. In our survey of ML-based COVID-19 detection, we mainly focus on data sources and preprocessing strategies, feature extraction, classification, visual explanation techniques, and the detection performance of ML-based models and radiologists. First, we explicitly review the dataset sources used for CT and X-ray image benchmarking and then study the dataset preprocessing techniques. Then, the different feature extraction techniques employed in detecting COVID-19 are discussed. Next, we review the classification methods used to detect COVID-19 and continue our exploration by analyzing the visual interpretation techniques implemented in previous methods; visually interpretable models make disease detection easier to understand. Finally, we also collect information about radiologists’ findings and recommendations regarding ML-based methods for detecting COVID-19. None of the available review papers [9–13] tried to examine the relative recognition performance of ML-based methods and human radiologists. It is necessary to validate the results of ML-based methods against the results of radiologists, as most of the benchmarks were generated from several heterogeneous sources. To address this deficiency, we report (in “Detection performance between ML-based methods and radiologists”) on the detection performance of ML-based methods versus radiologists. From our survey, we find that the detection performance of ML-based methods is better than that of radiologists. This is the most relevant, and perhaps the most important, finding for the development of AI-assisted COVID-19 detection from medical imaging. Moreover, this survey contributes to the health information technology literature by revealing new insights gained from analytical methods. The understanding developed in this survey offers supportive guidance to healthcare researchers and academics designing and trialing new clinical information systems for detecting and diagnosing diseases through the analysis of relevant CT or X-ray images. It is anticipated that the overall approach we introduce in this paper will enable the generation of precise outcomes in detecting diseases and help ensure safe and high-quality clinical operations for all nations.
2 COVID-19 IMAGING DATASET
In this section, we briefly discuss the sources of datasets commonly used in existing works.
2.1 COVID-19 CT scan dataset
CT scans can produce a precise image of the patient’s chest, making them an effective way to observe the condition of the lungs. Table 1 lists the publicly available CT scan data sources that researchers used in their work to find promising COVID-19 detection models. In addition, some of the CT scan data sources used in previous studies [21, 26–28] have not been made public.
2.2 COVID-19 chest X-ray dataset
For automatic disease diagnosis systems, medical practitioners have successfully used chest X-rays to detect a characteristic sign of COVID-19 infection, namely opaque patterns in the lungs [29]. Details of publicly available chest X-ray data sources for COVID-19 detection are shown in Table 2. Also, several articles [21, 44, 76–78] used X-ray image data sources that are not publicly available.
3 DATASET PREPROCESSING METHODS
Data preprocessing is a strategy for converting raw data into prepared data. It is also considered an essential diagnostic step because it can fill in missing values, eliminate noisy data, indicate outliers, and find inconsistencies. In addition, preprocessing can make the dataset more versatile, thereby producing a more robust DL-based model and enhancing its performance. In this section, we discuss several data preprocessing methods, which are considered the first step in building a COVID-19 detection model. Figure 2 shows different types of preprocessing techniques, such as resizing, brightness adjustment, generative adversarial network (GAN)-based augmentation, and vertical flipping.
Table 3 summarizes the different preprocessing methods used in previous works. Resizing is the most common data preprocessing method, used in 33 previous works, while flipping was used in 30. Scaling or cropping, contrast adjustment, brightness or intensity adjustment, and GAN-based augmentation were used in 21, 14, 8, and 4 previous works, respectively. Moreover, we observed that several preprocessing methods were often applied within a single model. For example, some authors used three methods (namely resizing, flipping or rotating, and scaling or cropping), while others used four methods (flipping or rotating, scaling or cropping, contrast adjustment, and brightness or intensity adjustment). In contrast, some authors studied only one preprocessing method: in [24], the authors used a scaling operation, and in [73], resizing and rotation operations are considered. In addition, some researchers used adaptive Wiener filters to reduce noise [99] and affine transformations [40]. The proportion of each preprocessing technique used in previous studies is shown in Fig. 3, where resizing reaches the highest percentage and GAN the smallest.
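To make these steps concrete, the sketch below assembles a minimal augmentation pipeline along the lines of the preprocessing operations surveyed above (resizing, flipping/rotating, scaling/cropping, and brightness/contrast adjustment). It uses torchvision purely for illustration; the specific parameter values are our own assumptions rather than settings taken from the surveyed papers.

```python
from torchvision import transforms

# Illustrative preprocessing/augmentation pipeline (parameter values are assumptions).
train_transforms = transforms.Compose([
    transforms.Resize((256, 256)),                         # resizing
    transforms.RandomHorizontalFlip(p=0.5),                # flipping
    transforms.RandomRotation(degrees=10),                 # rotating
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),   # scaling / cropping
    transforms.ColorJitter(brightness=0.2, contrast=0.2),  # brightness / contrast adjustment
    transforms.ToTensor(),
])
# Usage: tensor = train_transforms(pil_image), where pil_image is a loaded chest CT or X-ray image.
```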
4 FEATURE EXTRACTION METHODS
Feature extraction is the core step of any DL-based image classification model. Effective feature extraction is also part of dimensionality reduction, because it reduces the initial data without losing significant information. The reduced data can then be used to develop the model with less computational effort and to speed up the learning and generalization steps of DL-based models. DL-based classification models comprise two basic steps, feature extraction and classification, in which convolution and pooling operations make up the feature extraction stage and fully connected layers form the classification stage. Figure 4 shows a graphical representation of a DL-based image classification model.
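As a minimal sketch of this two-stage structure (convolution/pooling for feature extraction, fully connected layers for classification), the hypothetical network below illustrates the idea in PyTorch; the layer sizes and class count are illustrative assumptions, not a model from the surveyed papers.

```python
import torch.nn as nn

class SimpleCovidCNN(nn.Module):
    """Toy CNN: convolution/pooling feature extractor followed by a fully connected classifier."""
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(               # feature extraction: convolution + pooling
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(              # classification: fully connected layers
            nn.Flatten(),
            nn.Linear(32 * 56 * 56, 128), nn.ReLU(),
            nn.Linear(128, num_classes),               # softmax is applied inside the loss (CrossEntropyLoss)
        )

    def forward(self, x):                              # x: (batch, 3, 224, 224)
        return self.classifier(self.features(x))
```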
4.1 Feature extraction methods for CT scan images
To identify patterns in CT scan images that distinguish COVID-19 patients from non-COVID-19 patients, researchers used several well-known models (e.g., ResNet [102], VGG-16 [103], DenseNet [104]). Table 4 outlines some of the most used pre-trained models; among them, ResNet is used most extensively, followed by VGG, DenseNet, etc.
In [28], the authors used ResNet as a pre-trained model to perform binary classification on CT scan images, with a depth of 71 layers and an input image size of 224 × 224 × 3. In [20], the authors used DenseNet to improve computational efficiency by reducing the image size, obtaining an accuracy of 90.61%. As a feature extraction method, the authors of [24] adopted an improved version of Inception v3 [108], named IV3*, and then trained capsule network layers on the extracted features; compared with other pre-trained models, this network achieved the highest sensitivity but the lowest specificity. Also, a comparison of eight different deep learning models in [26] found that NasNet [109] and MobileNet [110] performed better than the other six models.
Also, a previous study [23] used t-SNE for feature visualization and the least absolute shrinkage and selection operator (LASSO) to identify the 12 most discriminative features distinguishing COVID-19 from other pneumonia. Correspondingly, three additional features were extracted for the region of interest: a distance feature, the 2D boundary fractal dimension, and the 3D gray-scale grid fractal dimension. Likewise, in [21], an inception recurrent residual convolutional neural network (IRRCNN) was used for feature extraction.
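Most of the pre-trained-model approaches above follow the same transfer-learning recipe: an ImageNet-pre-trained backbone serves as the feature extractor and only a new classification head is (re)trained. The sketch below illustrates this pattern with a torchvision ResNet-50 and a binary COVID-19 vs. non-COVID-19 task; the choice of backbone, the frozen layers, and the class count are illustrative assumptions rather than the exact setup of any surveyed paper.

```python
import torch.nn as nn
from torchvision import models

def build_covid_classifier(num_classes: int = 2) -> nn.Module:
    """Transfer-learning sketch: ImageNet-pre-trained backbone + new classification head."""
    backbone = models.resnet50(weights="IMAGENET1K_V2")     # pre-trained feature extractor
    for param in backbone.parameters():                     # freeze the convolutional feature layers
        param.requires_grad = False
    backbone.fc = nn.Linear(backbone.fc.in_features, num_classes)  # trainable classification head
    return backbone

model = build_covid_classifier(num_classes=2)  # e.g., COVID-19 vs. non-COVID-19 on 224x224x3 inputs
```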
4.2 Feature extraction methods for chest X-rays
In this section, we review some state-of-the-art CNN architectures used for feature extraction from X-ray images. As shown in Table 5, the most commonly used CNN architecture is ResNet, followed by DenseNet, VGG, Inception, Xception, etc. In [32], the authors applied 14 CNN architectures to extract features and showed that ResNet offered two benefits: the highest accuracy and the shortest feature extraction time. In another work [33], three CNN architectures (Xception, ResNet, and VGG-16) were explored, with VGG-16 achieving the best accuracy of 97%. Also, in [26], the authors examined eight different CNN architectures on CT scans and X-rays, and NasNet and MobileNet performed better.
Next, we discuss some custom CNNs for detecting COVID-19. Several of these custom CNNs are compatible with visual interpretation methods and perform binary or multi-class classification. An overview of each custom CNN is given in Table 6.
PDCOVIDNet detects COVID-19 from chest X-ray images. It varies the dilation rate in a parallel stack of convolutional layers in the CNN, thereby capturing more distinguishable features and significantly improving detection accuracy. In this article, the authors used 2,905 chest X-ray images, of which 2,324 images formed the training dataset.
CovMUNET is a multi-loss CNN framework that can detect COVID-19 cases from X-ray images. It contains two branches, a reconstruction branch and a classification branch, which compute two different losses. In this study, the authors used 5-fold cross-validation on a dataset of 6,594 images. For two-class classification (COVID-19 vs. non-COVID-19), CovMUNET achieved an accuracy of 99.41%, which is higher than its accuracy for three-class classification (COVID-19 vs. normal vs. pneumonia).
COVID Smart Data based Network (COVID-SDNet) is a CNN-based classifier that combines segmentation, data augmentation, and data transformations with a CNN for inference. The authors introduced a high-quality clinical dataset called COVIDGR-1.0, which contains 754 images, of which 377 are labeled as COVID-19. For transfer learning, the researchers adopted ResNet-50 initialized with ImageNet weights.
COVID-Net is another CNN architecture that uses a projection-expansion-projection design pattern to detect COVID-19 from chest X-ray images. COVID-Net was pre-trained on the ImageNet dataset and then trained and evaluated on the COVIDx dataset, achieving an accuracy of about 93.3%.
CovXNet is a multi-dilation convolutional neural network that automatically detects COVID-19 and other pneumonia from chest X-ray images, using depthwise convolution to extract significant features. In this work, the authors used a total of 6,161 images: the first dataset consists of 5,856 images (1,583 normal, 1,493 non-COVID viral pneumonia, and 2,780 bacterial pneumonia), and the second dataset comprises 305 images spanning four classes, namely COVID-19, normal, non-COVID viral pneumonia, and bacterial pneumonia.
CoroNet is another CNN-based model, built on the Xception architecture. The authors combined two different publicly available datasets to create their own dataset for automatic COVID-19 detection. Despite using a small dataset, namely 284 COVID-19 cases and 30 normal cases, CoroNet still achieved promising results.
COVID-CAPS is a capsule network-based framework that can identify COVID-19 cases from X-ray images; it consists of four convolutional layers and three capsule layers. It is worth noting that COVID-CAPS performs better on smaller datasets. The number of trainable parameters of COVID-CAPS, both without and with pre-training, is 295,488.
Detail-Oriented Capsule Networks (DECAPS) combine capsule networks (CapsNets) with a ResNet backbone to identify discriminative image features for detecting COVID-19 patients from CT images. A total of 391 images were labeled as COVID-19, while 339 images were labeled as normal. Despite this small dataset, the DECAPS model obtained an accuracy of 87.60%. Owing to the scarcity of sample images, the authors applied a conditional adversarial network and other preprocessing techniques, such as rescaling (286 × 286) and cropping (256 × 256). The authors also showed that combining DECAPS with Peekaboo improves accuracy.
DarkCovidNet is an architecture with 17 convolutional layers that applies different filtering on each layer. It obtained accuracies of 98.08% and 87.02% for binary and multi-class classification, respectively. In addition, the proposed DarkCovidNet provides heat maps that can help radiologists find the affected areas on chest X-rays.
CapsNet is built on the capsule network to detect COVID-19 from chest X-ray images. The proposed method aims to provide a fast and accurate diagnosis of COVID-19 and achieves 97.24% and 84.22% accuracy for binary and multi-class classification, respectively.
5 CLASSIFICATION METHODS
In DL-based architectures, classification is performed by fully connected layers topped with a softmax layer, while the convolutional layers act as the feature extractor in the CNN architecture. However, some researchers have combined pre-trained CNNs with support vector machine (SVM) classifiers to achieve improvements [48, 101]. Also, in [111], the authors combined a CNN with k-nearest neighbors (k-NN) and a support estimator network, but this approach requires a lot of data to train. In [70], the authors applied the COV-ELM classifier, which uses an extreme learning machine (ELM) to classify COVID-19 from chest X-rays and tunes the network with minimal intervention, thereby reducing training time. An end-to-end web-based detection system with a bagging trees classifier was proposed in [49] to simulate the digital clinical pipeline and facilitate the screening of suspicious cases. In [113], the authors introduced adaptive feature selection guided deep forest (AFS-DF) as a classification method, which achieves higher accuracy than logistic regression (LR), random forest (RF), neural network (NN), and SVM classifiers. As can be seen from Table 7, binary classification is used more often than multi-class classification. However, binary classification may be ambiguous when detecting COVID-19 because it cannot distinguish between COVID-19 and other viral pneumonia.
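As an illustration of the "deep features plus classical classifier" idea mentioned above, the sketch below uses a pre-trained CNN as a fixed feature extractor and trains an SVM on the extracted features. The backbone choice, the stand-in data, and the SVM settings are illustrative assumptions, not the configuration of any particular surveyed study.

```python
import numpy as np
import torch
from torchvision import models
from sklearn.svm import SVC

# Pre-trained CNN used as a fixed feature extractor (classification layer removed).
extractor = models.resnet18(weights="IMAGENET1K_V1")
extractor.fc = torch.nn.Identity()
extractor.eval()

# Hypothetical stand-ins for preprocessed chest images and their labels.
images = torch.rand(16, 3, 224, 224)
labels = np.array([0, 1] * 8)                    # 0 = non-COVID-19, 1 = COVID-19

with torch.no_grad():
    features = extractor(images).numpy()         # 512-dimensional deep feature vector per image

svm = SVC(kernel="rbf")                          # SVM classifier trained on the deep features
svm.fit(features, labels)
predictions = svm.predict(features)
```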
6 EXPERIMENTAL RESULTS
The evaluation metrics most used in previous studies to assess the performance of DL-based COVID-19 detection systems are accuracy, precision, sensitivity, specificity, and F1-score. The definitions of accuracy, precision, sensitivity, specificity, and F1-score are as follows:
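\[
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Sensitivity} = \frac{TP}{TP + FN},
\]
\[
\text{Specificity} = \frac{TN}{TN + FP}, \qquad
\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}}
\]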
where TP, TN, FP, and FN stand for true positive, true negative, false positive, and false negative, respectively. TP refers to the correct classification of the positive class, and the correct classification of the negative class is denoted as TN. FP is a false prediction of a positive value, meaning that the model classifies an image as COVID-19 although the image does not contain any COVID-19 symptoms. Conversely, FN is a false prediction of a negative value, for example, an actual COVID-19 image classified as non-COVID-19. Among the many proposed methods, some are highly effective for detecting COVID-19 and perform best in terms of accuracy. In addition, some proposed methods considered the area under the curve (AUC) to illustrate how precisely the model can predict the results.
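For example, assuming ground-truth labels and model outputs are available as arrays, these metrics (and the AUC) can be computed with scikit-learn as in the following illustrative sketch; the label and probability values shown are hypothetical.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, confusion_matrix)

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])                           # hypothetical labels (1 = COVID-19)
y_prob = np.array([0.92, 0.10, 0.85, 0.40, 0.05, 0.60, 0.77, 0.20])   # hypothetical predicted probabilities
y_pred = (y_prob >= 0.5).astype(int)                                  # thresholded class predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("Accuracy   :", accuracy_score(y_true, y_pred))
print("Precision  :", precision_score(y_true, y_pred))
print("Sensitivity:", recall_score(y_true, y_pred))                   # recall is sensitivity
print("Specificity:", tn / (tn + fp))
print("F1-score   :", f1_score(y_true, y_pred))
print("AUC        :", roc_auc_score(y_true, y_prob))
```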
6.1 COVID-19 classification results using CT scans
Table 8 summarizes the experimental results along with the feature extraction methods using CT images. We can see that some of the previous methods provided training, testing, and validation ratios, while others did not provide a validation ratio. In terms of accuracy, the methods using IRRCNN [21] and ResNet [28] obtained the highest scores, 99.56% and 99.4%, respectively. On the other hand, when sensitivity is considered as the performance metric, the cGAN [14] feature extraction method obtained the highest value of 99.97%. CT scan results used to distinguish COVID-19 have attracted great attention from researchers. Prior research on CT scans of COVID-19 patients can be characterized by the following features: (1) detection performance only; (2) detection performance plus region-based learning to label infection and localize abnormalities; (3) detection performance along with interpretation through visual markers; (4) verification of results by expert radiologists in addition to statistical analysis. Several studies [16, 20, 22, 25, 27, 99, 113] only provide detection performance through various evaluation metrics such as accuracy, precision, recall, F1-score, and AUC. On the other hand, some studies [15, 21, 24, 28, 83, 92, 107, 114] considered segmentation of the infected lung in addition to detection performance; in their methods, they tried to localize abnormalities in the lungs through region-based learning. To make the models more robust, some studies [17, 18, 26, 82, 105] looked at visual interpretation through visual markers. Due to the lack of sufficient data in the benchmarks, certain methods [19, 23, 85, 95, 100, 106, 115] verified the statistical results with radiologists; these methods collected data on COVID-19 patients from several hospitals. Table 9 shows the percentages of the different model evaluation criteria that previous research focused on to validate their models. After reviewing the DL-based methods for examining COVID-19 cases from CT scans, we found that 32% of previous work focused on localizing lung abnormalities, while 21% focused on radiologist testing and/or verification. On the other hand, 18% of previous studies covered visual interpretation as well as detection, and 29% performed detection only.
6.2 COVID-19 classification results using chest X-rays
Table 10 gives an overview of the experimental results as well as the dataset settings and the applied feature extraction methods. Most previous methods divided the dataset into various ratios of training, testing, and validation data, while other methods used cross-validation. Among the studies using cross-validation, previous work applied 10-fold [44, 49, 67, 70], 5-fold [31, 32, 50, 77, 79], and 4-fold [45] cross-validation; compared to 4-fold cross-validation, 10-fold and 5-fold cross-validation provide better results. Accuracy was the evaluation metric most commonly considered by the preceding methods, while fewer methods considered sensitivity, F1-score, and AUC to demonstrate the potency of their models. One of the previous studies [88] used Inception, and its AUC reached 100%. The sensitivity of feature extraction using NASNet-Large [38] is 100%, while the F1-score of the COV-ELM [70] method is 95%. In addition, in terms of accuracy, NASNetMobile [26] scored the highest, reaching 100%. Based on the categorization of the experimental evaluation features discussed in the previous section, we observed that only a few studies [57, 74, 88, 93, 111] centered on radiologists’ verification of the results. Several studies [21, 31, 39, 52, 73, 76, 79] claim that their methods can detect abnormalities in chest images while improving detection accuracy. Screening with visual interpretation is an important aspect of diagnosing COVID-19; hence, some studies [26, 30, 33, 36, 37, 40, 50, 51, 53, 71, 77, 84, 97] consider visual markers along with test results. Table 11 shows the percentages of the different model evaluation criteria applied by previous researchers for model acceptance. After studying the DL-based methods for exploring COVID-19 from X-ray images, we observed that 10% of previous efforts aimed at localizing the infected lung, while 7% centered on a radiologist’s investigation. However, 19% of past studies combined visual analysis and detection, while 64% performed detection only.
6.3 Detection performance between ML-based methods and radiologists
Some studies have developed ML-based COVID-19 detection models and verified the results with radiologists (i.e., [19, 23, 95, 100, 106, 115] use CT images, and [57, 74, 88, 93, 111] use X-rays). In the case of CT images, two studies [106, 115] involved radiologists reviewing the benchmarks and results, but they did not compare the results of ML-based methods against radiologists. Other works [19, 23, 95, 100] used radiologists to review the benchmarks and compared the detection performance of ML-based models with that of radiologists. In two of these studies [19, 23], the ML-based model gives an AUC score of 0.95, whereas the radiologists achieve an AUC of 0.85. The other two works [95, 100] provided extensive evaluations by comparing accuracy, sensitivity, and specificity.
In [100], the accuracy, sensitivity, and specificity of the ML-based method are 89.5%, 0.88, and 0.87, respectively, while the accuracy, sensitivity, and specificity achieved by radiologists are 55.6%, 0.72, and 0.51, respectively. In addition, the ML-based method used in [95] has an accuracy of 96%, which is 11% higher than the result obtained by radiologists; this study [95] also performed better than radiologists in terms of sensitivity and specificity. As these studies [19, 23, 95, 100] show, ML-based methods provide better detection performance than independent radiologists. These results demonstrate the potential superiority of ML-based systems over the detection performance of radiologists. On the other hand, in the case of X-ray images, some works [57, 74, 88, 93, 111] created datasets in close collaboration between AI experts and radiologists. These works did not provide any direct comparison of the detection performance between ML-based methods and radiologists, but radiologists made observations on the suitability of ML-based methods and on the performance of the developed systems in detecting COVID-19.
7 VISUAL EXPLANATION
Although CNN-based architectures can provide excellent results, on their own they are not well suited to medical diagnostic systems, because such systems demand not only detection performance but also visual means of interpreting the results. Some early studies therefore focused on visualizing the behavior of CNN models and on implementing interpretable models through visualization methods that emphasize class-discriminative evidence. Visualization here refers to generating class activation maps as heat maps, which reveal how the neural network makes its decisions by highlighting significant areas in the image. Many researchers have adopted several visualization methods to interpret the prediction results and identify the key areas of chest X-rays by generating class-discriminative saliency maps. Some previously used visualization approaches are the class activation map (CAM) [116], the gradient-weighted class activation map (Grad-CAM, Grad-CAM++) [117], layerwise relevance propagation (LRP) [118], and local interpretable model-agnostic explanations (LIME) [119]. Figure 5 shows heat-map outputs corresponding to periodic observations of chest X-rays and how the heat maps intuitively illustrate which regions the model attends to. As can be seen from Tables 12 and 13, for both CT scans and X-rays, Grad-CAM, rather than CAM, is the most used visualization technique. In some studies [26, 37, 65], LIME was exploited to obtain visualizations and thereby rectify misclassifications. Figure 6 summarizes the visual interpretation studies that used machine learning techniques on chest X-rays and CT scans of COVID-19 cases.
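As a concrete illustration of the gradient-based approach, the following sketch computes a Grad-CAM heat map for a PyTorch CNN; the ResNet-18 backbone, the chosen target layer, and the input file name are illustrative assumptions rather than details of any surveyed model.

```python
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image

model = models.resnet18(weights="IMAGENET1K_V1").eval()
target_layer = model.layer4[-1]                       # last convolutional block
activations, gradients = {}, {}

# Hooks capture the target layer's activations and the gradients flowing back into it.
target_layer.register_forward_hook(lambda m, i, o: activations.update(value=o.detach()))
target_layer.register_full_backward_hook(lambda m, gi, go: gradients.update(value=go[0].detach()))

preprocess = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
img = preprocess(Image.open("chest_xray.png").convert("RGB")).unsqueeze(0)  # hypothetical image file

scores = model(img)
scores[0, scores.argmax()].backward()                 # gradient of the top-scoring class

weights = gradients["value"].mean(dim=(2, 3), keepdim=True)        # global-average-pooled gradients
cam = F.relu((weights * activations["value"]).sum(dim=1, keepdim=True))
cam = F.interpolate(cam, size=img.shape[-2:], mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)           # normalized heat map in [0, 1]
```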
8 DISCUSSION
In this survey, we studied several diagnostic models for detecting COVID-19 and spelled out their characteristics. As mentioned above, Tables 1 and 2 show the publicly available chest CT and X-ray datasets, respectively, for COVID-19 detection. The largest COVID-19 data size for CT images is 28,395, while for X-ray images it is 589. Since the images come from multiple sources and different CNN structures have their own specifications, particularly with respect to image size, previous studies explored data preprocessing methods, as shown in Table 3 and Fig. 3; the most common preprocessing method is resizing, and GAN-based augmentation accounts for the smallest percentage. Next, for feature extraction (Tables 4 and 5), most previous research focused on the use of deep features, and the most widely used CNN architecture for feature extraction is ResNet. In addition to commonly used CNNs, some previous studies developed custom CNN architectures, among which CovMUNET [] shows the best performance in terms of accuracy, reaching 99.41%.
As can be seen from the experimental results above (Tables 8 and 10), it is worth noting that diagnostic models using various methods (e.g., VGG, ResNet, Inception v3, MobileNet v2, Xception) on CT scans and X-rays give encouraging results for the assessment of COVID-19 patients. It is evident from this survey that the best accuracy for detecting COVID-19 from X-rays is 100% [26], while the best accuracy found for CT images is 99.56% [21]. We also observed that a pre-trained CNN model based on transfer learning achieved an accuracy of 98.27% on CT images [16]. In [39], the authors proposed a deep learning CAD system that uses chest X-ray images to detect COVID-19 and eight other lung diseases, such as atelectasis, infiltration, pneumothorax, mass, effusion, pneumonia, cardiac hypertrophy, and nodules. A weakly supervised deep learning strategy for detecting and classifying COVID-19 infection from CT images is proposed in [83]; the advantage of this model is that it reduces the need for manually labeled CT images. Although some models reported higher detection accuracy, the datasets used during training were not large enough (e.g., [4]). Finally, some studies incorporated visualization techniques (e.g., CAM, Grad-CAM, Grad-CAM++, LIME, and LRP) to highlight key regions that are closely related to the predicted outcomes, and the most common visualization technique used is Grad-CAM for both CT scan and X-ray models.
Although the application of ML to medical imaging has shown impressive performance in detecting COVID-19, it also has some limitations. For example, owing to limited COVID-19 data availability, researchers are working with a small number of COVID-19 medical images. Many public datasets therefore suffer from a class imbalance problem, which causes overfitting, as they contain a limited number of COVID-19 cases. This problem impedes the use of ML to achieve excellent performance because ML techniques do not perform well on skewed datasets. Another limitation of previous studies is that they did not include clinical symptom information (such as fever, cough, and fatigue) or demographic information (such as age, gender, and location). Because COVID-19 can mutate, the behaviour of the coronavirus is relatively unknown, so using these two types of information could boost the performance of ML-based models. A third limitation is that most previous studies paid little attention to validating the ground truth; the time between RT-PCR follow-up and image acquisition needs to be recorded when establishing a benchmark to facilitate effective testing, which helps make the models more robust. Another limitation relates to visualization techniques (e.g., CAM, Grad-CAM) in which the activations are highly dispersed and emphasize irrelevant areas, particularly the mediastinum and shoulder regions in X-ray images.
To speed up the application of machine learning techniques to a huge number of medical images, a variety of benchmarks of COVID-19 CT scan and X-ray images have been released throughout the world. However, one challenging task is to preprocess the images so that they are consistent and support further analysis, because these images come from various organizations that use heterogeneous scanners. Another challenge is building a workforce in which machine learning experts develop effective algorithms and radiologists thoroughly examine the entire workflow. Although several studies [19, 23, 57, 74, 88, 93, 95, 100, 106, 111, 115] have developed sophisticated ML-based COVID-19 detection systems and verified the results with radiologists, the proposed methods have rarely been deployed or clinically translated for the diagnosis of COVID-19. To maximize the possibility of integrating models into clinical trials and to set up cost-effective clinical and technical validation, high-quality benchmarks, external validation, and papers with adequate documentation that can be replicated are needed. As future research, it is suggested that improvements may come from combining laboratory test results and image data with clinical findings to better detect and diagnose COVID-19; we hope that this blend of laboratory test results and clinical findings will contribute to the rapid diagnosis and prognosis of COVID-19. In addition, future work may include model explainability features to ensure fairness and accountability of the model's decisions, which can encourage the application of AI-assisted COVID-19 diagnosis in clinical use.
9 CONCLUSION
COVID-19 is a highly infectious disease that can quickly affect the lungs. If it cannot be diagnosed quickly and accurately, it may lead to irreparable harm, including death. The mortality rate can be reduced by identifying infected patients early and providing them with appropriate treatment. This is possible with a DL-based automatic diagnosis system because it can deliver a precise diagnosis in a short time. Also, DL-based diagnostic models that use chest CT scan and X-ray images have great potential to support radiologists in rapid COVID-19 detection. In this survey, we outline the research on DL-based diagnostic models for detecting COVID-19 from CT scans and X-rays, focusing on feature extraction methods, classification, detection performance, and interpretability. We believe that our survey can provide important insights for medical imaging and help researchers around the world in the fight against the COVID-19 pandemic. Beyond this, the knowledge presented in this paper could be applied to similar disease diagnosis and detection domains to enhance the quality of clinical outcomes.
10 METHOD
After selecting the required papers for this survey, we focused on several important steps followed in different papers for diagnosing COVID-19. First, we describe the dataset sources and preprocessing techniques, as well as their characteristics and attributes. Then, a comprehensive review of feature extraction methods and classification techniques is carried out. Finally, the results reported in the studied papers, along with the visualization techniques used, are discussed.
In this survey, we examined research papers from established databases such as the Web of Science, Scopus, Google Scholar, medRxiv, arXiv, and engrXiv. Table 14 shows the query strings/keywords used for searching papers. We selected papers that used CT scan and/or chest X-ray images, and excluded papers that used lung ultrasound (LU) and/or magnetic resonance imaging. We also excluded papers that did not detect COVID-19 and instead focused on other issues, such as the impact of COVID-19 on the cardiovascular system or epidemic prediction. The papers we included mainly used machine learning techniques to detect COVID-19. After excluding irrelevant papers, we found 130 papers in total (26 from Web of Science, 29 from Scopus, and 75 from Google Scholar). After excluding 32 duplicates, we finally included 98 papers in this survey.