Introduction
Mesial temporal lobe epilepsy (mTLE) is the most common type of focal epilepsy in adults, and its pathophysiological substrate is usually hippocampal sclerosis (HS) [
1]. Magnetic resonance imaging (MRI), notably resting state-functional MRI (rs-fMRI) [
2,
3] and structural MRI (sMRI) [
4,
5], has a pivotal role in the evaluation of patients with mTLE. However, most of the previous MRI studies measured average group-level differences, rather than evaluating individual patients.
In biological neurology, interest in applying machine learning (ML) techniques to neuroimaging data for the diagnosis of epilepsy has been growing [
6–
8]. mTLE has been the main focus of this work [
9,
10]. Most previous studies using ML techniques to investigate mTLE have used a single neuroimaging modality: multiparameter sMRI data can discriminate patients with mTLE from controls with 81% accuracy [
11], whereas combining six rs-fMRI measures achieved 83% accuracy [
12]. ML has also been applied to other neuroimaging modalities, such as diffusion tensor imaging (DTI) [
13,
14].
Although these studies have shown the potential of integrating appropriate MRI data with ML to detect mTLE with acceptable accuracy, several questions remain. First, brain alterations in left mTLE are reportedly more extensive than those in right mTLE [
15–
17], suggesting different patterns of brain abnormalities. However, whether detecting the right and left mTLE separately is better than pooling them when using ML remains unknown. Second, although structural and functional brain abnormalities have been reported in mTLE [
18,
19], no study has integrated structural and functional data with ML, even though this approach improves classification performance in other diseases [
20].
Thus, we used ML and combined sMRI and rs-fMRI data to distinguish patients with mTLE from healthy controls (HCs). From sMRI, we extracted three measures that have been successfully used in mTLE research as inputs for classification: gray matter (GM) [
21], white matter (WM) [
22], and cortical thickness [
23]; from rs-fMRI, the inputs were amplitude of low frequency fluctuation (ALFF) [
24] and regional homogeneity (ReHo) [
25,
26]. The five measures were combined to provide integrated information on functional and structural brain alterations in mTLE.
We hypothesized that (1) detecting patients with right mTLE and left mTLE separately rather than pooling them would improve classification accuracy; (2) combining structural and functional measures within a multimodal, multi-measure model would result in better classification performance; and (3) the temporal lobe would contribute most to the classification given that it is the main site of abnormality in mTLE.
Materials and methods
Participants
From September 2013 to January 2018, 74 mTLE patients were consecutively recruited in the Department of Neurology in West China Hospital of Sichuan University (Chengdu, China). All of the patients were right-handed and met the International League Against Epilepsy criteria for diagnosis of mTLE [
27,
28]. All patients had unilateral HS (37 left and 37 right) as assessed by hippocampal atrophy on the T1-weighted MRI (qualitative assessments by two radiologists) and increased signal on the T2 fluid-attenuated inversion recovery images in the mesial temporal region. Video electroencephalogram (EEG) was used to confirm that seizure onset was in the ipsilateral temporal lobe. No other mass brain lesion, traumatic brain injury, or any psychiatric disorder was apparent in the MRI, EEG, and neuropsychological examination results. In addition, 74 age-matched and sex-matched right-handed HCs were enrolled, all of whom were free of any neurological or psychiatric disorders at the time of the study. Table 1 shows the demographic and clinical characteristics of the study groups.
This study was approved by the West China Hospital Clinical Trials and Biomedical Ethics Committee of Sichuan University, and written informed consent was obtained from each participant. The study protocol was performed in accordance with the approved guidelines.
MRI data acquisition
MRI scanning was performed with a 3 T system (Tim Trio; Siemens Healthineers, Erlangen, Germany) with an eight-channel phased array head coil. Each functional examination contained 200 image volumes, and total imaging time was 410 s. The participants were instructed not to focus their thoughts on anything in particular and to keep their eyes closed during the acquisition. Head motion was minimized using foam pads. The functional scanning parameters were as follows: repetition/echo time, 2000/30 ms; flip angle, 90°; 30 axial sections per volume; 5 mm section thickness (no gap); 64 × 64 matrix; field of view, 240 mm × 240 mm; and voxel size, 3.75 mm × 3.75 mm × 5 mm. High-resolution 3D T1-weighted images were obtained through structural scanning with a spoiled gradient-recalled sequence. The structural scanning parameters were as follows: 176 slices; slice thickness, 1 mm; flip angle, 9°; matrix size, 256 × 256; repetition/inversion/echo time, 1900/900/2.26 ms; and voxel size, 1 mm × 1 mm × 1 mm.
MRI data analysis
Fig. 1 shows an overview of the classification approach. Five individual measures were analyzed using a support vector machine (SVM) learning model, namely, cortical thickness, GM, and WM, which were extracted from sMRI data, and ALFF and ReHo, which were extracted from rs-fMRI data.
Functional data preprocessing
Rs-fMRI data were preprocessed using the Data Processing Assistant for Resting-State fMRI (DPARSF) 4.3 Advanced Edition. The first 10 volumes of rest data were removed so that the effect of instability in the initial MRI signal could be minimized; correction was performed for acquisition delay between slices; WM and cerebrospinal fluid nuisance signals were regressed out of the blood oxygen level dependent signal; Friston 24 head-motion profiles and scrubbing regressors were used to minimize the effects of head motion. Normalization was performed using echo planar imaging (EPI) templates (voxel size 3 mm × 3 mm × 3 mm), and smoothing was performed with a Gaussian kernel of 4 mm full-width at half-maximum. Finally, functional data were filtered (band pass: 0.01–0.08 Hz) to reduce the effects of low-frequency drift and high-frequency noise.
DPARSF software was used to extract ReHo maps from preprocessed images. The linear trends in the unsmoothed images were removed, and a band-pass filter (0.01–0.08 Hz) was applied to reduce low-frequency drift and high-frequency respiratory and cardiac noise. Then, ReHo maps were generated by calculating Kendall’s coefficient of concordance (values from 0 to 1) between the time series of a given voxel and those of its 26 nearest neighbors. The ReHo value of each voxel was standardized by dividing the value by the global (within-brain) mean ReHo value.
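The ReHo computation for one voxel can be illustrated with a minimal sketch. It is not the DPARSF implementation: the function name `kendall_w` and the synthetic data are assumptions made for illustration, and tied values are assumed not to occur. For K time series of length n, Kendall's coefficient of concordance is W = 12 S / (K^2 (n^3 - n)), where S is the sum of squared deviations of the per-time-point rank sums from their mean. Here K = 27 (one voxel plus its 26 neighbors) and n = 190 (200 volumes minus the 10 discarded).

```python
import numpy as np

def kendall_w(ts):
    """Kendall's coefficient of concordance.

    ts: array of shape (n_timepoints, n_voxels), e.g. one voxel plus its
    26 nearest neighbors. Assumes no tied values within a time series.
    """
    n, k = ts.shape
    # Rank each voxel's time series over time (ranks 1..n).
    ranks = ts.argsort(axis=0).argsort(axis=0) + 1
    # Sum the ranks across voxels at every time point.
    rank_sums = ranks.sum(axis=1)
    s = ((rank_sums - rank_sums.mean()) ** 2).sum()
    return 12.0 * s / (k ** 2 * (n ** 3 - n))

# Synthetic illustration (assumption): 190 time points, 27 voxels.
rng = np.random.default_rng(0)
identical = np.tile(rng.normal(size=(190, 1)), (1, 27))  # perfectly concordant
independent = rng.normal(size=(190, 27))                 # unrelated noise

w_identical = kendall_w(identical)
w_independent = kendall_w(independent)
print(f"identical: {w_identical:.3f}, independent: {w_independent:.3f}")
```

Identical time series give W = 1, whereas unrelated noise gives a value close to 0. Dividing each voxel's W by the within-brain mean then yields the standardized ReHo map described above.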
ALFF was calculated using DPARSF software. After a band-pass filter (0.01–0.08 Hz) was applied and linear trends were removed, the time series was transformed to the frequency domain by using fast Fourier transform. The square root of the power spectrum was calculated and averaged across 0.01–0.08 Hz for each voxel to yield ALFF. The ALFF of each voxel was standardized by dividing it by the global (within the brain) mean ALFF value.
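The ALFF definition above can be sketched for a single voxel as follows. This is a simplified illustration rather than the DPARSF code: the function name `alff`, the spectrum normalization, and the synthetic signals are assumptions, and the explicit band-pass filtering step is replaced here by averaging only the spectral amplitudes that fall inside 0.01–0.08 Hz.

```python
import numpy as np

def alff(ts, tr, low=0.01, high=0.08):
    """Mean spectral amplitude of one voxel's time series within [low, high] Hz."""
    n = ts.size
    t = np.arange(n)
    # Remove the linear trend.
    ts = ts - np.polyval(np.polyfit(t, ts, 1), t)
    # Frequency axis and square root of the power spectrum.
    freqs = np.fft.rfftfreq(n, d=tr)
    amplitude = np.sqrt(np.abs(np.fft.rfft(ts)) ** 2 / n)
    band = (freqs >= low) & (freqs <= high)
    return amplitude[band].mean()

# Synthetic illustration (assumption): TR = 2 s, 190 retained volumes.
tr, n_vol = 2.0, 190
time = np.arange(n_vol) * tr
slow = np.sin(2 * np.pi * 0.05 * time)  # oscillation inside the band
fast = np.sin(2 * np.pi * 0.20 * time)  # oscillation outside the band

alff_slow = alff(slow, tr)
alff_fast = alff(fast, tr)
print(f"ALFF slow: {alff_slow:.3f}, ALFF fast: {alff_fast:.3f}")
```

A signal oscillating at 0.05 Hz yields a much larger value than one oscillating at 0.20 Hz. The standardized map is obtained by dividing every voxel's ALFF by the within-brain mean.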
Structural data preprocessing
The 3D T1-weighted images were preprocessed using the Diffeomorphic Anatomical Registration Through Exponentiated Lie (DARTEL) toolbox based on Statistical Parametric Mapping (SPM) software. The structural image was segmented into GM and WM, and anatomical registration was performed using the DARTEL algorithm in SPM8 for registration, normalization, and modulation. The registered images were transformed to Montreal Neurological Institute (MNI) space (voxel size 1.5 mm × 1.5 mm × 1.5 mm). The signal-to-noise ratio was increased by smoothing the normalized and nonmodulated images (GM and WM density images) with a 10 mm full-width at half-maximum Gaussian kernel. The preprocessed GM and WM probability maps (the density of GM and WM was reflected by voxel density) were used as measures for ML analysis.
Cortical thickness was calculated using FreeSurfer software. The 3D T1-weighted images were processed with the recon-all processing pipeline for cortical reconstruction and volumetric segmentation [
29]; the streamlined pipeline included the removal of nonbrain tissue, Talairach transformations, segmentation of subcortical white and deep GM regions, intensity normalization, and atlas registration. A mesh model of the cortical surface was generated, and the cortical surface was parcellated into 34 cortical regions on the basis of the gyral and sulcal landmarks for each hemisphere and of the Desikan–Killiany atlas [
30]; the cortical thickness for each of these 34 cortical regions was calculated per hemisphere. We blurred each participant’s morphometric parameter map by using a 25 mm full-width at half-maximum surface-based Gaussian kernel to improve the ability to detect population changes. Finally, we combined the cortical thickness maps of the left and right hemispheres into a whole brain map and used it as a measure for the ML analysis.
Machine learning classification and evaluation of models
We used SVM [
31] to perform single-subject classification. SVM maps the input vectors to a feature space with a set of mathematical functions known as kernels. In this space, the model finds the optimum separation surface that maximizes the margin between different classes within a training data set. After the separation surface is determined, it can be used to predict the class of new observations by using an independent testing data set. Here, the risk of overfitting was minimized by using a linear kernel instead of a nonlinear one. The model was based on LIBSVM [
32] and implemented using the Scikit–Learn library [
33].
To investigate the performance of each SVM model, we used a 10-fold stratified cross-validation approach. The participants were initially divided into 10 non-overlapping partitions, and the same ratio between patients with mTLE and HCs as the whole group was maintained in each partition. In each iteration, one partition was considered as the independent test set (where the performance metric was calculated), and the remaining subjects were defined as the training set. Within each training set, we performed internal cross-validation (i.e., 10-fold stratified nested cross-validation) to select the optimal set of the hyperparameters of the ML models. The linear SVM has only one hyperparameter (the soft margin parameter C) that controls the trade-off between reducing training errors and having a large separation margin. This parameter was optimized by performing grid search on the following values: C = 10^-3, 10^-2, 10^-1, 10^0, 10^1, 10^2, 10^3, and 10^4. The optimum C value for each input measure was determined. The set of parameters with the best performance in a series of internal cross-validation tests was selected for each imaging modality and used for training the SVM models.
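The nested cross-validation scheme can be sketched with scikit-learn as follows. This is an illustration on synthetic data, not the study's code: the data, the random seeds, and the decision to shuffle before splitting are assumptions. Only the linear kernel, the grid of C values, the 10-fold stratified splits, and the balanced-accuracy criterion are taken from the text.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): 80 subjects, 20 features, two balanced groups.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 40)
X = rng.normal(size=(80, 20))
X[y == 1, :5] += 2.0  # shift five features so that the groups differ

# Grid of soft-margin values from the text: C = 10^-3 ... 10^4.
param_grid = {"C": [10.0 ** k for k in range(-3, 5)]}

inner_cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)

# Inner loop: choose C by mean balanced accuracy within each training set.
tuned_svm = GridSearchCV(SVC(kernel="linear"), param_grid,
                         scoring="balanced_accuracy", cv=inner_cv)

# Outer loop: score the tuned model on each held-out partition.
outer_scores = cross_val_score(tuned_svm, X, y, cv=outer_cv,
                               scoring="balanced_accuracy")
print(f"mean balanced accuracy: {outer_scores.mean():.2f}")
```

Because `GridSearchCV` is itself passed to `cross_val_score`, C is re-tuned inside every outer training set, so the held-out partition never influences hyperparameter selection.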
In multiple measure analysis, we combined the SVM predictions of single measures through a weight-averaging method (soft voting), which is slightly more effective than either sum of kernels or multi-kernel learning [
34]. We trained each SVM model by using a single measure so that we could estimate the probability that an individual belonged to the patient or control group (using the Scikit–Learn library default method). Next, we calculated the weighted probabilities of each specific measure by multiplying its predicted probabilities by an optimized coefficient. After the grid search for the C parameter, a second nested cross-validation was performed to optimize the coefficient of each specific measure for soft voting. Each coefficient was evaluated using a grid search over integer values between 1 and 10. This second nested cross-validation was also performed using a 10-fold stratified cross-validation. In both nested cross-validations, the highest mean balanced accuracy (defined as the mean of sensitivity and specificity) of the model was used to define the best hyperparameter value. Using mean balanced accuracy to optimize the model takes sensitivity and specificity into account simultaneously, which is preferable to simple accuracy when the two groups are unbalanced. Finally, we calculated the average of the weighted predicted probabilities, that is, the weighted average across measures of the probability, predicted by each single-measure SVM model, that an individual subject belonged to each of the two groups. The group with the highest score was defined as the predicted class for a given subject.
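A minimal sketch of the weighted soft-voting step and of balanced accuracy is given below. The function names, the example probabilities, and the example coefficients are hypothetical; only the weighting rule (integer coefficients between 1 and 10, weighted average, highest score wins) and the definition of balanced accuracy come from the text.

```python
import numpy as np

def weighted_soft_vote(probas, weights):
    """Weighted average of per-measure class probabilities.

    probas: array-like of shape (n_measures, n_subjects, n_classes).
    weights: one coefficient per measure (integers 1..10 in the text).
    Returns the averaged probabilities and the predicted class per subject.
    """
    probas = np.asarray(probas, dtype=float)
    weights = np.asarray(weights, dtype=float)
    avg = np.tensordot(weights, probas, axes=1) / weights.sum()
    return avg, avg.argmax(axis=1)

def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for labels coded 1 (patient) and 0 (control)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    sensitivity = np.mean(y_pred[y_true == 1] == 1)
    specificity = np.mean(y_pred[y_true == 0] == 0)
    return (sensitivity + specificity) / 2

# Hypothetical probabilities from two single-measure models for two subjects.
probas = [
    [[0.6, 0.4], [0.3, 0.7]],  # measure A
    [[0.2, 0.8], [0.4, 0.6]],  # measure B
]
avg, pred = weighted_soft_vote(probas, weights=[4, 1])
print(avg)   # weighted class probabilities per subject
print(pred)  # predicted class per subject

bal_acc = balanced_accuracy([1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 1])
print(bal_acc)  # sensitivity 0.75, specificity 0.5
```

With coefficients 4 and 1, the first measure dominates the vote for the first subject even though the second measure favors the other class, which is the behavior the coefficient search is meant to tune.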
After the SVM models were trained, their performance was evaluated on the held-out test data. To avoid the influence of imbalanced data sets in the left or right mTLE analysis, we calculated the mean balanced accuracy for each SVM model. We also reported the sensitivity, specificity, recall, F1 score, and area under the receiver operating characteristic curve (AUC) to evaluate the performance comprehensively. To obtain meaningful confidence intervals and P-values for each cluster, we examined the statistical significance of the classification models with a random permutation test (1000 permutations).
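The logic of a label-permutation test can be sketched as follows. This is a generic illustration under stated assumptions: `permutation_p_value` and `centroid_holdout_score` are hypothetical helpers, the nearest-centroid scorer merely stands in for the cross-validated SVM, and 200 permutations are used instead of 1000 to keep the example fast.

```python
import numpy as np

def permutation_p_value(score_fn, X, y, n_permutations=1000, seed=0):
    """Compare the observed score with scores obtained after shuffling the labels."""
    rng = np.random.default_rng(seed)
    observed = score_fn(X, y)
    null_scores = np.array([score_fn(X, rng.permutation(y))
                            for _ in range(n_permutations)])
    # The +1 terms count the observed labelling as one of the permutations.
    p_value = (np.sum(null_scores >= observed) + 1) / (n_permutations + 1)
    return observed, p_value

def centroid_holdout_score(X, y):
    """Stand-in scorer (assumption): nearest-centroid accuracy on a held-out half."""
    half = len(y) // 2
    X_train, y_train, X_test, y_test = X[:half], y[:half], X[half:], y[half:]
    c0 = X_train[y_train == 0].mean(axis=0)
    c1 = X_train[y_train == 1].mean(axis=0)
    closer_to_c1 = (np.linalg.norm(X_test - c1, axis=1)
                    < np.linalg.norm(X_test - c0, axis=1))
    return np.mean(closer_to_c1.astype(int) == y_test)

# Synthetic data (assumption): 40 subjects with alternating labels, 5 features.
rng = np.random.default_rng(1)
y = np.tile([0, 1], 20)
X = rng.normal(size=(40, 5))
X[y == 1] += 3.0

observed, p_value = permutation_p_value(centroid_holdout_score, X, y,
                                        n_permutations=200)
print(f"observed score: {observed:.2f}, permutation P-value: {p_value:.4f}")
```

The P-value is the fraction of permuted labellings that score at least as well as the true labelling, so the smallest attainable value is 1 / (n_permutations + 1).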
Discriminative brain region maps
The automated anatomical labeling (AAL) atlas consisting of 90 regions of interest (ROIs) was used to construct maps of discriminative brain regions [
35]. The weight maps are the spatial representation of the decision function that defines the level of each ROI’s contributions to classification. We reported the top 10 discriminative regions of four measures (ReHo, ALFF, GM, and WM), to seek the objective biomarkers of mTLE. The regions that were in the top 10 discriminative regions for more than two measures were defined as the most discriminative regions. Then, we extracted the submaps of these regions on the basis of each measure (ReHo, ALFF, GM, and WM) and used an SVM technique based on the integration of these submaps to verify the results.
Results
Classification performance
The balanced accuracies, sensitivities, specificities, recall, F1 score, AUC and P-values for the single-subject classification of patients and HC are reported in Table 2, and Fig. 2 shows an overview of classification accuracy. In the identification of all patients versus HC, we obtained an accuracy of 63% for ReHo, 63% for ALFF, 58% for GM, 72% for WM, and 63% for cortical thickness. After dividing the patients into left and right mTLE, we obtained, for the discrimination of left mTLE from HC, an accuracy of 75% for ReHo, 75% for ALFF, 73% for GM, 76% for WM, and 72% for cortical thickness; for the discrimination of right mTLE from HC, we obtained an accuracy of 68% for ReHo, 73% for ALFF, 66% for GM, 73% for WM, and 66% for cortical thickness. Thus, dividing the patients into left and right mTLE allowed more accurate classification than pooling all patients, for both left mTLE (mean 74% versus 64%; paired t-test, P = 0.005) and right mTLE (mean 69% versus 64%; paired t-test, P = 0.030).
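The paired comparison above can be reproduced arithmetically from the reported accuracies. The sketch below assumes that the paired t-test was computed across the five single-measure accuracies (the text does not state this explicitly, although the reported means are consistent with it); the function name `paired_t` is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Reported accuracies (%) for ReHo, ALFF, GM, WM, and cortical thickness.
acc_all = [63, 63, 58, 72, 63]
acc_left = [75, 75, 73, 76, 72]
acc_right = [68, 73, 66, 73, 66]

def paired_t(a, b):
    """Paired t statistic: mean difference divided by its standard error."""
    d = [x - y for x, y in zip(a, b)]
    return mean(d) / (stdev(d) / sqrt(len(d)))

t_left = paired_t(acc_left, acc_all)
t_right = paired_t(acc_right, acc_all)
print(f"left vs. all:  mean {mean(acc_left):.1f} vs. {mean(acc_all):.1f}, t = {t_left:.2f}")
print(f"right vs. all: mean {mean(acc_right):.1f} vs. {mean(acc_all):.1f}, t = {t_right:.2f}")
```

With four degrees of freedom, t statistics of about 5.59 and 3.31 are in line with the two-sided P-values reported in the text.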
In the identification of left mTLE versus HC, combining the functional measures (ReHo and ALFF) yielded an accuracy of 78%. Combining the structural measures (GM, WM, and cortical thickness) yielded an accuracy of 79% (Table 2). Thus, for structural and functional modalities, combining different measures resulted in a marginally higher accuracy of classification than using single measures alone. For discriminating either all patients or right mTLE from HC we found no such increase in accuracy.
Combining all measures across structural and functional modalities resulted in an accuracy of 84% for discriminating left mTLE from HC (Table 2), which was higher than that obtained with a single modality, either structural (84% vs. 79%) or functional (84% vs. 78%). Accuracy did not increase when either all patients or right mTLE was discriminated from HC.
Most discriminative brain regions
To determine which brain regions contributed to single-subject classification, we computed the mean absolute values of the weights of the model across the different stages of the cross-validation and used a template mask based on the AAL atlas to extract the weight for each region. The top 10 brain regions with the highest mean values based on each measure are reported in Table 3.
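The region-ranking step can be sketched as follows. The function name `top_regions`, the toy weights, and the toy atlas labels are hypothetical. The sketch also assumes that absolute weights are averaged first across folds and then across the voxels of each region, which the text does not specify in detail.

```python
import numpy as np

def top_regions(fold_weights, region_labels, k=10):
    """Rank atlas regions by mean absolute linear-SVM weight.

    fold_weights: (n_folds, n_voxels) weights, one row per cross-validation fold.
    region_labels: (n_voxels,) integer region id per voxel (0 = outside the atlas).
    Returns the k highest-scoring (region id, score) pairs.
    """
    fold_weights = np.asarray(fold_weights, dtype=float)
    region_labels = np.asarray(region_labels)
    # Mean absolute weight of every voxel across folds.
    mean_abs = np.abs(fold_weights).mean(axis=0)
    region_ids = np.unique(region_labels[region_labels > 0])
    # Average the voxel scores inside each atlas region.
    scores = np.array([mean_abs[region_labels == r].mean() for r in region_ids])
    order = np.argsort(scores)[::-1][:k]
    return [(int(region_ids[i]), float(scores[i])) for i in order]

# Toy example (assumption): two folds, six voxels, three regions.
weights = [[0.1, -0.1, 2.0, -2.0, 0.5, 0.5],
           [0.1, 0.1, -2.0, 2.0, 0.5, -0.5]]
labels = [1, 1, 2, 2, 3, 0]
ranking = top_regions(weights, labels, k=2)
print(ranking)
```

Taking absolute values before averaging keeps weights of opposite sign from cancelling, so a region scores highly whenever it drives the decision function in either direction.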
The brain regions contributing to single-subject classification varied across the four measures of interest. However, some regions were detected in at least two of our four measures of interest (Figs. 3 and 4). In the classification of left mTLE and HC, the regions detected in at least two individual measures included some of the structures of the left temporal lobe (such as the left inferior temporal gyrus, left temporal pole: middle temporal gyrus) and of the default-mode network (DMN) [
36,
37] (such as the left superior parietal gyrus, the left inferior parietal gyrus, and left angular gyrus). In the classification of right mTLE and HC, the main role was played by the right temporal lobe (right inferior temporal gyrus, right temporal pole: middle temporal gyrus, right temporal pole: superior temporal gyrus), left pallidum, and bilateral putamen. Taken collectively, the discriminative regions for the left and right mTLE mainly focused on the ipsilateral temporal lobe, on the extra-temporal regions, such as DMN for left mTLE, and left pallidum, and bilateral putamen for the right mTLE. When SVM was used to discriminate between the left and right mTLE from HC on the basis of integration of the submaps of these regions for each measure, we obtained a balanced accuracy of 70.2% for the left mTLE and 64.3% for the right mTLE.
Discussion
This study combined functional and structural MRI measures to distinguish patients with mTLE from HC. Our results suggest that classification accuracy can be improved by dividing the mTLE patients into two groups (left and right) and combining functional and structural MRI measures. The temporal lobe contributed most to the single-subject classification, and some extra-temporal regions also had high discriminative power.
Consistent with our hypothesis (1), the grouping of the mTLE patients into left and right mTLE improved classifier performance. This finding suggests that left and right mTLE are associated with distinct brain alterations and is corroborated by functional and structural imaging findings [
15–
17]. The accuracy of the left mTLE versus HC was higher than that of the right versus HC for different modalities in accordance with previous ML studies [
9,
14,
38]. Evidence suggests that the functional and structural alterations in left mTLE are more extensive than in the right mTLE [
39,
40]. In functional studies, the left mTLE showed considerable reduction in functional connectivity compared with the right mTLE [
41]. Furthermore, the left mTLE has been associated with alterations in bilateral mesial temporal lobes, whereas the right mTLE has been associated with alterations in the right mesial temporal lobe [
42]. Several structural studies found that left mTLE shows more extensive loss of WM and aberrant inter-tract correlations than the right mTLE [
43]; another study reported considerable alteration in GM and WM in the left mTLE rather than the right mTLE [
44]. The left hemisphere is dominant in most right-handed persons [
45], and all of our subjects were right-handed. Thus, the difference may be explained by seizures that originate in the dominant hemisphere and cause excitotoxic damage in left-hemisphere-dominant patients.
Consistent with our hypothesis (2), in the identification of left mTLE and HC, by combining two functional measures (ReHo and ALFF) we achieved a 78% accuracy (comparable to 83% accuracy in distinguishing mTLE from HC reported in a study combining six rs-fMRI measures) [
12]. By combining three structural measures (GM, WM, and cortical thickness), we achieved an accuracy of 79% (comparable to 81% accuracy in distinguishing mTLE from HC reported in a study combining multiparameter sMRI data) [
11]. The highest accuracy of 84% was obtained after all the measures were combined. This finding is in accordance with the results reported in studies on other neuropsychiatric disorders [
46,
47]. The increased accuracy supports the view that mTLE can cause structural abnormality and functional alterations of the brain [
48–
51]. Therefore, combining multimodal measures within a single model could serve as a promising tool for improving the classification of individual patients with mTLE. By contrast, the accuracy in the classification of right mTLE and HC did not increase by combining functional and structural measures. This result might have arisen because some of the neuroimaging measures (ReHo, GM, cortical thickness) were more sensitive in detecting left mTLE than right mTLE (Table 2).
The best-discriminative regions were widespread and not restricted to particular brain hemispheres or lobes across the four measures. An individual region in an SVM might display high discriminative power for two possible reasons: a between-group difference in the feature value in that region, or a between-group difference in the correlation between that region and other areas. Thus, the widespread network revealed in this study should not be interpreted in terms of individual regions but as a spatially distributed pattern of discrimination informed by all brain voxels. Direct comparison with previously reported sMRI or rs-fMRI studies that used mass-univariate analyses is difficult, but brain regions showing considerable difference should contribute to the SVM-based classification. The discriminative regions we detected in more than two measures partially overlap with those of previous studies. For example, consistent with our hypothesis (3), the ipsilateral temporal lobe contributed most to classification across the four measures. Consistent with this finding, the epileptogenic zone often involves the mesial and lateral temporal lobe in mTLE [
52,
53]. In left mTLE and HC identification, some regions of DMN are also important. Previous studies have found functional or structural alterations of DMN in mTLE [
19,
54,
55]. DMN is an integrated system for self-related cognitive activity, including autobiographical, self-monitoring, and social functions [
56]; as such, impairment of DMN in mTLE may be an underlying pathophysiological mechanism of impaired cognition [
57]. Previous studies have suggested that alterations of DMN in mTLE may be related to the rich connections that exist between the hippocampus and several key structures of this network [
58]. In addition, several subcortical regions, such as pallidum and putamen, also had high discriminative power in the classification of right mTLE and HC. Consistent with such findings, alterations of pallidum and putamen in mTLE have been reported in previous studies [
59].
The present work has several limitations. First, a major challenge in the application of ML to high-dimensional neuroimaging data is the risk of overfitting. We minimized this risk by substituting voxel-level data with region-level features, which are associated with less noise and a lower risk of overfitting [
60]. Second, although the present results are promising, the development of a practical diagnostic tool requires several advances. The accuracy of the model needs to be improved by including diverse observations from multimodal imaging. Finally, we identified the discriminative regions on the basis of the AAL atlas to facilitate the discussion of the neurobiology of mTLE. However, the potential drawback is that some atlas areas (e.g., the hippocampus region) might be too large or unspecific for detecting group differences.
In conclusion, the present study showed that grouping the mTLE patients into left and right mTLE and combining multimodal measures within a single model improved the classifier performance. Thus, subtyping of patients and integration of multimodal neuroimaging modalities could serve as promising methods for improving classifier performance in the classification of individual patients with mTLE and HC.