For the discovery of candidate biomarkers in high-dimensional gene expression data, existing published methods fall into three categories. (1) Statistical analysis methods. These methods generally filter differentially expressed genes between paired tumor and adjacent normal tissue using parametric or nonparametric tests (Baldi and Long 2001; Jafari and Azuaje 2006). They rely mainly on the statistical characteristics of gene expression data and involve no learning algorithm. As a result, they are efficient but perform poorly on large-scale data. (2) Machine learning methods. Although the number of genes measured in DNA microarray expression data is large, only a few underlying gene components account for tumor type classification, and these genes are selected as candidate biomarkers through machine learning methods (Díaz-Uriarte and de Andres
2006; Liu
et al.
2005). However, these methods cannot detect cancer-specific differentially expressed genes and are limited when applied to multiple datasets. (3) Deep learning methods. Way and Greene (2018) proposed a variational autoencoder framework and, by training it on The Cancer Genome Atlas (TCGA) (The Cancer Genome Atlas Research Network et al. 2013) pan-cancer RNA-seq data, performed cancer stratification and identified specific activated expression patterns. Similarly, Danaee et al. (2017) used a stacked denoising autoencoder to extract deep features from high-dimensional gene expression profiles and performed classification on them. Drawing on visualization methods, Lyu and Haque (2018) embedded gene expression data into 2-D images and classified them with a deep convolutional neural network. Khoshghalbvash and Gao (2019) constructed an integrative deep neural network to perform classification and feature selection on multi-source genomic data. These deep learning methods show better performance than the earlier approaches. However, gene expression data are usually high-dimensional, and the number of samples is unbalanced across cancer types. A deep learning framework can perform well on tumor types with a large sample size, but its generalization ability weakens on types with few samples. The data volume of any existing single high-quality dataset (such as TCGA) is not enough to fully exploit the advantages of deep learning, especially since the sample size of some relatively uncommon tumor types is extremely small. The Gene Expression Omnibus (GEO) hosts a large number of datasets from different research institutions, but these are of relatively low quality, with inconsistent internal variances, missing labels, and other challenges (Barrett
et al.
2007). Exploiting these datasets to improve the performance of deep learning in gene expression analysis is therefore attractive.
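
To make the first category above concrete, the following is a minimal sketch of a paired differential expression filter. It is not taken from any of the cited works. It assumes log-scale expression values for matched tumor and adjacent normal samples, and the gene names, toy values, and helper functions (`paired_t_statistic`, `rank_genes`) are hypothetical. Only the paired t statistic is computed; p-values and multiple-testing correction, which a real analysis would need, are omitted.

```python
import math
import statistics


def paired_t_statistic(tumor, normal):
    """Paired t statistic for one gene across matched tumor/normal samples."""
    diffs = [t - n for t, n in zip(tumor, normal)]
    mean = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        # All differences identical: the statistic is degenerate.
        return 0.0 if mean == 0 else math.copysign(math.inf, mean)
    return mean / (sd / math.sqrt(len(diffs)))


def rank_genes(expression):
    """Rank genes by |t|; expression maps gene -> (tumor values, normal values)."""
    scores = {g: paired_t_statistic(t, n) for g, (t, n) in expression.items()}
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)


# Made-up log-expression values for three hypothetical genes (4 matched pairs).
toy = {
    "GENE_A": ([8.1, 7.9, 8.4, 8.0], [5.0, 5.2, 5.1, 4.9]),  # higher in tumor
    "GENE_B": ([6.0, 6.1, 5.9, 6.2], [6.1, 6.0, 6.0, 6.1]),  # roughly unchanged
    "GENE_C": ([3.0, 3.4, 2.9, 3.2], [4.1, 4.0, 4.3, 4.2]),  # lower in tumor
}
ranking = rank_genes(toy)
```

Because each gene is scored independently from summary statistics of its own differences, such a filter is cheap to run, which matches the efficiency noted for this category. The same independence also means it ignores interactions between genes.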