Machine learning approach to identify significant genes and classify cancer types from RNA-seq data

Sultana Akter, Ridwan Olamilekan Adesola, Shreya Basnet

Global Medical Genetics, 2025, Vol. 12, Issue 04: 100079

Research article

Abstract

Cancer remains a leading cause of morbidity and mortality worldwide, with nearly 10 million deaths reported in 2022. In the United States, more than 618,000 deaths are projected to occur in 2025. Traditional methods for identifying cancer types are often time-consuming, labor-intensive, and resource-demanding, highlighting the need for efficient alternatives. This study aimed to evaluate machine learning algorithms on RNA-seq gene expression data to identify statistically significant genes and classify cancer types. We retrieved the PANCAN RNA-seq dataset from the UCI Machine Learning Repository and assessed eight classifiers: Support Vector Machines, K-Nearest Neighbors, AdaBoost, Random Forest, Decision Tree, Quadratic Discriminant Analysis, Naïve Bayes, and Artificial Neural Networks. Model performance was validated using a 70/30 train-test split and 5-fold cross-validation. Among the tested models, the Support Vector Machine achieved the highest classification accuracy of 99.87% under 5-fold cross-validation. These findings demonstrate the potential of machine learning to efficiently analyze RNA-seq data, facilitate biomarker discovery, and support the development of personalized cancer diagnostics and treatment strategies.
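
The two validation schemes named in the abstract, a 70/30 train-test split and 5-fold cross-validation, can be illustrated with a minimal standard-library sketch. This is not the authors' code. The data are synthetic toy values rather than the PANCAN dataset, the nearest-centroid classifier is a simple stand-in and not one of the eight classifiers evaluated in the study, and all function names (`holdout_split`, `kfold_indices`, `fit_centroids`, `predict`, `accuracy`, `make_toy_data`) are hypothetical.

```python
import random
from collections import defaultdict


def holdout_split(n, test_fraction=0.3, seed=0):
    """Shuffle indices 0..n-1 and split them into (train, test) lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]


def kfold_indices(n, k=5, seed=0):
    """Yield (train, test) index lists for k disjoint, roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def fit_centroids(X, y):
    """Stand-in classifier (assumption): mean expression vector per class."""
    sums, counts = {}, defaultdict(int)
    for row, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        sums[label] = [s + v for s, v in zip(sums[label], row)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict(centroids, row):
    """Return the class whose centroid is nearest (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(centroids[c], row)),
    )


def accuracy(X, y, train, test):
    """Fit on the training indices and score on the test indices."""
    model = fit_centroids([X[i] for i in train], [y[i] for i in train])
    hits = sum(predict(model, X[i]) == y[i] for i in test)
    return hits / len(test)


def make_toy_data(n_per_class=30, n_genes=8, seed=0):
    """Synthetic, well-separated 'expression' values; not real RNA-seq data."""
    rng = random.Random(seed)
    X, y = [], []
    for c, label in enumerate(["type_A", "type_B", "type_C"]):
        for _ in range(n_per_class):
            X.append([rng.gauss(10.0 * c, 1.0) for _ in range(n_genes)])
            y.append(label)
    return X, y


X, y = make_toy_data()

# Scheme 1: a single 70/30 hold-out split.
train, test = holdout_split(len(X))
holdout_acc = accuracy(X, y, train, test)

# Scheme 2: 5-fold cross-validation, averaging accuracy over the folds.
cv_scores = [accuracy(X, y, tr, te) for tr, te in kfold_indices(len(X))]
cv_acc = sum(cv_scores) / len(cv_scores)

print(f"hold-out accuracy: {holdout_acc:.4f}")
print(f"5-fold CV accuracy: {cv_acc:.4f}")
```

In the hold-out scheme each sample is used either for training or for testing, whereas in 5-fold cross-validation every sample is tested exactly once and the reported figure is the mean over folds, which generally makes the estimate less sensitive to one particular split.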

Keywords

Cancer / Diagnosis / Machine learning / RNA-seq

Cite this article

Sultana Akter, Ridwan Olamilekan Adesola, Shreya Basnet. Machine learning approach to identify significant genes and classify cancer types from RNA-seq data. Global Medical Genetics, 2025, 12(04): 100079.

CRediT authorship contribution statement

SA, ROA, and SB conceptualized the idea; SA performed the analysis; SA, ROA, and SB wrote, reviewed, and edited the initial and final drafts. All authors approved the final draft for submission.

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Data availability

Declaration of Competing Interest

The authors declare that they have no financial or personal relationship(s) that may have inappropriately influenced them in writing this article.

Acknowledgements

We thank Prof. Toni Kazic for supporting, supervising, and reviewing this project. This manuscript was adapted from our final class project in computational genomics.
