Optimized Stacking Ensemble Classifier for Early Cancer Detection Using Biomarker Data
DOI: https://doi.org/10.26877/asset.v6i4.986
Keywords:
Cancer Detection, Ensemble Methods, Biomarkers, Hyperparameter Optimization, Machine Learning Optimization, Particle Swarm Optimization, Stacking Ensemble
Abstract
Ovarian cancer ranks sixth globally among major causes of death in women, and its five-year survival rate is below 50%, largely because it is detected late. Early detection is therefore crucial for lowering mortality. This paper introduces an Optimized Stacking Ensemble Classifier (OSEC) for early ovarian cancer detection from biomarker data. The model has two layers: the first contains base classifiers optimized with Particle Swarm Optimization (PSO), and the second is a meta-classifier, for which Support Vector Machine (SVM), Logistic Regression (LR), and Random Forest (RF) models fine-tuned by grid search are evaluated. Of the three datasets evaluated, the Blood Routine dataset yielded the best performance, with the stacked RF meta-classifier achieving 94.29% accuracy. The stacked RF model also outperformed the other models on the remaining datasets, reaching 92.82% accuracy on the Serum dataset and 92.77% on the Malignant Ovarian Tumor (MOT) dataset, and it consistently led in precision, recall, and F1-score.
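To make the two-layer design concrete, the following is a minimal Python sketch using scikit-learn. It is not the authors' implementation, and several parts are assumptions. Synthetic data from `make_classification` stands in for the biomarker datasets. The k-nearest-neighbours and decision-tree base learners and their search ranges are hypothetical choices, because the abstract does not name the base classifiers. The PSO settings (8 particles, 10 iterations, inertia w = 0.7, c1 = c2 = 1.5) are illustrative. In the sketch, layer 1 tunes each base learner with a small global-best PSO on cross-validated accuracy. Layer 2 wraps the tuned learners in `StackingClassifier` and grid-searches an SVM, LR, or RF meta-classifier.

```python
"""Sketch of a two-layer stacking ensemble: PSO-tuned base learners, grid-searched meta-classifier.

Assumptions (not from the paper): synthetic data, KNN and decision-tree base learners,
and the PSO settings below.
"""
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)


def pso_tune(objective, lower, upper, n_particles=8, n_iters=10, w=0.7, c1=1.5, c2=1.5):
    """Global-best PSO that maximizes objective(position) inside the box [lower, upper]."""
    lower, upper = np.asarray(lower, dtype=float), np.asarray(upper, dtype=float)
    pos = rng.uniform(lower, upper, size=(n_particles, lower.size))
    vel = np.zeros_like(pos)
    pbest = pos.copy()                                   # each particle's best position so far
    pbest_score = np.array([objective(p) for p in pos])
    gbest = pbest[np.argmax(pbest_score)].copy()         # swarm-wide best position
    for _ in range(n_iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        # Velocity = inertia + pull toward personal best + pull toward global best.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lower, upper)
        scores = np.array([objective(p) for p in pos])
        improved = scores > pbest_score
        pbest[improved], pbest_score[improved] = pos[improved], scores[improved]
        gbest = pbest[np.argmax(pbest_score)].copy()
    return gbest, pbest_score.max()


# Hypothetical stand-in for a biomarker table: 300 patients, 20 features, binary label.
X, y = make_classification(n_samples=300, n_features=20, n_informative=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)


def cv_acc(model):
    """Mean 3-fold cross-validated accuracy on the training split."""
    return cross_val_score(model, X_tr, y_tr, cv=3, scoring="accuracy").mean()


# Layer 1: PSO searches a continuous space, so positions are rounded to integer hyperparameters.
best_k, _ = pso_tune(lambda p: cv_acc(KNeighborsClassifier(n_neighbors=int(round(p[0])))), [1], [30])
best_depth, _ = pso_tune(
    lambda p: cv_acc(DecisionTreeClassifier(max_depth=int(round(p[0])), random_state=0)), [1], [15]
)
k, depth = int(round(best_k[0])), int(round(best_depth[0]))
base_learners = [
    ("knn", KNeighborsClassifier(n_neighbors=k)),
    ("tree", DecisionTreeClassifier(max_depth=depth, random_state=0)),
]

# Layer 2: grid-search each candidate meta-classifier through the final_estimator__ prefix.
meta_candidates = {
    "SVM": (SVC(), {"C": [0.1, 1, 10]}),
    "LR": (LogisticRegression(max_iter=1000), {"C": [0.1, 1, 10]}),
    "RF": (RandomForestClassifier(random_state=0), {"n_estimators": [50, 100], "max_depth": [None, 5]}),
}
results = {}
for name, (meta, grid) in meta_candidates.items():
    stack = StackingClassifier(estimators=base_learners, final_estimator=meta, cv=5)
    search = GridSearchCV(
        stack, {f"final_estimator__{key}": vals for key, vals in grid.items()}, cv=3, scoring="accuracy"
    )
    search.fit(X_tr, y_tr)
    results[name] = search.score(X_te, y_te)  # held-out accuracy of the refit best stack
    print(f"Stacked {name}: test accuracy = {results[name]:.4f}")
```

Searching the meta-classifier grid through the `final_estimator__` prefix means each candidate stack is refit on every search fold. As a result, the out-of-fold base-learner predictions that `StackingClassifier` passes to the meta-classifier never include that fold's validation rows. On real biomarker tables, scaling features before a distance-based learner such as KNN, and handling class imbalance, would likely matter more than they do on this synthetic data.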