Optimizing Stacked KNN, Naive Bayes, and LDA Models Using Random Forest as a Meta-Learner for Diabetes Classification

Authors

Ridodio Andreuw Meda, Purwanto Purwanto, Farrikh Al Zami, Ahmad Riyanto

DOI:

10.29303/jppipa.v11i10.12546

Published:

2025-10-25

Abstract

Diabetes is a chronic disease with a high mortality rate, so early detection and proper treatment are essential. This study proposes a stacking model that combines K-Nearest Neighbor (KNN), Naive Bayes, and Linear Discriminant Analysis (LDA) as base learners with Random Forest as the meta-learner. The main objective is to improve classification accuracy on diabetes datasets with an imbalanced class distribution. The experiment was conducted on the Pima Indians Diabetes dataset from the UCI Machine Learning Repository. The proposed stacking model achieved an accuracy of 96.30%, a True Positive Rate (TPR) of 88.89%, a True Negative Rate (TNR) of 100%, and a G-Mean of 94.28%. This performance is significantly better than that of the single-classifier models and stacking approaches used in previous studies. The proposed stacking model can therefore serve as an effective approach to diabetes classification under imbalanced class distributions.
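The stacking architecture and the G-Mean metric named in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn, uses a synthetic imbalanced dataset as a stand-in for the Pima Indians Diabetes data, picks `GaussianNB` as the Naive Bayes variant, and leaves all hyperparameters at library defaults; none of these choices are confirmed by the abstract, so this is not the authors' implementation. The helper names `g_mean` and `build_stack` are hypothetical.

```python
import math

from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier


def g_mean(tpr, tnr):
    """Geometric mean of the true positive and true negative rates."""
    return math.sqrt(tpr * tnr)


def build_stack(random_state=0):
    """KNN, Naive Bayes, and LDA as base learners; Random Forest as meta-learner."""
    base_learners = [
        ("knn", KNeighborsClassifier()),
        ("nb", GaussianNB()),  # assumption: Gaussian variant of Naive Bayes
        ("lda", LinearDiscriminantAnalysis()),
    ]
    meta_learner = RandomForestClassifier(n_estimators=50, random_state=random_state)
    # cv=5 trains the meta-learner on out-of-fold base-learner predictions.
    return StackingClassifier(estimators=base_learners,
                              final_estimator=meta_learner, cv=5)


# Synthetic stand-in data (assumption): 8 features, roughly 65/35 class split.
X, y = make_classification(n_samples=300, n_features=8,
                           weights=[0.65, 0.35], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

model = build_stack().fit(X_train, y_train)
tn, fp, fn, tp = confusion_matrix(
    y_test, model.predict(X_test), labels=[0, 1]).ravel()

tpr = tp / (tp + fn)  # sensitivity: share of positives correctly detected
tnr = tn / (tn + fp)  # specificity: share of negatives correctly detected
print(f"TPR={tpr:.4f}  TNR={tnr:.4f}  G-Mean={g_mean(tpr, tnr):.4f}")
```

As a consistency check on the reported figures, `g_mean(0.8889, 1.0)` is approximately 0.9428, which matches the G-Mean of 94.28% given the stated TPR and TNR. G-Mean is useful on imbalanced data because it falls toward zero when either class is poorly recognized, whereas accuracy can stay high while the minority class is missed.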

Keywords:

Classification, Diabetes, Imbalanced dataset, LDA, Random forest

Author Biographies

Ridodio Andreuw Meda, Universitas Dian Nuswantoro

Purwanto Purwanto, Faculty of Computer Science, Master of Informatics Engineering, Universitas Dian Nuswantoro, Semarang, Indonesia.

Farrikh Al Zami, Faculty of Computer Science, Master of Informatics Engineering, Universitas Dian Nuswantoro, Semarang, Indonesia.

Ahmad Riyanto, Faculty of Computer Science, Master of Informatics Engineering, Universitas Dian Nuswantoro, Semarang, Indonesia.

How to Cite

Meda, R. A., Purwanto, P., Zami, F. A., & Riyanto, A. (2025). Optimizing Stacked KNN, Naive Bayes, and LDA Models Using Random Forest as a Meta-Learner for Diabetes Classification. Jurnal Penelitian Pendidikan IPA, 11(10), 418–424. https://doi.org/10.29303/jppipa.v11i10.12546