Optimizing Stacked KNN, Naive Bayes, and LDA Models Using Random Forest as a Meta-Learner for Diabetes Classification
DOI: 10.29303/jppipa.v11i10.12546
Published: 2025-10-25
Abstract
Diabetes is a chronic disease with a high mortality rate, and it requires early detection and proper treatment. This study proposes a stacking model that combines K-Nearest Neighbor (KNN), Naive Bayes, and Linear Discriminant Analysis (LDA) as base learners, with Random Forest as the meta-learner. The main objective is to improve classification accuracy on diabetes datasets with an imbalanced class distribution. The experiment was conducted on the Pima Indians Diabetes dataset from the UCI Machine Learning Repository. The proposed stacking model achieved an accuracy of 96.30%, a True Positive Rate (TPR) of 88.89%, a True Negative Rate (TNR) of 100%, and a G-Mean of 94.28%. This performance is markedly better than that of previous single-classifier models and stacking approaches. The proposed stacking model can therefore serve as an effective solution for diabetes classification under imbalanced class distributions.
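The architecture and metrics described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn, uses a synthetic dataset as a stand-in for the Pima Indians Diabetes data, and the hyperparameters (`n_neighbors=5`, `n_estimators=100`, `cv=5`), the feature scaling for KNN, the 80/20 split, and the helper names `build_stack`, `rates`, and `g_mean` are all assumptions chosen for the example. G-Mean is taken here as the geometric mean of TPR and TNR, which is consistent with the reported figures.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def g_mean(tpr, tnr):
    """Geometric mean of the true positive rate and true negative rate."""
    return float(np.sqrt(tpr * tnr))


def rates(y_true, y_pred):
    """Return (accuracy, TPR, TNR) for binary labels, with 1 as the positive class."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return accuracy, tp / (tp + fn), tn / (tn + fp)


def build_stack(seed=0):
    """KNN, Naive Bayes, and LDA as base learners; Random Forest as meta-learner."""
    base_learners = [
        # Scaling before KNN is an assumption of this sketch, not stated in the abstract.
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
        ("nb", GaussianNB()),
        ("lda", LinearDiscriminantAnalysis()),
    ]
    meta_learner = RandomForestClassifier(n_estimators=100, random_state=seed)
    # cv=5: the meta-learner is trained on out-of-fold base-learner predictions.
    return StackingClassifier(estimators=base_learners, final_estimator=meta_learner, cv=5)


# Synthetic, imbalanced stand-in data (NOT the Pima Indians Diabetes dataset).
X, y = make_classification(
    n_samples=768, n_features=8, n_informative=5,
    weights=[0.65, 0.35], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

model = build_stack().fit(X_train, y_train)
acc, tpr, tnr = rates(y_test, model.predict(X_test))
print(f"accuracy={acc:.4f} TPR={tpr:.4f} TNR={tnr:.4f} G-Mean={g_mean(tpr, tnr):.4f}")

# Consistency check on the figures reported in the abstract.
reported = g_mean(0.8889, 1.0)
print(f"G-Mean from reported TPR and TNR: {reported:.4f}")
```

The numbers printed for the synthetic data will not match the paper's results. The final check only confirms that the reported G-Mean of 94.28% follows from the reported TPR of 88.89% and TNR of 100%.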
Keywords:
Classification, Diabetes, Imbalanced dataset, LDA, Random forest
License
Copyright (c) 2025 Ridodio Andreuw Meda, Purwanto Purwanto, Farrikh Al Zami, Ahmad Riyanto

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with Jurnal Penelitian Pendidikan IPA agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution 4.0 International License (CC-BY License). This license allows authors to use all articles, data sets, graphics, and appendices in data mining applications, search engines, web sites, blogs, and other platforms by providing an appropriate reference. The journal allows the author(s) to hold the copyright without restrictions and will retain publishing rights without restrictions.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in Jurnal Penelitian Pendidikan IPA.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).






