Optimizing Chronic Kidney Disease Diagnosis Using the C4.5 Algorithm and Missing Value Imputation Strategies
DOI: 10.29303/jppipa.v11i9.12456
Published: 2025-09-25
Abstract
Missing values are a significant challenge in data mining because they can hinder knowledge extraction. Incomplete data not only reduces the efficiency of data management and analysis but can also bias decision-making. This study aims to improve the performance of the C4.5 algorithm on data with missing values by applying imputation techniques and GridSearchCV optimization. We propose an approach to handling missing values that combines several imputation methods, including minimum, maximum, mean-mode, median, and k-Nearest Neighbors (k-NN) imputation. These methods are applied to the Chronic Kidney Disease dataset from the UCI Machine Learning Repository. After imputation, we perform hyperparameter optimization with GridSearchCV to find the best parameter combination for the C4.5 algorithm. Experimental results show that imputation combined with GridSearchCV optimization improves the classification accuracy of the C4.5 algorithm: accuracy increases by 2.25% compared with the model without missing-value handling. These results indicate that handling missing values, together with appropriate GridSearchCV tuning, can improve the prediction quality of the model.
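The abstract names minimum, maximum, mean-mode, and median imputation without further detail. The sketch below shows one plausible reading, assuming each column is imputed independently and that "mean-mode" means the mean for numeric columns and the most frequent label for categorical ones. `impute_column` is a hypothetical helper written for illustration, not the authors' code; scikit-learn's `SimpleImputer` offers comparable `"mean"`, `"median"`, and `"most_frequent"` strategies.

```python
import statistics
from typing import List, Optional, Union

# A column cell: a number, a category label, or None when the value is missing.
Cell = Optional[Union[float, str]]


def impute_column(values: List[Cell], strategy: str) -> List[Cell]:
    """Return a copy of `values` with every None replaced by one fill value."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)  # nothing observed, so nothing to impute from
    numeric = all(isinstance(v, (int, float)) for v in observed)

    # Assumption: min, max, and median are only meaningful for numeric columns.
    if strategy in ("min", "max", "median") and not numeric:
        raise ValueError(f"strategy {strategy!r} assumes a numeric column")
    if strategy == "min":
        fill = min(observed)
    elif strategy == "max":
        fill = max(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    elif strategy == "mean-mode":
        # Mean for numeric columns, most frequent label for categorical ones.
        fill = statistics.mean(observed) if numeric else statistics.mode(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [fill if v is None else v for v in values]


# Toy columns for illustration only (not data from the study).
print(impute_column([1.0, None, 2.0, 6.0], "median"))         # [1.0, 2.0, 2.0, 6.0]
print(impute_column(["yes", None, "no", "yes"], "mean-mode"))  # ['yes', 'yes', 'no', 'yes']
```

When imputation is evaluated with cross-validation, the fill statistic is normally computed on the training portion only and then applied to the held-out portion, so that information from the test data does not leak into the model.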
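k-NN imputation, also listed in the abstract, fills a missing cell from the most similar complete records. The sketch below is a minimal version for numeric features only, assuming features are already on comparable scales (for example, standardized) and that neighbors are averaged with equal weight. The distance skips coordinates missing in either row and rescales the result, similar in spirit to the `nan_euclidean` metric used by scikit-learn's `KNNImputer`. The function names are hypothetical and this is not presented as the study's implementation.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]  # one record; None marks a missing numeric value


def nan_euclidean(a: Row, b: Row) -> float:
    """Euclidean distance over coordinates present in both rows, scaled up for the skipped ones."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))


def knn_impute(rows: List[Row], k: int = 3) -> List[Row]:
    """Fill each missing cell with the mean of that feature over the k nearest donor rows."""
    imputed = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors: other rows that observe feature j and share at least one feature with row i.
            donors = []
            for other_index, other in enumerate(rows):
                if other_index == i or other[j] is None:
                    continue
                d = nan_euclidean(row, other)
                if d != math.inf:
                    donors.append((d, other[j]))
            donors.sort(key=lambda pair: pair[0])
            nearest = donors[:k]
            if nearest:  # leave the cell as None when no donor exists
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed


# Toy data for illustration only.
rows = [[1.0, 2.0], [1.1, None], [5.0, 9.0], [1.2, 2.4]]
print(knn_impute(rows, k=2))
```

Distances are always computed against the original rows, so a value imputed early in the loop never influences later imputations. Categorical features would need an encoding before this distance applies.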
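C4.5 chooses each split by gain ratio: the information gain of an attribute divided by its split information, which penalizes attributes that fragment the data into many small branches. The sketch below computes gain ratio for a categorical attribute only; C4.5's threshold search for continuous attributes and its own fractional handling of missing values are not shown. Note that scikit-learn's `DecisionTreeClassifier` is CART-based, so its `criterion="entropy"` option uses information gain rather than gain ratio. The helper names and toy labels are illustrative assumptions.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """C4.5-style gain ratio of splitting `labels` by the categorical `feature`."""
    n = len(labels)
    partitions: Dict[Hashable, List[Hashable]] = {}
    for x, y in zip(feature, labels):
        partitions.setdefault(x, []).append(y)
    weights = [len(part) / n for part in partitions.values()]
    # Information gain: entropy before the split minus weighted entropy after it.
    remainder = sum(w * entropy(part) for w, part in zip(weights, partitions.values()))
    gain = entropy(labels) - remainder
    # Split information grows with the number and balance of branches.
    split_info = -sum(w * math.log2(w) for w in weights)
    return gain / split_info if split_info > 0 else 0.0


labels = ["ckd", "ckd", "notckd", "notckd"]     # toy labels, not study data
print(gain_ratio(["a", "a", "b", "b"], labels))  # 1.0 (perfect split)
print(gain_ratio(["a", "b", "a", "b"], labels))  # 0.0 (uninformative split)
```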
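GridSearchCV, as used in the abstract, exhaustively evaluates every combination in a parameter grid with cross-validation and keeps the combination with the best mean score. The standard-library sketch below shows that loop under simplifying assumptions: `fit_and_score` is a hypothetical callback that would train a tree on the training indices and return accuracy on the test indices, and the grid keys (`criterion`, `max_depth`) are illustrative, not the grid used in the study. The folds here are contiguous and unshuffled; scikit-learn defaults to stratified folds for classifiers, and shuffling or stratifying matters when records are sorted by class.

```python
import itertools
import math
import statistics
from typing import Any, Callable, Dict, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]  # (train indices, test indices)


def k_fold_splits(n_samples: int, k: int) -> List[Split]:
    """Partition range(n_samples) into k contiguous, unshuffled folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if not start <= i < start + size]
        splits.append((train, test))
        start += size
    return splits


def grid_search_cv(
    param_grid: Dict[str, Sequence[Any]],
    n_samples: int,
    k: int,
    fit_and_score: Callable[[Dict[str, Any], List[int], List[int]], float],
) -> Tuple[Dict[str, Any], float]:
    """Try every parameter combination; keep the one with the best mean fold score."""
    names = sorted(param_grid)
    splits = k_fold_splits(n_samples, k)
    best_params: Dict[str, Any] = {}
    best_score = -math.inf
    for combo in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        # Fit on each training fold, score on the held-out fold, then average.
        mean_score = statistics.mean([fit_and_score(params, tr, te) for tr, te in splits])
        if mean_score > best_score:  # ties keep the earlier combination
            best_params, best_score = params, mean_score
    return best_params, best_score


# Hypothetical scorer that ignores the data, used only to exercise the search loop.
def toy_score(params: Dict[str, Any], train: List[int], test: List[int]) -> float:
    penalty = 0.05 if params["criterion"] == "gini" else 0.0
    return 1.0 - 0.1 * abs(params["max_depth"] - 4) - penalty


grid = {"criterion": ["entropy", "gini"], "max_depth": [2, 4, 6]}
best, score = grid_search_cv(grid, n_samples=10, k=5, fit_and_score=toy_score)
print(best, score)  # {'criterion': 'entropy', 'max_depth': 4} 1.0
```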
Keywords: C4.5 Algorithm; GridSearchCV; Imputation; Missing value
License
Copyright (c) 2025 Ahmad Riyanto, Purwanto, Farrikh Al Zami, Ridodio Andreuw Meda

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with Jurnal Penelitian Pendidikan IPA agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution 4.0 International License (CC-BY License). This license allows authors to use all articles, data sets, graphics, and appendices in data mining applications, search engines, websites, blogs, and other platforms, provided that an appropriate reference is given. The journal allows the author(s) to hold the copyright without restrictions and will retain publishing rights without restrictions.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in Jurnal Penelitian Pendidikan IPA.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.