Vol. 11 No. 9 (2025): September
Open Access
Peer Reviewed

Optimizing Chronic Kidney Disease Diagnosis Using the C4.5 Algorithm and Missing Value Imputation Strategies

Authors

Ahmad Riyanto, Purwanto, Farrikh Al Zami, Ridodio Andreuw Meda

DOI:

10.29303/jppipa.v11i9.12456

Published:

2025-09-25

Abstract

Missing values are a significant challenge in data mining because they can hinder knowledge extraction. Incomplete data not only reduces the efficiency of data management and analysis but can also bias decision-making. This study aims to improve the performance of the C4.5 algorithm on data with missing values by applying imputation techniques and GridSearchCV optimization. We propose an approach to handling missing values that combines several imputation methods: minimum, maximum, mean-mode, median, and k-Nearest Neighbors (k-NN). These methods are applied to the Chronic Kidney Disease dataset obtained from the UCI Repository. After imputation, we perform hyperparameter optimization with GridSearchCV to find the best parameter combination for the C4.5 algorithm. Experimental results show that missing value handling combined with GridSearchCV optimization improves the classification accuracy of the C4.5 algorithm by 2.25% compared with a model trained without missing value handling. This indicates that handling missing values together with appropriate GridSearchCV optimization can improve the prediction quality of the model.
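
The workflow described in the abstract (impute, then tune a decision tree with GridSearchCV) can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the authors' implementation: it uses synthetic numeric data rather than the UCI Chronic Kidney Disease dataset, the parameter grid is invented for the example, and `ExtremeImputer` is a hypothetical helper for minimum/maximum imputation. scikit-learn does not ship C4.5; `DecisionTreeClassifier` with `criterion="entropy"` is a CART-based stand-in. Mean-mode imputation reduces to mean imputation here because every synthetic column is numeric.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.impute import KNNImputer, SimpleImputer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier


class ExtremeImputer(TransformerMixin, BaseEstimator):
    """Hypothetical helper: fill NaNs with each column's min or max."""

    def __init__(self, strategy="min"):
        self.strategy = strategy

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        reducer = np.nanmin if self.strategy == "min" else np.nanmax
        self.fill_ = reducer(X, axis=0)  # one fill value per column
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float).copy()
        rows, cols = np.where(np.isnan(X))
        X[rows, cols] = self.fill_[cols]
        return X


# Synthetic stand-in data (assumption: NOT the CKD dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X[rng.random(X.shape) < 0.1] = np.nan  # blank out roughly 10% of cells

# The imputer sits inside the pipeline, so it is re-fitted on each
# training fold and never sees the held-out fold's values.
pipe = Pipeline([
    ("impute", SimpleImputer()),
    ("tree", DecisionTreeClassifier(criterion="entropy", random_state=0)),
])

# Illustrative grid (assumption): imputation method plus two tree settings.
param_grid = {
    "impute": [
        ExtremeImputer("min"),
        ExtremeImputer("max"),
        SimpleImputer(strategy="mean"),
        SimpleImputer(strategy="median"),
        KNNImputer(n_neighbors=5),
    ],
    "tree__max_depth": [3, 5, None],
    "tree__min_samples_leaf": [1, 5],
}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print("best imputer:", type(search.best_params_["impute"]).__name__)
print("best CV accuracy:", round(search.best_score_, 3))
```

Treating the imputer as one more grid dimension lets a single search compare imputation strategies and tree hyperparameters under the same cross-validation splits.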

Keywords:

C4.5 Algorithm, GridSearchCV, Imputation, Missing value

Author Biographies

Ahmad Riyanto, Universitas Dian Nuswantoro (UDINUS)

Author Origin : Indonesia

Purwanto, Faculty of Computer Science, Master of Informatics Engineering (MTI), Universitas Dian Nuswantoro (UDINUS)

Author Origin : Indonesia

Farrikh Al Zami, Faculty of Computer Science, Master of Informatics Engineering (MTI), Universitas Dian Nuswantoro (UDINUS)

Author Origin : Indonesia

Ridodio Andreuw Meda, Faculty of Computer Science, Master of Informatics Engineering (MTI), Universitas Dian Nuswantoro (UDINUS)

Author Origin : Indonesia

How to Cite

Riyanto, A., Purwanto, P., Al Zami, F., & Andreuw Meda, R. (2025). Optimizing Chronic Kidney Disease Diagnosis Using the C4.5 Algorithm and Missing Value Imputation Strategies. Jurnal Penelitian Pendidikan IPA, 11(9), 857–863. https://doi.org/10.29303/jppipa.v11i9.12456