Prediction of Graduation Accuracy Using the K-Means Clustering Algorithm and Classification Decision Tree


Sri Rahmawati, Sarjon Defit






Vol. 10 No. 4 (2024): April


Centroid, Clustering, Decision tree, K-Means, Random

Research Articles


How to Cite

Rahmawati, S., & Defit, S. (2024). Prediction of Graduation Accuracy Using the K-Means Clustering Algorithm and Classification Decision Tree. Jurnal Penelitian Pendidikan IPA, 10(4), 2007–2013.




Graduating on time is a meaningful achievement for students, provided it is backed by seriousness and perseverance in their studies. However, some students who are at risk of not completing their studies on time still go undetected by the study program. This research therefore applies K-Means clustering and decision tree classification to predict the timeliness of graduation. A sample of 131 student records was drawn at random for testing. The data were clustered by setting initial centroid values and then determining the final centroid points. The clustering placed 78 students in cluster 0 and 53 students in cluster 1. Students in cluster 0 have the potential to finish within the specified time, while students in cluster 1 need coaching and guidance from both the study program and their supervisors. The classification built on the clustering results produced two classes, class a and class b, with 73 and 58 records respectively. The clustering and classification results therefore do not differ much when used to predict the timeliness of a student's graduation.
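The two steps named in the abstract, K-Means clustering from randomly chosen initial centroids followed by a decision tree classification of the resulting groups, can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' setup: the 131 records are synthetic, the two numeric features are hypothetical stand-ins (for example a GPA and a scaled credit score), and the "tree" is a single split chosen by Gini impurity instead of a full decision tree.

```python
import random


def kmeans(points, k, seed=0, max_iter=100):
    """Plain K-Means: random initial centroids, then assign/update until stable."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids drawn at random from the data
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(pt, centroids[c])))
            for pt in points
        ]
        # Update step: each centroid moves to the mean of its current members.
        new_centroids = []
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # an empty cluster keeps its centroid
        if new_centroids == centroids:  # centroids stopped moving
            break
        centroids = new_centroids
    return labels, centroids


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_stump(points, labels):
    """One-level decision tree: the (feature, threshold) split with the lowest weighted Gini."""
    best = None
    for f in range(len(points[0])):
        for t in sorted({pt[f] for pt in points}):
            left = [lab for pt, lab in zip(points, labels) if pt[f] <= t]
            right = [lab for pt, lab in zip(points, labels) if pt[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


# Synthetic data only (NOT the study's data): 131 records with two hypothetical features.
data_rng = random.Random(42)
students = (
    [(data_rng.gauss(3.4, 0.2), data_rng.gauss(3.5, 0.3)) for _ in range(80)]
    + [(data_rng.gauss(2.5, 0.2), data_rng.gauss(2.2, 0.3)) for _ in range(51)]
)

labels, centroids = kmeans(students, k=2)
sizes = [labels.count(c) for c in range(2)]
score, feature, threshold = best_stump(students, labels)
print("cluster sizes:", sizes)
print("split on feature", feature, "at", round(threshold, 2), "weighted Gini", round(score, 3))
```

Comparing the cluster sizes with the sizes of the two sides of the split mirrors the abstract's comparison of cluster counts with class counts. Which group ends up labeled 0 or 1 depends on the random initial centroids, so the label numbers carry no meaning until the centroids themselves are inspected.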



Author Biographies

Sri Rahmawati, Universitas Putra Indonesia YPTK Padang

Sarjon Defit, Universitas Putra Indonesia YPTK Padang


Copyright (c) 2024 Sri Rahmawati, Sarjon Defit


This work is licensed under a Creative Commons Attribution 4.0 International License.
