Prediction of Graduation Accuracy Using the K-Means Clustering Algorithm and Classification Decision Tree

Authors

Sri Rahmawati, Sarjon Defit

DOI:

10.29303/jppipa.v10i4.7073

Published:

2024-04-25

Issue:

Vol. 10 No. 4 (2024): April

Keywords:

Centroid, Clustering, Decision tree, K-Means, Random

Research Articles

How to Cite

Rahmawati, S., & Defit, S. (2024). Prediction of Graduation Accuracy Using the K-Means Clustering Algorithm and Classification Decision Tree. Jurnal Penelitian Pendidikan IPA, 10(4), 2007–2013. https://doi.org/10.29303/jppipa.v10i4.7073

Abstract

Graduating on time is a meaningful achievement for students, provided it is backed by seriousness and perseverance in their studies. Because study programs still fail to detect some students who are at risk of not completing their coursework on schedule, this research applies K-Means clustering and decision tree classification to assess graduation timeliness: the data are clustered by choosing initial centroid values and then determining the final centroid points. The test sample comprised 131 randomly selected students. Clustering placed 78 students in cluster 0 and 53 students in cluster 1. Students in cluster 0 show potential to finish on time, that is, within the specified study period, whereas students in cluster 1 need coaching and guidance from both the study program and their supervisors. Classification based on the clustering results yielded two classes, class a and class b, with 73 and 58 records respectively, so the clustering and classification results did not differ greatly when used to predict whether a student will graduate on time.
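
The two-stage idea described in the abstract (cluster first, then classify using the cluster labels) can be illustrated with a minimal sketch. This is not the authors' implementation: the feature names (GPA, credits completed), the six toy records, the choice of initial centroids, and the one-level decision stump standing in for a full decision tree are all assumptions made purely for illustration.

```python
import math

def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's K-Means: assign each point to its nearest centroid, then
    recompute each centroid as the mean of its members, until stable."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = []
    for _ in range(max_iter):
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        new_centroids = []
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids

def fit_stump(points, labels):
    """One-level decision tree for binary labels (0/1): search every feature
    and midpoint threshold for the split with the fewest training errors.
    Returns (errors, feature_index, threshold, left_label, right_label)."""
    best = None
    for f in range(len(points[0])):
        values = sorted({p[f] for p in points})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            for left, right in ((0, 1), (1, 0)):
                errors = sum(
                    (left if p[f] <= t else right) != y
                    for p, y in zip(points, labels)
                )
                if best is None or errors < best[0]:
                    best = (errors, f, t, left, right)
    return best

# Hypothetical toy records: (GPA, credits completed). Not the study's data.
students = [(3.6, 140), (3.4, 138), (3.8, 144), (2.4, 96), (2.7, 104), (2.2, 90)]

# Assumed initial centroids: the first and fourth records.
labels, centroids = kmeans(students, [students[0], students[3]])

# Train the stump on the cluster labels, mirroring "classification taken
# from the results of data clustering".
stump = fit_stump(students, labels)

print("cluster labels:", labels)
print("centroids:", centroids)
print("stump (errors, feature, threshold, left, right):", stump)
```

On this toy data the first three records fall into cluster 0 and the last three into cluster 1, and a single threshold on the assumed GPA feature reproduces the cluster labels without error. With real data the features would normally be scaled before clustering, since K-Means relies on Euclidean distance.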

Author Biographies

Sri Rahmawati, Universitas Putra Indonesia YPTK Padang

Sarjon Defit, Universitas Putra Indonesia YPTK Padang

License

Copyright (c) 2024 Sri Rahmawati, Sarjon Defit

This work is licensed under a Creative Commons Attribution 4.0 International License.

Authors who publish with Jurnal Penelitian Pendidikan IPA agree to the following terms:

  1. Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution 4.0 International License (CC-BY License). This license allows authors to use all articles, data sets, graphics, and appendices in data mining applications, search engines, web sites, blogs, and other platforms by providing an appropriate reference. The journal allows the author(s) to hold the copyright without restrictions and will retain publishing rights without restrictions.
  2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in Jurnal Penelitian Pendidikan IPA.
  3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).