Prediction of Graduation Accuracy Using the K-Means Clustering Algorithm and Classification Decision Tree
DOI: 10.29303/jppipa.v10i4.7073
Published: 2024-04-25
Issue: Vol. 10 No. 4 (2024): April
Keywords: Centroid, Clustering, Decision tree, K-Means, Random
Section: Research Articles
Abstract
Graduating on time is a meaningful achievement for students, and it depends on seriousness and perseverance in their studies. In this study, a random sample of 131 students was used for testing. Because the study program still does not detect some students who are not on track to complete their studies, this research applies clustering and decision-tree classification to determine whether students complete their studies on time, clustering the data by determining the initial centroid values and the centroid points. The results show that 78 students were grouped in cluster 0 and 53 students in cluster 1. Students in cluster 0 have the potential to complete their studies on time, that is, within the specified study period. Students in cluster 1, by contrast, need coaching and guidance from both the study program and their supervisors. The classification built from the clustering results produced two classes, class a and class b, with 73 and 58 students respectively, so the clustering and classification results differ only slightly when predicting whether a student will graduate on time.
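The pipeline described in the abstract (K-Means with chosen initial centroids and two clusters, followed by a decision tree built from the clustering result) can be sketched in plain Python. This is a minimal, non-authoritative sketch under stated assumptions: the features (GPA and credits earned per semester), the toy records, the min-max scaling, the use of the first and last records as initial centroids, and the depth-1 tree (a single Gini split) are all illustrative choices, not the authors' data or settings.

```python
import math
from collections import Counter

# Toy records: (GPA, credits earned per semester). Illustrative values only,
# NOT the study's data; the choice of features is an assumption.
STUDENTS = [
    (3.6, 22), (3.4, 21), (3.8, 24), (3.5, 20),
    (2.6, 14), (2.4, 12), (2.8, 13), (2.2, 10),
]


def dist(p, q):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def min_max_scale(rows):
    """Rescale each feature to [0, 1] so no single feature dominates the distance."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [tuple((v - l) / (h - l) if h > l else 0.0
                  for v, l, h in zip(row, lo, hi)) for row in rows]


def kmeans(points, init_centroids, max_iter=100):
    """Lloyd's K-Means, starting from the given initial centroids."""
    centroids = [tuple(c) for c in init_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[k] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def best_stump(rows, labels):
    """Depth-1 decision tree: the (feature, threshold) split with the lowest
    weighted Gini impurity, plus the majority label on each side."""
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2  # candidate threshold midway between adjacent values
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    return best[1:]


# Clustering: k = 2, seeded (by assumption) with the first and last records.
scaled = min_max_scale(STUDENTS)
labels, centroids = kmeans(scaled, [scaled[0], scaled[-1]])
print("cluster sizes:", sorted(Counter(labels).items()))

# Classification: fit the stump on the raw features using the cluster labels.
feature, threshold, left_cls, right_cls = best_stump(STUDENTS, labels)
predicted = [left_cls if r[feature] <= threshold else right_cls for r in STUDENTS]
agreement = sum(p == l for p, l in zip(predicted, labels))
print(f"split on feature {feature} at {threshold:.2f}; "
      f"agreement {agreement} of {len(labels)}")
```

Cluster indices depend on the seeds: here cluster 0 starts from a high-GPA record, matching the abstract's convention that cluster 0 holds the students likely to finish on time. On these cleanly separated toy records the stump reproduces the clusters exactly; on real data a tree's classes can diverge from the clusters, as the abstract reports (73 and 58 versus 78 and 53).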
Author Biographies
Sri Rahmawati, Universitas Putra Indonesia YPTK Padang
Sarjon Defit, Universitas Putra Indonesia YPTK Padang
License
Copyright (c) 2024 Sri Rahmawati, Sarjon Defit
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with Jurnal Penelitian Pendidikan IPA, agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution 4.0 International License (CC-BY License). This license allows authors to use all articles, data sets, graphics, and appendices in data mining applications, search engines, web sites, blogs, and other platforms by providing an appropriate reference. The journal allows the author(s) to hold the copyright without restrictions and will retain publishing rights without restrictions.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in Jurnal Penelitian Pendidikan IPA.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.