Analysis of Naïve Bayes and K-Nearest Neighbors Algorithms for Classifying Fishermen Aid Eligibility
DOI: 10.29303/jppipa.v10i10.8818
Published: 2024-10-25
Issue: Vol. 10 No. 10 (2024): October
Keywords: Classification, F1-score, Imbalanced Dataset, K-Nearest Neighbor, Naïve Bayes, SMOTE
Research Articles
Abstract
This article applies data mining with the Naïve Bayes and K-Nearest Neighbor (KNN) algorithms to build classification models and evaluates their performance in identifying fishermen eligible for aid. The study compares how well the two algorithms handle an imbalanced dataset, with and without the Synthetic Minority Over-sampling Technique (SMOTE), which is applied to balance the dataset before classification. Without SMOTE, Naïve Bayes achieved an accuracy of 97.01%, precision of 94.16%, recall of 96.67%, and F1-score of 95.39%, while KNN reached an accuracy of 94.04%, precision of 94.53%, recall of 86.00%, and F1-score of 90.06%. After applying SMOTE, both algorithms improved on all four metrics: Naïve Bayes attained an accuracy of 98.33%, precision of 96.86%, recall of 100.00%, and F1-score of 98.49%, while KNN reached an accuracy of 96.90%, precision of 97.72%, recall of 96.19%, and F1-score of 96.94%. The results show that Naïve Bayes with SMOTE outperforms KNN in accuracy, recall, and F1-score, making it the better choice for managing data imbalance and classifying fishermen eligible for aid.
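The oversampling step described in the abstract can be illustrated with a minimal sketch of the SMOTE idea in plain Python. This is not the authors' implementation: the function name `smote`, the toy two-feature rows, and the choice of picking base points at random are all assumptions made for illustration. Each synthetic point is placed on the line segment between a minority sample and one of its k nearest minority neighbors.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority neighbors.

    Hypothetical helper for illustration; base points are drawn at random,
    which is a simplification of the per-sample loop in the original method.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbors than exist
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbors of the base point, excluding itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic


# Made-up data with two numeric features: 8 majority rows, 3 minority rows.
majority = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (1.5, 1.4),
            (2.0, 1.0), (1.8, 2.1), (2.2, 1.7), (1.1, 1.9)]
minority = [(4.0, 4.0), (5.0, 4.5), (4.5, 5.5)]

# Generate just enough synthetic minority rows to balance the two classes.
new_points = smote(minority, n_synthetic=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

In practice, oversampling of this kind is normally applied only to the training split, so that the evaluation data contains no synthetic rows.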
Author Biographies
Muhammad Nasrullah, Universitas Diponegoro
Bayu Surarso, Universitas Diponegoro
Oky Dwi Nurhayati, Universitas Diponegoro
License
Copyright (c) 2024 Muhammad Nasrullah, Bayu Surarso, Oky Dwi Nurhayati
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with Jurnal Penelitian Pendidikan IPA agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution 4.0 International License (CC-BY License). This license allows authors to use all articles, data sets, graphics, and appendices in data mining applications, search engines, web sites, blogs, and other platforms by providing an appropriate reference. The journal allows the author(s) to hold the copyright without restrictions and will retain publishing rights without restrictions.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in Jurnal Penelitian Pendidikan IPA.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).