The Node Selection Method for Split Attribute in C4.5 Algorithm Using the Coefficient of Determination Values for Multivariate Data Set

Authors

Muhsi, Suprapto, Rofiuddin

DOI:

10.29303/jppipa.v9i7.4031

Published:

2023-07-25

Issue:

Vol. 9 No. 7 (2023): July

Keywords:

Algorithm, Attribute, Multivariate

Research Articles

How to Cite

Muhsi, M., Suprapto, S., & Rofiuddin, R. (2023). The Node Selection Method for Split Attribute in C4.5 Algorithm Using the Coefficient of Determination Values for Multivariate Data Set. Jurnal Penelitian Pendidikan IPA, 9(7), 5574–5583. https://doi.org/10.29303/jppipa.v9i7.4031

Abstract

The choice of split attribute in decision tree algorithms, especially C4.5, strongly influences the predictive performance of the resulting tree. This study performs attribute splitting in the C4.5 algorithm using the coefficient of determination (R², R-squared), with the aim of improving the performance of the model that C4.5 produces. The data used are public and private datasets. The study combines the R² value with the C4.5 algorithm developed by Quinlan. The results indicate that using the R² value in C4.5 performs well in terms of accuracy and recall: on three of the four datasets, it scores higher than C4.5 without R². In terms of precision, performance is fairly good, because only two datasets score higher than the algorithm without R².
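As a rough illustration of the two quantities the abstract refers to, the sketch below computes the standard C4.5 gain ratio and an R² value for each candidate attribute on a toy table. The abstract does not say how the R² value enters the split decision, so `select_split` (a hypothetical name) simply picks the attribute with the highest R² as one possible reading. The integer encoding of categories, the single-predictor R², and the toy data are likewise assumptions made for illustration, not the authors' procedure.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """Standard C4.5 gain ratio of a categorical attribute w.r.t. the class."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected entropy of the class after partitioning on the attribute.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


def encode(column):
    """Assumption: map categories to integers in order of first appearance."""
    codes = {}
    return [codes.setdefault(v, len(codes)) for v in column]


def r_squared(x, y):
    """R^2 of a simple linear regression of y on x (squared Pearson r)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column explains no variance
    return sxy ** 2 / (sxx * syy)


def select_split(table, target):
    """Hypothetical rule: choose the attribute with the highest R^2."""
    y = encode(table[target])
    scores = {
        name: r_squared(encode(col), y)
        for name, col in table.items()
        if name != target
    }
    return max(scores, key=scores.get), scores


# Toy data invented for this example only.
toy = {
    "a": ["x", "x", "y", "y"],
    "b": ["p", "q", "p", "q"],
    "label": ["no", "no", "yes", "yes"],
}
best, r2_scores = select_split(toy, "label")
for name in ("a", "b"):
    print(name, gain_ratio(toy[name], toy["label"]), r2_scores[name])
print("split on:", best)
```

Integer codes impose an arbitrary order on nominal categories, so for attributes with more than two unordered values an R² computed this way depends on the encoding. A real implementation would need a more careful encoding, for example one indicator column per category.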

Author Biographies

Muhsi, Universitas Islam Madura

Department of Information System

Suprapto, Universitas Negeri Yogyakarta

Department of Electronics and Informatics Education

Rofiuddin, Universitas Islam Madura

Department of Informatics Engineering

License

Copyright (c) 2023 Muhsi, Suprapto, Rofiuddin

This work is licensed under a Creative Commons Attribution 4.0 International License.

Authors who publish with Jurnal Penelitian Pendidikan IPA agree to the following terms:

  1. Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution 4.0 International License (CC-BY License). This license allows authors to use all articles, data sets, graphics, and appendices in data mining applications, search engines, web sites, blogs, and other platforms by providing an appropriate reference. The journal allows the author(s) to hold the copyright without restrictions and will retain publishing rights without restrictions.
  2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in Jurnal Penelitian Pendidikan IPA.
  3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).