Dimensionality Reduction in River Water Quality Classification Using Genetic Algorithm and Correlation-Based Feature Selection
DOI: 10.29303/jppipa.v11i9.11863
Published: 2025-09-25
Abstract
Water quality monitoring is a crucial element of data-driven environmental management. This study aims to identify the most important parameters for river water quality classification using feature selection and machine learning. Eleven physicochemical parameters served as initial features, and two selection methods were applied: a Genetic Algorithm (GA) and Spearman Rank Correlation (RS). Classification was performed with a Radial Basis Function Support Vector Machine (RBF-SVM), and performance was evaluated by accuracy, F1 score, and recall. GA selected seven parameters (pH, DHL [electrical conductivity], DO, BOD, COD, TSS, NO₂⁻-N) and achieved an accuracy of 96.67% and an F1 score of 0.82. RS selected a different set of seven features, with an accuracy of 90.00% and an F1 score of 0.67. Five features (DHL, BOD, COD, TSS, NO₂⁻-N) were selected by both methods, marking them as consistently influential. The model without feature selection reached high accuracy (93.33%) but an F1 score of only 0.48, indicating poor recognition of the minority class. These findings indicate that feature selection improves classification performance, most visibly in the F1 score, while reducing the number of parameters that must be measured. In conclusion, GA-based feature selection provides the most effective subset for water quality classification and supports the development of intelligent, cost-effective monitoring systems suitable for sensor-based field applications.
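The gap between 93.33% accuracy and an F1 score of 0.48 for the model without feature selection can be illustrated with a small worked example. The sketch below is not taken from the study: it assumes a hypothetical test set of 30 samples (28 majority, 2 minority), a classifier that predicts the majority class for every sample, and macro-averaged F1. None of these details are stated in the abstract. It shows only that such a configuration would yield figures of this kind.

```python
# Illustrative only: hypothetical labels, not the study's data.

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_f1(y_true, y_pred, label):
    """F1 for one class, computed as 2*TP / (2*TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    labels = sorted(set(y_true))
    return sum(per_class_f1(y_true, y_pred, c) for c in labels) / len(labels)


# Assumed split: 28 majority-class samples (0) and 2 minority-class samples (1).
y_true = [0] * 28 + [1] * 2
# Assumed behaviour: the classifier never predicts the minority class.
y_pred = [0] * 30

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(f"accuracy = {acc:.4f}, macro F1 = {f1:.4f}")
```

Under these assumptions accuracy stays high because the majority class dominates the count of correct predictions, while the minority class contributes an F1 of zero and pulls the macro average below 0.5.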
Keywords:
Feature selection, Genetic algorithm, Spearman rank, Support vector machine, Water quality classification
License
Copyright (c) 2025 Yudha Riwanto, Fauzia Anis Sekar Ningrum

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with Jurnal Penelitian Pendidikan IPA agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution 4.0 International License (CC-BY License). This license allows authors to use all articles, data sets, graphics, and appendices in data mining applications, search engines, web sites, blogs, and other platforms by providing an appropriate reference. The journal allows the author(s) to hold the copyright without restrictions and will retain publishing rights without restrictions.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in Jurnal Penelitian Pendidikan IPA.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).






