Vol. 11 No. 9 (2025): September
Open Access
Peer Reviewed

Dimensionality Reduction in River Water Quality Classification Using Genetic Algorithm and Correlation-Based Feature Selection

Authors

Yudha Riwanto, Fauzia Anis Sekar Ningrum

DOI:

10.29303/jppipa.v11i9.11863

Published:

2025-09-25

Abstract

Water quality monitoring is a crucial element of data-driven environmental management. This study aims to identify the most important parameters for river water quality classification by combining feature selection with machine learning. Eleven physicochemical parameters were used as initial features, and two selection methods were applied: Genetic Algorithm (GA) and Spearman Rank Correlation (RS). Classification was performed with a Radial Basis Function Support Vector Machine (RBF-SVM), and performance was evaluated using accuracy, F1 score, and recall. GA selected seven parameters (pH, DHL [electrical conductivity], DO, BOD, COD, TSS, NO₂⁻-N) and achieved an accuracy of 96.67% and an F1 score of 0.82. RS selected a different set of seven features and achieved an accuracy of 90.00% and an F1 score of 0.67. Five features (DHL, BOD, COD, TSS, NO₂⁻-N) were selected by both methods, marking them as consistently influential. The model without feature selection reached a high accuracy (93.33%) but an F1 score of only 0.48, indicating poor recognition of the minority class. These findings indicate that feature selection improves recognition of the minority class, as reflected in the F1 score, while reducing the number of parameters that must be measured from eleven to seven. In conclusion, GA-based feature selection provides the most effective subset for water quality classification and supports the development of intelligent and cost-effective monitoring systems suitable for sensor-based field applications.
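The gap between high accuracy and a low F1 score reported for the model without feature selection is typical of imbalanced data. The sketch below illustrates the mechanism only. It assumes a hypothetical test split of 28 majority and 2 minority samples and a classifier that always predicts the majority class, and it assumes the F1 score is macro-averaged. None of these details are stated in the abstract.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_class(y_true, y_pred, label):
    """F1 score for one class, treating that class as the positive class."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    # When the class is never predicted and never present, define F1 as 0.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    labels = sorted(set(y_true))
    return sum(f1_for_class(y_true, y_pred, c) for c in labels) / len(labels)


# Hypothetical test split (NOT taken from the paper): 28 majority, 2 minority.
y_true = [0] * 28 + [1] * 2
# Degenerate classifier that always predicts the majority class.
y_pred = [0] * 30

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(f"accuracy = {acc:.4f}, macro F1 = {f1:.4f}")
```

Under these assumptions the classifier never recognizes a minority sample, yet accuracy stays above 93% because the majority class dominates the split. The macro F1 score falls below 0.5 because the minority class contributes an F1 of zero to the average.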

Keywords:

Feature selection, Genetic algorithm, Spearman rank, Support vector machine, Water quality classification

Author Biographies

Yudha Riwanto, Universitas Amikom Yogyakarta

Author Origin: Indonesia

Fauzia Anis Sekar Ningrum, Universitas Amikom Yogyakarta

Author Origin: Indonesia

How to Cite

Riwanto, Y., & Ningrum, F. A. S. (2025). Dimensionality Reduction in River Water Quality Classification Using Genetic Algorithm and Correlation-Based Feature Selection. Jurnal Penelitian Pendidikan IPA, 11(9), 751–758. https://doi.org/10.29303/jppipa.v11i9.11863