Hyperparameters Optimization in XGBoost Model for Rainfall Estimation: A Case Study in Pontianak City

Authors

Auriwan Yasper, Djati Handoko, Maulana Putra, Harry Kasuma Aliwarga, Mohammad Syamsu Rosid Rosid

DOI:

10.29303/jppipa.v9i9.3890

Published:

2023-09-25

Issue:

Vol. 9 No. 9 (2023): September

Keywords:

GridSearchCV, Hyperparameter, Rainfall, RandomizedSearchCV, XGBoost

Research Articles

How to Cite

Yasper, A., Handoko, D., Putra, M., Aliwarga, H. K., & Rosid, M. S. R. (2023). Hyperparameters Optimization in XGBoost Model for Rainfall Estimation: A Case Study in Pontianak City. Jurnal Penelitian Pendidikan IPA, 9(9), 7113–7121. https://doi.org/10.29303/jppipa.v9i9.3890

Abstract

Accurate rainfall estimation is crucial for communities and for institutions that manage water resources and disaster prevention. The XGBoost model has proven effective at predicting rainfall, but its hyperparameters still need tuning to improve performance. This study seeks the optimal learning rate for rainfall prediction while keeping the max_depth and n_estimators parameters fixed. Hyperparameter optimization followed a two-step approach: an initial coarse search with RandomizedSearchCV, then a finer search with GridSearchCV. The model was built on three months of hourly historical rainfall data from the Automated Weather Observing System (AWOS) at the Pontianak Meteorological Station. Performance was assessed with accuracy, precision, recall, F1 score, and ROC-AUC. Accuracy, precision, recall, and F1 score all reached 95%, indicating that the model predicts rainfall effectively; the ROC-AUC score, however, was lower at 62%. The hyperparameter search on the 2040 dataset yielded an optimal learning rate of 0.204.
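
The two-step search described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the synthetic data stands in for the AWOS records, and the fixed max_depth and n_estimators values, the search ranges, the number of folds, and the F1 scoring choice are all assumptions. If xgboost is not installed, the sketch falls back to scikit-learn's GradientBoostingClassifier, which accepts the same three parameter names.

```python
import numpy as np
from scipy.stats import uniform
from sklearn.datasets import make_classification
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import (GridSearchCV, RandomizedSearchCV,
                                     train_test_split)

try:
    from xgboost import XGBClassifier as Booster
except ImportError:
    # Stand-in with the same parameter names, used only if xgboost is absent.
    from sklearn.ensemble import GradientBoostingClassifier as Booster

# Synthetic rain / no-rain data (assumption: NOT the paper's AWOS dataset).
X, y = make_classification(n_samples=600, n_features=6,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Hyperparameters held fixed during the search (values are assumptions).
FIXED = {"max_depth": 3, "n_estimators": 50}


def make_model():
    return Booster(random_state=0, **FIXED)


# Step 1: coarse search, sampling learning_rate uniformly from [0.01, 0.50].
coarse = RandomizedSearchCV(
    make_model(),
    {"learning_rate": uniform(loc=0.01, scale=0.49)},
    n_iter=10, cv=3, scoring="f1", random_state=0)
coarse.fit(X_train, y_train)
center = coarse.best_params_["learning_rate"]

# Step 2: fine search on an evenly spaced grid around the coarse optimum.
fine_grid = np.round(
    np.linspace(max(center - 0.05, 0.001), center + 0.05, 11), 4).tolist()
fine = GridSearchCV(make_model(), {"learning_rate": fine_grid},
                    cv=3, scoring="f1")
fine.fit(X_train, y_train)
best_lr = fine.best_params_["learning_rate"]

# Evaluate the refitted best model with the metrics named in the abstract.
best_model = fine.best_estimator_
y_pred = best_model.predict(X_test)
y_prob = best_model.predict_proba(X_test)[:, 1]
scores = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred, zero_division=0),
    "recall": recall_score(y_test, y_pred, zero_division=0),
    "f1": f1_score(y_test, y_pred, zero_division=0),
    "roc_auc": roc_auc_score(y_test, y_prob),
}
print(f"best learning_rate: {best_lr}")
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Narrowing the grid around the randomized-search optimum keeps the exhaustive step cheap, since GridSearchCV only has to evaluate a handful of closely spaced learning rates. ROC-AUC is computed from predicted probabilities rather than hard labels, which is one reason it can diverge from accuracy-style metrics.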

Author Biographies

Auriwan Yasper, Universitas Indonesia

Djati Handoko, Universitas Indonesia

Maulana Putra, Indonesia Agency for Meteorology Climatology and Geophysics

Harry Kasuma Aliwarga, UMG Idealabs Indonesia

Mohammad Syamsu Rosid Rosid, Universitas Indonesia

License

Copyright (c) 2023 Auriwan Yasper, Djati Handoko, Maulana Putra, Harry Kasuma Aliwarga, Mohammad Syamsu Rosid Rosid

This work is licensed under a Creative Commons Attribution 4.0 International License.

Authors who publish with Jurnal Penelitian Pendidikan IPA agree to the following terms:

  1. Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution 4.0 International License (CC-BY License). This license allows authors to use all articles, data sets, graphics, and appendices in data mining applications, search engines, web sites, blogs, and other platforms by providing an appropriate reference. The journal allows the author(s) to hold the copyright without restrictions and will retain publishing rights without restrictions.
  2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in Jurnal Penelitian Pendidikan IPA.
  3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).