A Novel Hybrid Classification on Urban Opinion Using ROS-RF: A Machine Learning Approach

Authors

Usman Ependi, Nahdatul Akma Ahmad

DOI:

10.29303/jppipa.v10i8.8042

Published:

2024-08-31

Issue:

Vol. 10 No. 8 (2024): August

Keywords:

Covid-19, Machine learning, Lexicon, Sentiment analysis

Research Articles

How to Cite

Ependi, U., & Ahmad, N. A. (2024). A Novel Hybrid Classification on Urban Opinion Using ROS-RF: A Machine Learning Approach. Jurnal Penelitian Pendidikan IPA, 10(8), 5816–5824. https://doi.org/10.29303/jppipa.v10i8.8042

Abstract

Urban opinion gathered from crowdsourced data is often imbalanced across classes because the issues raised span diverse urban social, economic, and environmental topics. This study presents a novel hybrid approach that combines Random Over-Sampling with a Random Forest classifier (ROS-RF) to classify such imbalanced data. In experiments on crowdsourced urban opinion data from Jakarta, ROS-RF outperformed the other approaches evaluated, achieving an F1-score, recall, precision, and accuracy of 98%. These findings indicate that ROS-RF is effective for classifying urban opinions on social, economic, and environmental issues. The hybrid approach offers a robust way to handle imbalanced datasets and yields more accurate and reliable classification outcomes, underscoring its potential to support urban data analysis and decision-making.
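
The over-sampling half of ROS-RF can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `random_oversample`, the toy texts, and the class counts are hypothetical, and only the Python standard library is used. The sketch duplicates randomly chosen minority-class samples until every class matches the size of the largest class; a Random Forest (or any other classifier) would then be trained on the balanced output.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Balance classes by duplicating randomly chosen minority samples.

    Every class is grown to the size of the largest class. Original
    samples are always kept; only duplicates are added.
    """
    rng = random.Random(seed)

    # Group samples by their class label.
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)

    target = max(len(items) for items in by_class.values())

    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # Draw duplicates (with replacement) to reach the target size.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Hypothetical toy data: class names follow the three topic areas in the
# abstract, but the counts and texts are invented for illustration.
labels = ["social"] * 6 + ["economic"] * 3 + ["environmental"] * 1
texts = [f"opinion {i}" for i in range(len(labels))]

balanced_texts, balanced_labels = random_oversample(texts, labels)
print(Counter(labels))           # class counts before over-sampling
print(Counter(balanced_labels))  # class counts after over-sampling
```

In practice, over-sampling is usually applied to the training split only, so that duplicated samples do not leak into the test set and inflate the reported scores.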

Author Biographies

Usman Ependi, Universitas Bina Darma

Nahdatul Akma Ahmad, Malaysia

License

Copyright (c) 2024 Usman Ependi, Nahdatul Akma Ahmad

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Authors who publish with Jurnal Penelitian Pendidikan IPA agree to the following terms:

  1. Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution 4.0 International License (CC-BY License). This license allows authors to use all articles, data sets, graphics, and appendices in data mining applications, search engines, web sites, blogs, and other platforms by providing an appropriate reference. The journal allows the author(s) to hold the copyright without restrictions and will retain publishing rights without restrictions.
  2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in Jurnal Penelitian Pendidikan IPA.
  3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).