Dimensional Reduction of QSAR Features Using a Machine Learning Approach on the SARS-Cov-2 Inhibitor Database
Authors: Munaya Azizah, Arry Yanuar, Firdayani Firdayani
Issue: Vol. 8 No. 6 (2022): December
Keywords: QSAR, PCA, Missing values, Random forest, SARS-Cov-2
Articles "Regular Issue"
Quantitative Structure-Activity Relationship (QSAR) is a method that relates the chemical structure of a molecule to its biochemical, pharmaceutical, and biological activities. Building a good QSAR model requires features that characterize the molecule, such as chemical descriptors and fingerprints. Dimensionality reduction addresses the problem of irrelevant and redundant chemical descriptors and fingerprints in a data set with a large number of features by mapping the original high-dimensional space to a low-dimensional intrinsic space. There are two categories of dimensionality reduction techniques: feature extraction and feature selection. Dimensionality reduction can serve as the first step in running a QSAR virtual screening model on a dataset of SARS-CoV-2 inhibitor drugs, with the aim of finding new treatments for Covid-19 based on machine learning (ML) and drug repurposing. Feature extraction and feature selection are crucial for determining which feature sets should be applied to a specific classification process in QSAR modeling to produce reliable virtual screening results. In this work, the chemical descriptors and chemical fingerprints of the SARS-CoV-2 inhibitor drug database were extracted using a simple, quick, and accurate method. The total number of selected features is 12122. The techniques employed are principal component analysis (PCA), missing-values filtering, and Random Forest. The XGBoost Tree Ensemble, Naive Bayes, Support Vector Machine, Random Forest, and Deep Learning (Artificial Neural Network/Multilayer Perceptron) were used as classifiers for QSAR modeling on the training and test data. The Random Forest approach, applied to all chemical descriptor and chemical fingerprint features and combined with the XGBoost algorithm, yields the best feature selection results (accuracy of 0.845 and AUC of 0.904).
Feature selection yielded 233 features for the regression QSAR approach and 273 features for the classification QSAR approach. The features chosen with the Random Forest approach can next be used in QSAR-based virtual screening of prospective drugs for Covid-19 therapy.
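The workflow outlined in the abstract (a missing-values filter, then either Random Forest feature selection or PCA feature extraction, followed by a classifier scored on accuracy and AUC) can be illustrated with a minimal scikit-learn sketch. This is not the authors' implementation: the data are synthetic, the 0.5 missing-value threshold, the 15 retained features, and the 10 principal components are hypothetical choices, and scikit-learn's `GradientBoostingClassifier` is used here only as a stand-in for the XGBoost Tree Ensemble named in the abstract.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for a descriptor/fingerprint matrix (NOT the paper's data).
X, y = make_classification(n_samples=300, n_features=60, n_informative=8,
                           n_redundant=10, shuffle=False, random_state=0)
X[rng.random(X.shape) < 0.02] = np.nan                       # sparse gaps everywhere
X[:, -5:] = np.where(rng.random((300, 5)) < 0.7, np.nan, X[:, -5:])  # 5 mostly-empty columns

# Step 1: missing-values filter. Drop columns with more than 50% missing entries
# (the 0.5 threshold is an assumption made for this sketch).
keep_mask = np.isnan(X).mean(axis=0) <= 0.5
X_kept = X[:, keep_mask]

X_train, X_test, y_train, y_test = train_test_split(
    X_kept, y, test_size=0.3, stratify=y, random_state=0)

# Fill the remaining gaps with training-set medians so the test set stays unseen.
medians = np.nanmedian(X_train, axis=0)
X_train = np.where(np.isnan(X_train), medians, X_train)
X_test = np.where(np.isnan(X_test), medians, X_test)

# Step 2a: feature selection. Rank features by Random Forest importance
# and keep the top 15 (an arbitrary illustrative number).
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
selected = np.argsort(rf.feature_importances_)[::-1][:15]

# Step 2b: feature extraction. Project standardized features onto 10 principal components.
scaler = StandardScaler().fit(X_train)
pca = PCA(n_components=10, random_state=0).fit(scaler.transform(X_train))
Z_train = pca.transform(scaler.transform(X_train))
Z_test = pca.transform(scaler.transform(X_test))


def evaluate(train, test):
    """Fit a boosted-tree classifier and return (accuracy, AUC) on the test split."""
    clf = GradientBoostingClassifier(random_state=0).fit(train, y_train)
    proba = clf.predict_proba(test)[:, 1]
    return accuracy_score(y_test, clf.predict(test)), roc_auc_score(y_test, proba)


results = {
    "rf_selection": evaluate(X_train[:, selected], X_test[:, selected]),
    "pca_extraction": evaluate(Z_train, Z_test),
}
for name, (acc, auc) in results.items():
    print(f"{name}: accuracy={acc:.3f}, AUC={auc:.3f}")
```

Feature selection keeps a subset of the original columns, so the retained descriptors remain interpretable, whereas PCA produces new axes that mix all columns. The scores printed for synthetic data say nothing about the values reported in the abstract.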
Copyright (c) 2022 Munaya Azizah, Arry Yanuar, Firdayani Firdayani
This work is licensed under a Creative Commons Attribution 4.0 International License.