A Banjarnese Corpus Generation Method Based on Contextual Synonym Substitution Using Identic.v1.0 Data

Ali Muhammad; Novia Winda; Budi Jejen Zaenal Abidin

doi:10.29303/jppipa.v12i2.14393

Vol. 12 No. 2 (2026)

Open Access

Peer Reviewed

Published:

2026-02-28

Abstract

The preservation and revitalization of the Banjar language are urgently needed. The declining number of Banjar speakers and linguistic experts due to aging, combined with the hegemony of dominant languages brought by migrants, has become a major challenge to these efforts. This study aims to develop a method for generating a Banjar language corpus that increases the accuracy of sentence translation while preserving the context of the original sentence. The method translates by paraphrasing with contextual synonym substitution and uses the Identic.v1.0 parallel corpus as its data. It was tested and compared with a statistical machine translation method using statistical evaluation, the Meteor Universal tool, and human judgment. The statistical evaluation indicates that the proposed method yields a significant improvement in translation performance over the statistical machine translation method. Translation accuracy increased from 48% with the statistical method to 81% with the proposed method, an improvement of 33 percentage points, or approximately 68.75% relative to the statistical method. A naturalness test of the translated sentences, run with Meteor Universal on 1,000 randomly selected sentences, also favors the proposed method: its final naturalness score is 0.6, compared with 0.36 for the statistical machine translation method. Finally, the sentences were evaluated by human judgment involving 15 language observers. The results show that the sentences translated with the proposed method are 75.8% better than those produced by the statistical machine translation method.
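The accuracy figures reported in the abstract can be checked with a short calculation. The sketch below is only an illustration of that arithmetic; the function name `improvement` is hypothetical and is not part of the authors' method or tooling.

```python
def improvement(baseline, proposed):
    """Return (absolute gain, relative gain as a percentage of the baseline).

    Hypothetical helper for illustration; not taken from the paper.
    """
    absolute = proposed - baseline          # gain in percentage points
    relative = absolute / baseline * 100    # gain relative to the baseline
    return absolute, relative


# Accuracy values quoted in the abstract: 48% (statistical MT), 81% (proposed).
abs_gain, rel_gain = improvement(48.0, 81.0)
print(f"{abs_gain:.0f} percentage points, {rel_gain:.2f}% relative")
# prints "33 percentage points, 68.75% relative"
```

The two numbers answer different questions: the 33-point figure is the raw difference in accuracy, while 68.75% expresses that difference as a share of the baseline's 48%.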

Keywords:

Contextual synonym substitution; Corpus generation methods; Human minimal resources; Translation methods

Author Biographies

Ali Muhammad, Universitas Sains Indonesia

Author Origin: Indonesia

Informatics Engineering Study Program, Faculty of Computer Science

Novia Winda, Universitas PGRI Kalimantan

Author Origin: Indonesia

Indonesian Language and Literature Education, Faculty of Social and Humanities

Budi Jejen Zaenal Abidin, Universitas Sains Indonesia

Author Origin: Indonesia

Information Systems Study Program, Faculty of Computer Science

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Authors who publish with Jurnal Penelitian Pendidikan IPA agree to the following terms:

Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution 4.0 International License (CC-BY License). This license allows authors to use all articles, data sets, graphics, and appendices in data mining applications, search engines, web sites, blogs, and other platforms by providing an appropriate reference. The journal allows the author(s) to hold the copyright without restrictions and will retain publishing rights without restrictions.
Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in Jurnal Penelitian Pendidikan IPA.
Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).

How to Cite

Muhammad, A., Winda, N., & Abidin, B. J. Z. (2026). A Banjarnese Corpus Generation Method Based on Contextual Synonym Substitution Using Identic.v1.0 Data. Jurnal Penelitian Pendidikan IPA, 12(2), 487–498. https://doi.org/10.29303/jppipa.v12i2.14393
