Publication:
Improving Low-Resource Kazakh-English and Turkish-English Neural Machine Translation Using Transfer Learning and Part of Speech Tags

dc.authorscopusid: 57212212990
dc.authorscopusid: 22953804000
dc.authorwosid: Kiliç, Erdal/Hjy-2853-2023
dc.contributor.author: Yazar, Bilge Kagan
dc.contributor.author: Kilic, Erdal
dc.contributor.authorID: Kiliç, Erdal/0000-0003-1585-0991
dc.contributor.authorID: Yazar, Bilge Kağan/0000-0003-2149-142X
dc.date.accessioned: 2025-12-11T01:23:32Z
dc.date.issued: 2025
dc.department: Ondokuz Mayıs Üniversitesi [en_US]
dc.department-temp: [Yazar, Bilge Kagan; Kilic, Erdal] Ondokuz Mayis Univ, Fac Engn, TR-55200 Samsun, Turkiye [en_US]
dc.description: Kiliç, Erdal/0000-0003-1585-0991; Yazar, Bilge Kağan/0000-0003-2149-142X [en_US]
dc.description.abstract: This study presents a novel translation framework that combines transfer learning with part-of-speech (POS) tagging to improve low-resource neural machine translation for the Kazakh-English and Turkish-English language pairs. The framework aims to maximize the benefit of transfer learning by exploiting the structural similarities between Turkish and Kazakh, and to produce more accurate and consistent translations by supplying the model with grammatical and syntactic information through POS tags. POS tags were generated with the RoBERTa model for Kazakh and with the Zemberek library for Turkish, and were used as an additional feature in Transformer-based models. The findings show that transfer learning and POS tags each improve performance on their own, and that combining the two yields more meaningful and consistent results. The results are evaluated with the BLEU, chrF, and METEOR metrics and analyzed in detail. For the Kazakh-English translation direction, the proposed models are compared with other models and achieve much better results. For the Turkish-English translation direction, the results were examined using Tatoeba and TED2020 corpora of different sizes. The experiments on the Tatoeba corpus in particular showed significant improvements in the examined metrics. Overall, the methods applied in the study achieved successful results for low-resource languages, and the use of different corpora and language pairs demonstrated the generalizability of the proposed approach. [en_US]
dc.description.woscitationindex: Science Citation Index Expanded
dc.identifier.doi: 10.1109/ACCESS.2025.3542491
dc.identifier.endpage: 32356 [en_US]
dc.identifier.issn: 2169-3536
dc.identifier.scopus: 2-s2.0-85217895243
dc.identifier.scopusquality: Q1
dc.identifier.startpage: 32341 [en_US]
dc.identifier.uri: https://doi.org/10.1109/ACCESS.2025.3542491
dc.identifier.uri: https://hdl.handle.net/20.500.12712/43381
dc.identifier.volume: 13 [en_US]
dc.identifier.wos: WOS:001440225500023
dc.identifier.wosquality: Q2
dc.language.iso: en [en_US]
dc.publisher: IEEE-Inst Electrical Electronics Engineers Inc [en_US]
dc.relation.ispartof: IEEE Access [en_US]
dc.relation.publicationcategory: Article - International Peer-Reviewed Journal - Institutional Faculty Member [en_US]
dc.rights: info:eu-repo/semantics/openAccess [en_US]
dc.subject: Translation [en_US]
dc.subject: Transformers [en_US]
dc.subject: Accuracy [en_US]
dc.subject: Transfer Learning [en_US]
dc.subject: Data Models [en_US]
dc.subject: Encoding [en_US]
dc.subject: Tagging [en_US]
dc.subject: Grammar [en_US]
dc.subject: Vectors [en_US]
dc.subject: Solid Modeling [en_US]
dc.subject: Neural Machine Translation [en_US]
dc.subject: Low-Resource Languages [en_US]
dc.subject: Multi-Feature Transformer [en_US]
dc.subject: Transfer Learning [en_US]
dc.subject: Part of Speech Tags [en_US]
dc.title: Improving Low-Resource Kazakh-English and Turkish-English Neural Machine Translation Using Transfer Learning and Part of Speech Tags [en_US]
dc.type: Article [en_US]
dspace.entity.type: Publication
