Publication: Improving Low-Resource Kazakh-English and Turkish-English Neural Machine Translation Using Transfer Learning and Part of Speech Tags
Abstract
This study presents a novel translation framework that combines transfer learning with part-of-speech (POS) tagging to improve low-resource neural machine translation, using the Kazakh-English and Turkish-English language pairs. The aim is to maximize the effectiveness of transfer learning by exploiting the structural similarities between Turkish and Kazakh, and to obtain more accurate and consistent translations by integrating grammatical and syntactic information into the model through POS tags. For Kazakh, POS tags were generated using the RoBERTa model; for Turkish, the Zemberek library was employed. These tags were used as an additional feature in Transformer-based models. The findings show that transfer learning and POS tags each improve performance when used alone, and that combining the two yields more meaningful and consistent results. The results are evaluated with the BLEU, chrF, and METEOR metrics and analyzed in detail. The models built for the Kazakh-English translation direction are compared with different models, and the proposed methods are found to give much better results. For the Turkish-English translation direction, the results were examined using the Tatoeba and TED2020 corpora, which differ in size. Significant improvements in the examined metrics were observed, particularly in the experiments on the Tatoeba corpus. Overall, the applied methods achieved successful results for low-resource languages, and the use of different corpora and language pairs demonstrated the generalizability of the proposed approach.
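The abstract does not say how the POS tags are fed to the Transformer as an additional feature. One common option, shown here only as an illustrative sketch, is to attach each tag to its source token before subword segmentation. The function name `attach_pos_tags`, the `|` separator, and the hand-assigned tags in the example are all assumptions made for illustration; they are not taken from the paper, and a real pipeline would take its tags from a tagger such as Zemberek or a RoBERTa-based model.

```python
from typing import List, Sequence


def attach_pos_tags(
    tokens: Sequence[str], tags: Sequence[str], sep: str = "|"
) -> List[str]:
    """Pair each source token with its POS tag as a single 'token|TAG' unit.

    Hypothetical helper: the separator and the format are illustrative
    assumptions, not the representation used in the paper.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    return [f"{tok}{sep}{tag}" for tok, tag in zip(tokens, tags)]


# Turkish example ("I read a book") with hand-assigned, illustrative tags.
tokens = ["Ben", "kitap", "okudum"]
tags = ["PRON", "NOUN", "VERB"]
tagged = attach_pos_tags(tokens, tags)
print(" ".join(tagged))  # Ben|PRON kitap|NOUN okudum|VERB
```

An alternative design keeps tokens and tags in two parallel streams and sums or concatenates their embeddings inside the model; the string-level form above is simply the easiest to inspect.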
Description
Kiliç, Erdal/0000-0003-1585-0991; Yazar, Bilge Kağan/0000-0003-2149-142X
WoS Q
Q2
Scopus Q
Q1
Source
IEEE Access
Volume
13
Start Page
32341
End Page
32356
