Publication:
Improving Low-Resource Kazakh-English and Turkish-English Neural Machine Translation Using Transfer Learning and Part of Speech Tags

Abstract

This study presents a novel translation framework that combines transfer learning with part-of-speech (POS) tagging to improve low-resource neural machine translation for the Kazakh-English and Turkish-English language pairs. The framework exploits the structural similarities between Turkish and Kazakh to make transfer learning more effective, and it integrates grammatical and syntactic information into the model through POS tags to obtain more accurate and consistent translations. POS tags were generated with a RoBERTa model for Kazakh and with the Zemberek library for Turkish, and were supplied as an additional feature to Transformer-based models. The findings show that transfer learning and POS tags each improve performance when used on their own, and that combining the two yields more meaningful and consistent results. Results are evaluated with the BLEU, chrF, and METEOR metrics and analyzed in detail. For the Kazakh-English direction, the proposed models are compared with other models and obtain much better results. For the Turkish-English direction, results are examined on the Tatoeba and TED2020 corpora at different sizes, with significant improvements in the examined metrics observed on the Tatoeba corpus in particular. Overall, the applied methods achieve successful results for low-resource languages, and the use of different corpora and language pairs demonstrates the generalizability of the proposed approach.
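
The abstract does not say how the POS tags are attached to the model input. One common convention, shown here only as a minimal sketch and not as the authors' implementation, is to pair each source token with its tag as a word-level feature before training. The helper name `attach_pos_factors`, the `|` separator, and the example tokens and tags are all assumptions made for illustration; the tags are hand-written and are not Zemberek or RoBERTa output.

```python
from typing import List, Sequence


def attach_pos_factors(
    tokens: Sequence[str], tags: Sequence[str], sep: str = "|"
) -> List[str]:
    """Pair each source token with its POS tag as a word-level feature.

    Hypothetical helper: the separator and output format are assumptions,
    not a format confirmed by the paper.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    return [f"{tok}{sep}{tag}" for tok, tag in zip(tokens, tags)]


# Illustrative Turkish input; the tags are written by hand for this sketch.
tokens = ["kitap", "okuyorum"]
tags = ["NOUN", "VERB"]
factored = attach_pos_factors(tokens, tags)
print(" ".join(factored))
```

A toolkit that supports source-side word features could then consume such token and tag pairs. Whether the study concatenated tag embeddings, summed them, or used another scheme is not stated in this record.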

Description

Kiliç, Erdal/0000-0003-1585-0991; Yazar, Bilge Kağan/0000-0003-2149-142X

WoS Q

Q2

Scopus Q

Q1

Source

IEEE Access

Volume

13

Start Page

32341

End Page

32356
