Publication:
Using Kolmogorov-Arnold Networks in Transformer Model: A Study on Low-Resource Neural Machine Translation

dc.authorscopusid57212212990
dc.authorscopusid22953804000
dc.authorwosidKiliç, Erdal/Hjy-2853-2023
dc.contributor.authorYazar, Bilge Kagan
dc.contributor.authorKilic, Erdal
dc.date.accessioned2025-12-11T00:38:49Z
dc.date.issued2025
dc.departmentOndokuz Mayıs Üniversitesien_US
dc.department-temp[Yazar, Bilge Kagan; Kilic, Erdal] Ondokuz Mayis Univ, Fac Engn, TR-55139 Samsun, Turkiyeen_US
dc.description.abstractNeural machine translation has become one of the most significant research areas with the widespread use of deep learning. Unlike many other problems, however, machine translation involves at least two languages, so the amount of parallel data available between the languages to be translated is an important factor for translation success. Low-resource languages suffer from a shortage of such data, which poses a significant challenge to machine translation quality. Transformer models have achieved great success by modeling long-term dependencies with the self-attention mechanism. However, the feed-forward network (FFN) layers that follow each self-attention layer account for almost all of the model's non-embedding parameters, and studies in the literature have questioned the necessity of these FFN layers and investigated alternatives to them. Kolmogorov-Arnold networks (KAN) have recently come to the forefront as a new neural network architecture that has achieved success on many problems. By using learnable activation functions instead of fixed ones, the KAN structure can better capture patterns in complex data. Accordingly, this study proposes using KAN layers instead of FFN layers in the Transformer model for the low-resource translation problem. The aim is to mitigate the low-resource problem and to present a new alternative within the Transformer model by employing the adaptive activation functions of KANs. In traditional Transformer models, an FFN layer consists of two linear transformations with a ReLU activation between them. In the proposed structure, the KAN structure first replaces the FFN layers in the Transformer model without any change in the model dimensions; experiments are then conducted with lower-dimensional KAN layers and various parameter sets. The study is carried out on the Turkish-English and Kazakh-English language pairs. The findings reveal that using KAN layers instead of FFN layers in the Transformer model has a positive effect on translation quality and that KAN layers of similar or lower dimensions significantly increase the success of the Transformer model.en_US
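
For illustration only, the sketch below shows one hypothetical way the architectural change described in the abstract could look in PyTorch; it is not the authors' implementation. The KANLayer uses a radial-basis simplification of the learnable spline activations, and all layer sizes, grid settings, and class names (KANLayer, KANTransformerEncoderLayer) are assumptions made for the example.

    # Minimal sketch (not the paper's code): a Transformer encoder block whose
    # position-wise FFN sublayer is replaced by a simplified KAN-style layer.
    # The per-edge learnable activations are approximated with a learnable
    # mixture of radial basis functions over a fixed grid.
    import torch
    import torch.nn as nn


    class KANLayer(nn.Module):
        """Maps d_in -> d_out with learnable per-edge activations (RBF basis)."""

        def __init__(self, d_in, d_out, num_basis=8, grid_range=(-2.0, 2.0)):
            super().__init__()
            centers = torch.linspace(grid_range[0], grid_range[1], num_basis)
            self.register_buffer("centers", centers)            # fixed RBF grid
            self.inv_width = num_basis / (grid_range[1] - grid_range[0])
            self.base = nn.Linear(d_in, d_out)                   # residual "base" path (SiLU)
            self.spline = nn.Linear(d_in * num_basis, d_out)     # learnable basis mixture per edge

        def forward(self, x):
            # x: (..., d_in) -> basis expansion phi: (..., d_in, num_basis)
            phi = torch.exp(-((x.unsqueeze(-1) - self.centers) * self.inv_width) ** 2)
            return self.base(torch.nn.functional.silu(x)) + self.spline(phi.flatten(-2))


    class KANTransformerEncoderLayer(nn.Module):
        """Standard encoder block with the two-linear-layer FFN swapped for KAN layers."""

        def __init__(self, d_model=512, n_heads=8, d_hidden=512, dropout=0.1):
            super().__init__()
            self.self_attn = nn.MultiheadAttention(d_model, n_heads,
                                                   dropout=dropout, batch_first=True)
            # Two stacked KAN layers take the place of Linear -> ReLU -> Linear,
            # keeping the model dimension unchanged as in the first setting of the study.
            self.kan = nn.Sequential(KANLayer(d_model, d_hidden),
                                     KANLayer(d_hidden, d_model))
            self.norm1 = nn.LayerNorm(d_model)
            self.norm2 = nn.LayerNorm(d_model)
            self.dropout = nn.Dropout(dropout)

        def forward(self, x, attn_mask=None):
            attn_out, _ = self.self_attn(x, x, x, attn_mask=attn_mask)
            x = self.norm1(x + self.dropout(attn_out))
            x = self.norm2(x + self.dropout(self.kan(x)))
            return x


    if __name__ == "__main__":
        layer = KANTransformerEncoderLayer()
        tokens = torch.randn(2, 16, 512)   # (batch, sequence, d_model)
        print(layer(tokens).shape)         # torch.Size([2, 16, 512])

The lower-dimensional variants mentioned in the abstract would correspond, in this sketch, to choosing a d_hidden smaller than d_model; the exact dimensions and parameter sets used in the experiments are given in the article itself.
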
dc.description.woscitationindexScience Citation Index Expanded
dc.identifier.doi10.1109/ACCESS.2025.3601069
dc.identifier.endpage147053en_US
dc.identifier.issn2169-3536
dc.identifier.scopus2-s2.0-105013742607
dc.identifier.scopusqualityQ1
dc.identifier.startpage147034en_US
dc.identifier.urihttps://doi.org/10.1109/ACCESS.2025.3601069
dc.identifier.urihttps://hdl.handle.net/20.500.12712/38197
dc.identifier.volume13en_US
dc.identifier.wosWOS:001560244000011
dc.identifier.wosqualityQ2
dc.language.isoenen_US
dc.publisherIEEE-Inst Electrical Electronics Engineers Incen_US
dc.relation.ispartofIEEE Accessen_US
dc.relation.publicationcategoryArticle - International Refereed Journal - Institutional Faculty Memberen_US
dc.rightsinfo:eu-repo/semantics/openAccessen_US
dc.subjectNeural Machine Translationen_US
dc.subjectLow-Resource Languagesen_US
dc.subjectTransformeren_US
dc.subjectKolmogorov-Arnold Networken_US
dc.titleUsing Kolmogorov-Arnold Networks in Transformer Model: A Study on Low-Resource Neural Machine Translationen_US
dc.typeArticleen_US
dspace.entity.typePublication

Files