Turkish Optical Character Recognition Under the Lens: A Systematic Review of Language-Specific Challenges, Dataset Scarcity, and Open-Source Limitations

Goksu Ozturk, Mirac; Ozkan Sahin, Durmus; Kilic, Erdal

doi:10.1109/ACCESS.2025.3614147

dc.authorscopusid	57274538200
dc.authorscopusid	60129175000
dc.authorscopusid	22953804000
dc.authorwosid	Sahin, Durmus/Aaj-7961-2020
dc.authorwosid	Kiliç, Erdal/Y-2198-2018
dc.contributor.author	Goksu Ozturk, Mirac
dc.contributor.author	Ozkan Sahin, Durmus
dc.contributor.author	Kilic, Erdal
dc.date.accessioned	2025-12-11T00:43:58Z
dc.date.issued	2025
dc.department	Ondokuz Mayıs Üniversitesi	en_US
dc.department-temp	[Goksu Ozturk, Mirac] Ondokuz Mayis Univ, Inst Grad Studies, Dept Computat Sci, TR-55200 Samsun, Turkiye; [Ozkan Sahin, Durmus; Kilic, Erdal] Ondokuz Mayis Univ, Fac Engn, Dept Comp Engn, TR-55200 Samsun, Turkiye	en_US
dc.description.abstract	This systematic literature review explores the progress, challenges, and opportunities in the field of Optical Character Recognition (OCR) for the Turkish language. Despite significant advancements, the development of robust Turkish OCR systems faces several obstacles, such as a lack of publicly available datasets, limited open-source solutions, and the underutilization of cutting-edge deep learning techniques. These challenges hinder the creation of OCR systems that can match the capabilities of those developed for languages like English. Focusing on 38 peer-reviewed studies published between 2019 and 2023, this paper provides the first systematic review of Turkish OCR research, offering a comprehensive analysis of the current methods, datasets, and evaluation metrics across both modern Turkish (Latin script) and Ottoman Turkish (Arabic script) contexts. Our findings highlight that while Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and Convolutional Recurrent Neural Network (CRNN) architectures are frequently used, Transformer-based and end-to-end models remain underexplored in Turkish OCR. We also identify data scarcity and the lack of reproducible benchmark datasets as key barriers. By analyzing current research trends, pinpointing challenges, and emphasizing opportunities for future advancements, this review aims to be a valuable resource for researchers working on Turkish language recognition. Our study contributes to the field by offering a structured overview of existing methods and by proposing practical recommendations for improving dataset availability, encouraging open-source collaboration, and adopting more advanced model architectures.	en_US
dc.description.woscitationindex	Science Citation Index Expanded
dc.identifier.doi	10.1109/ACCESS.2025.3614147
dc.identifier.endpage	168997	en_US
dc.identifier.issn	2169-3536
dc.identifier.scopus	2-s2.0-105017393015
dc.identifier.scopusquality	Q1
dc.identifier.startpage	168977	en_US
dc.identifier.uri	https://doi.org/10.1109/ACCESS.2025.3614147
dc.identifier.uri	https://hdl.handle.net/20.500.12712/38840
dc.identifier.volume	13	en_US
dc.identifier.wos	WOS:001586205100041
dc.identifier.wosquality	Q2
dc.language.iso	en	en_US
dc.publisher	IEEE-Inst Electrical Electronics Engineers Inc	en_US
dc.relation.ispartof	IEEE Access	en_US
dc.relation.publicationcategory	Article - International Refereed Journal - Institutional Faculty Member	en_US
dc.rights	info:eu-repo/semantics/openAccess	en_US
dc.subject	Optical Character Recognition	en_US
dc.subject	Text Recognition	en_US
dc.subject	Systematic Literature Review	en_US
dc.subject	Surveys	en_US
dc.subject	Convolutional Neural Networks	en_US
dc.subject	Accuracy	en_US
dc.subject	Systematics	en_US
dc.subject	Measurement	en_US
dc.subject	Linguistics	en_US
dc.subject	Focusing	en_US
dc.subject	Classification	en_US
dc.subject	Deep Learning	en_US
dc.subject	Machine Learning	en_US
dc.subject	OCR Applications	en_US
dc.subject	Turkish OCR	en_US
dc.title	Turkish Optical Character Recognition Under the Lens: A Systematic Review of Language-Specific Challenges, Dataset Scarcity, and Open-Source Limitations	en_US
dc.type	Article	en_US
dspace.entity.type	Publication

Collections

WoS Indexed Publications Collection
Scopus Indexed Publications Collection
