Publication:
Turkish Optical Character Recognition Under the Lens: A Systematic Review of Language-Specific Challenges, Dataset Scarcity, and Open-Source Limitations

dc.authorscopusid: 57274538200
dc.authorscopusid: 60129175000
dc.authorscopusid: 22953804000
dc.authorwosid: Sahin, Durmus/Aaj-7961-2020
dc.authorwosid: Kiliç, Erdal/Y-2198-2018
dc.contributor.author: Goksu Ozturk, Mirac
dc.contributor.author: Ozkan Sahin, Durmus
dc.contributor.author: Kilic, Erdal
dc.date.accessioned: 2025-12-11T00:43:58Z
dc.date.issued: 2025
dc.department: Ondokuz Mayıs Üniversitesi [en_US]
dc.department-temp: [Goksu Ozturk, Mirac] Ondokuz Mayis Univ, Inst Grad Studies, Dept Computat Sci, TR-55200 Samsun, Turkiye; [Ozkan Sahin, Durmus; Kilic, Erdal] Ondokuz Mayis Univ, Fac Engn, Dept Comp Engn, TR-55200 Samsun, Turkiye [en_US]
dc.description.abstract: This systematic literature review explores the progress, challenges, and opportunities in the field of Optical Character Recognition (OCR) for the Turkish language. Despite significant advancements, the development of robust Turkish OCR systems faces several obstacles, such as a lack of publicly available datasets, limited open-source solutions, and the underutilization of cutting-edge deep learning techniques. These challenges hinder the creation of OCR systems that can match the capabilities of those developed for languages like English. Focusing on 38 peer-reviewed studies published between 2019 and 2023, this paper provides the first systematic review of Turkish OCR research, offering a comprehensive analysis of the current methods, datasets, and evaluation metrics across both modern Turkish (Latin script) and Ottoman Turkish (Arabic script) contexts. Our findings highlight that while Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM), and Convolutional Recurrent Neural Networks (CRNN) architectures are frequently used, Transformer-based and end-to-end models remain underexplored in Turkish OCR. We also identify data scarcity and the lack of reproducible benchmark datasets as key barriers. By analyzing current research trends, pinpointing challenges, and emphasizing opportunities for future advancements, this review aims to be a valuable resource for researchers and Turkish language recognition. Our study contributes to the field by offering a structured overview of existing methods and proposes practical recommendations for improving dataset availability, encouraging open-source collaboration, and adopting more advanced model architectures. [en_US]
dc.description.woscitationindex: Science Citation Index Expanded
dc.identifier.doi: 10.1109/ACCESS.2025.3614147
dc.identifier.endpage: 168997 [en_US]
dc.identifier.issn: 2169-3536
dc.identifier.scopus: 2-s2.0-105017393015
dc.identifier.scopusquality: Q1
dc.identifier.startpage: 168977 [en_US]
dc.identifier.uri: https://doi.org/10.1109/ACCESS.2025.3614147
dc.identifier.uri: https://hdl.handle.net/20.500.12712/38840
dc.identifier.volume: 13 [en_US]
dc.identifier.wos: WOS:001586205100041
dc.identifier.wosquality: Q2
dc.language.iso: en [en_US]
dc.publisher: IEEE-Inst Electrical Electronics Engineers Inc [en_US]
dc.relation.ispartof: IEEE Access [en_US]
dc.relation.publicationcategory: Article - International Refereed Journal - Institutional Faculty Member [en_US]
dc.rights: info:eu-repo/semantics/openAccess [en_US]
dc.subject: Optical Character Recognition [en_US]
dc.subject: Text Recognition [en_US]
dc.subject: Systematic Literature Review [en_US]
dc.subject: Surveys [en_US]
dc.subject: Convolutional Neural Networks [en_US]
dc.subject: Accuracy [en_US]
dc.subject: Systematics [en_US]
dc.subject: Measurement [en_US]
dc.subject: Linguistics [en_US]
dc.subject: Focusing [en_US]
dc.subject: Classification [en_US]
dc.subject: Deep Learning [en_US]
dc.subject: Machine Learning [en_US]
dc.subject: OCR Applications [en_US]
dc.subject: Turkish OCR [en_US]
dc.title: Turkish Optical Character Recognition Under the Lens: A Systematic Review of Language-Specific Challenges, Dataset Scarcity, and Open-Source Limitations [en_US]
dc.type: Article [en_US]
dspace.entity.type: Publication
