Publication: Turkish Optical Character Recognition Under the Lens: A Systematic Review of Language-Specific Challenges, Dataset Scarcity, and Open-Source Limitations
Abstract
This systematic literature review explores the progress, challenges, and opportunities in the field of Optical Character Recognition (OCR) for the Turkish language. Despite significant advancements, the development of robust Turkish OCR systems faces several obstacles, such as a lack of publicly available datasets, limited open-source solutions, and the underutilization of cutting-edge deep learning techniques. These challenges hinder the creation of OCR systems that can match the capabilities of those developed for languages such as English. Focusing on 38 peer-reviewed studies published between 2019 and 2023, this paper provides the first systematic review of Turkish OCR research, offering a comprehensive analysis of the current methods, datasets, and evaluation metrics across both modern Turkish (Latin script) and Ottoman Turkish (Arabic script) contexts. Our findings highlight that while Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and Convolutional Recurrent Neural Network (CRNN) architectures are frequently used, Transformer-based and end-to-end models remain underexplored in Turkish OCR. We also identify data scarcity and the lack of reproducible benchmark datasets as key barriers. By analyzing current research trends, pinpointing challenges, and emphasizing opportunities for future advancements, this review aims to be a valuable resource for researchers working on Turkish language recognition. Our study contributes to the field by offering a structured overview of existing methods and proposing practical recommendations for improving dataset availability, encouraging open-source collaboration, and adopting more advanced model architectures.
WoS Q
Q2
Scopus Q
Q1
Source
IEEE Access
Volume
13
Start Page
168977
End Page
168997
