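Publication: Yapay Zeka Yaklaşımları Kullanılarak İnsan Yüzlerinin Tespiti ile Cinsiyet, Yaş ve Duygu Durumlarının Tahmini (Prediction of Gender, Age, and Emotional State through Human Face Detection Using Artificial Intelligence Approaches)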
Abstract
Extracting diverse personal information from images of human faces has become increasingly important with the rapid progress of artificial intelligence technologies. In particular, multi-task analysis systems, which predict several attributes simultaneously from a single face image, are widely used in areas such as human-computer interaction, security, and user profiling. In this study, a comprehensive multi-task prediction system was developed to estimate emotional state, gender, and age from face images. In the first stage of the system, faces are detected with the Haar Cascade method provided by the OpenCV library. For emotion recognition, an EfficientNetV2-based model was developed and augmented with two different attention mechanisms to better capture subtle details of facial expressions. Trained on the FER-2013 dataset extended with additional training data, this model achieved an accuracy of 82.56%. For gender and age estimation, a preprocessed portion of the WIKI subset of the IMDb-WIKI dataset was used. For gender prediction, a ResNet-50 architecture supported by an attention mechanism was chosen, reaching an accuracy of 95.53%. For age prediction, a ConvNeXt V2-based regression model, strengthened with an attention mechanism and equipped with a Spatial Transformer Network (STN) at its input, was used and achieved a mean absolute error (MAE) of 4.63. All three tasks were combined into a single system that runs in real time on top of the Haar Cascade-based face detection stage. The results indicate that the proposed method can perform versatile and reliable analyses from a single face image and could therefore be applied in areas such as human-computer interaction, security systems, and user profiling.
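The detect-then-analyze flow described in the abstract can be sketched as follows. This is a minimal, stdlib-only illustration under stated assumptions, not the thesis code: the trained EfficientNetV2, ResNet-50, and ConvNeXt V2 models are replaced by hypothetical callables (`emotion_model`, `gender_model`, `age_model`), the detector is any function returning `(x, y, w, h)` boxes, the gender head is assumed to output the probability of the male class, and frames are plain nested lists rather than NumPy arrays so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Image = List[List[int]]          # grayscale frame as rows of pixel values
Box = Tuple[int, int, int, int]  # (x, y, w, h), the box convention of OpenCV's detectMultiScale

# FER-2013 label order: 0 angry, 1 disgust, 2 fear, 3 happy, 4 sad, 5 surprise, 6 neutral.
EMOTIONS = ("angry", "disgust", "fear", "happy", "sad", "surprise", "neutral")


@dataclass
class FaceResult:
    box: Box
    emotion: str
    gender: str
    age: float


def crop(image: Image, box: Box) -> Image:
    """Cut the detected face region out of the frame."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]


def argmax(scores: Sequence[float]) -> int:
    """Index of the highest score, i.e. the predicted class."""
    return max(range(len(scores)), key=lambda i: scores[i])


def analyze_frame(
    image: Image,
    detect: Callable[[Image], Sequence[Box]],
    emotion_model: Callable[[Image], Sequence[float]],  # hypothetical: 7 class scores
    gender_model: Callable[[Image], float],             # hypothetical: P(male)
    age_model: Callable[[Image], float],                # hypothetical: age regression output
) -> List[FaceResult]:
    """Run detection once, then the three per-face predictors on every crop."""
    results = []
    for box in detect(image):
        face = crop(image, box)
        emotion = EMOTIONS[argmax(emotion_model(face))]
        gender = "male" if gender_model(face) >= 0.5 else "female"
        age = float(age_model(face))
        results.append(FaceResult(tuple(box), emotion, gender, age))
    return results


if __name__ == "__main__":
    # Dummy 8x8 frame and stand-in models, only to show the data flow.
    frame = [[(10 * r + c) % 256 for c in range(8)] for r in range(8)]
    print(analyze_frame(
        frame,
        detect=lambda img: [(2, 1, 4, 4)],
        emotion_model=lambda face: [0.05, 0.0, 0.05, 0.7, 0.1, 0.05, 0.05],
        gender_model=lambda face: 0.8,
        age_model=lambda face: 31.4,
    ))
```

In a real deployment, `detect` could wrap OpenCV's Haar Cascade detector, for example `cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml").detectMultiScale` applied to a grayscale frame (this assumes the `opencv-python` package, which ships the cascade files under `cv2.data.haarcascades`). Keeping detection and the three predictors as separate callables lets each model be retrained or swapped without touching the others.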
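The figures of merit reported in the abstract have simple definitions: accuracy is the fraction of correctly classified faces, and MAE is the average absolute difference between predicted and true ages, so an MAE of 4.63 means age estimates deviate from the labels by about 4.6 on average, in the units of the labels (years for the WIKI age annotations). A minimal stdlib sketch of both metrics, using made-up toy values rather than data from the study:

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted class equals the true class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """MAE = (1/N) * sum(|y_i - y_hat_i|), in the units of the labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy values for illustration only, not results from the study.
    ages_true = [25, 40, 60, 33]
    ages_pred = [28, 37, 52, 35]
    print(mean_absolute_error(ages_true, ages_pred))  # (3 + 3 + 8 + 2) / 4 = 4.0
    print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))       # 3 / 4 = 0.75
```

Because MAE averages absolute errors, a few large misses (such as the 8-year error above) raise it linearly rather than quadratically, which is one reason it is a common headline metric for age regression.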
Keywords
Computer Engineering and Computer Science and Control, Deep Learning, Image Processing, Image Processing Algorithms, Machine Learning, Artificial Intelligence, Face Image
End Page
83
