Publication: Detecting Transportation Card Abuse Using Machine Learning Classification Methods (Makine Öğreniminde Sınıflandırma Yöntemleri Kullanılarak Ulaşım Kartı Suistimalinin Tespit Edilmesi)
Abstract
Machine learning today delivers concrete, fast solutions to many problems in both science and business, and helps reveal new cause-and-effect relationships. As a result, its use has spread rapidly in recent years and it has become a popular field. At the same time, developments in payment systems have created, and continue to create, new problems, chief among them the abuse of smart cards. This study examines the abuse of transportation cards, which are a type of smart card, using the classification algorithms most frequently applied in the literature. It compares the classification performance of these algorithms and then evaluates the importance of each variable according to the best-performing one. The algorithms used are Decision Trees, Random Forests, Support Vector Machines, Logistic Regression, Naive Bayes, AdaBoost, XGBoost, K-Nearest Neighbors, and deep learning with artificial neural networks. Classification performance was measured with accuracy, the Matthews correlation coefficient (MCC), the F1 score, and the area under the ROC curve (AUC). As a first step, the distribution of every variable in the data set was examined and the numerical variables were checked for normality; a logarithmic transformation was applied to the numerical variables that were not normally distributed. The Boruta method was chosen for variable selection, and it found all variables to be significant. Before the variables were entered into the models, 10-fold cross-validation was applied, and every variable in the data set was included in all of the machine learning algorithms studied. Under this setup, XGBoost achieved higher accuracy than the other algorithms.
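The preprocessing and validation steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the study's code: the study does not name its normality test, so a skewness cut-off (|g1| > 1) stands in for it here; `log1p` (log(1 + x)) is used instead of a plain logarithm so that zero values stay defined; the folds are stratified by class, which the study does not state; and the feature names `daily_taps` and `card_age_days` are hypothetical.

```python
import math
import random
from statistics import mean, pstdev


def skewness(values):
    """Fisher-Pearson (population) skewness coefficient g1 of a numeric column."""
    m = mean(values)
    s = pstdev(values)
    if s == 0:
        return 0.0  # a constant column has no skew
    return sum((v - m) ** 3 for v in values) / (len(values) * s ** 3)


def log_transform_skewed(columns, threshold=1.0):
    """Apply log1p to non-negative columns whose |skewness| exceeds `threshold`.

    ASSUMPTION: the skewness cut-off stands in for a formal normality test;
    the study does not name the test it used. log1p keeps zeros defined.
    """
    out = {}
    for name, values in columns.items():
        if min(values) >= 0 and abs(skewness(values)) > threshold:
            out[name] = [math.log1p(v) for v in values]
        else:
            out[name] = list(values)  # roughly symmetric or has negatives: keep as-is
    return out


def stratified_k_fold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) for k folds, keeping class proportions per fold.

    ASSUMPTION: stratification is an illustrative choice; the study only states 10 folds.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)  # deal each class round-robin across the folds
    for test in folds:
        test_set = set(test)
        train = [i for i in range(len(labels)) if i not in test_set]
        yield train, sorted(test)


if __name__ == "__main__":
    # Hypothetical card features; the study's actual variable names are not given.
    data = {"daily_taps": [1, 2, 2, 3, 2, 40], "card_age_days": [10, 20, 30, 40, 50, 60]}
    print(log_transform_skewed(data))
    labels = [0] * 8 + [1] * 2  # hypothetical labels: 1 = abusive card use
    for train, test in stratified_k_fold(labels, k=2):
        print(len(train), len(test))
```

Stratifying keeps the share of abusive cards roughly equal across folds, which matters when that class is rare; each model would be trained on `train` and scored on `test` once per fold, and the fold scores averaged.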
XGBoost reached an accuracy of 0.881, an MCC of 0.750, an AUC of 0.953, and an F1 score of 0.875. Because it had the highest accuracy together with moderate overall model performance, XGBoost was judged the most successful classifier for detecting transportation card abuse.
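The four reported metrics follow standard definitions, sketched below in plain Python. This is a minimal illustration assuming a binary problem with abuse as the positive class (label 1); the study does not say which library or settings it used, and the toy data here does not reproduce its figures. AUC is computed with the rank (Mann-Whitney) interpretation, the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, which equals the area under the ROC curve.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary problem; `positive` marks the abuse class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + fp + tn + fn)


def f1_score(tp, fp, tn, fn):
    # Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN).
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, tn, fn):
    # Matthews correlation coefficient in [-1, 1]; 0 means no better than chance.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(y_true, scores, positive=1):
    """AUC = P(random positive scores above random negative); ties count 1/2.

    Pairwise O(n_pos * n_neg) loop, chosen for clarity rather than speed.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels and predictions, not data from the study.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
    counts = confusion_counts(y_true, y_pred)
    print(accuracy(*counts), f1_score(*counts), mcc(*counts))
```

On the toy data the counts are TP = 3, FP = 1, TN = 3, FN = 1, so the demo prints `0.75 0.75 0.5`. MCC uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy when one class is much rarer than the other.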
End Page: 84
