Publication: Spam Filtering With KNN: Investigation of the Effect of K Value on Classification Performance
| dc.authorscopusid | 56589621700 | |
| dc.authorscopusid | 43261041200 | |
| dc.contributor.author | Şahin, D.O. | |
| dc.contributor.author | Demirci, S. | |
| dc.date.accessioned | 2025-12-11T00:22:38Z | |
| dc.date.issued | 2020 | |
| dc.department | Ondokuz Mayıs Üniversitesi | en_US |
| dc.department-temp | [Şahin] Durmuş Ozkan, Ondokuz Mayıs Üniversitesi, Samsun, Turkey; [Demirci] Sercan, Ondokuz Mayıs Üniversitesi, Samsun, Turkey | en_US |
| dc.description.abstract | This study aims to filter spam e-mails using machine learning and text mining techniques. It uses the K-Nearest Neighbor (KNN) algorithm, one of the standard machine learning techniques. KNN is an easy-to-use, high-performance classification algorithm, but its main problem is choosing the value of k in advance, because the performance of the algorithm changes with the selected k value. Three data sets are used in this study: Enron, Ling-Spam and SMSSpam-Collection. First, basic text mining techniques and the term frequency-inverse document frequency (TF-IDF) term weighting method are applied to all data sets. Then, the 500 best features according to the Chi-Square feature selection method are selected and given to the KNN algorithm. Finally, extensive experiments are carried out with k set to 1, 3, 5, 7 and 9. On all three data sets, the best result is obtained when k is 1. The best F-measure scores obtained on the Ling-Spam, Enron and SMSSpam-Collection data sets are 0.9324, 0.9215 and 0.9196, respectively. © 2020 IEEE. | en_US |
| dc.identifier.doi | 10.1109/SIU49456.2020.9302516 | |
| dc.identifier.isbn | 9781728172064 | |
| dc.identifier.scopus | 2-s2.0-85100295203 | |
| dc.identifier.scopusquality | N/A | |
| dc.identifier.uri | https://doi.org/10.1109/SIU49456.2020.9302516 | |
| dc.identifier.uri | https://hdl.handle.net/20.500.12712/36256 | |
| dc.identifier.wosquality | N/A | |
| dc.language.iso | tr | en_US |
| dc.publisher | Institute of Electrical and Electronics Engineers Inc. | en_US |
| dc.relation.ispartof | -- 28th Signal Processing and Communications Applications Conference, SIU 2020 -- 2020-10-05 through 2020-10-07 -- Gaziantep -- 166413 | en_US |
| dc.relation.publicationcategory | Conference Item - International - Institutional Faculty Member | en_US |
| dc.rights | info:eu-repo/semantics/closedAccess | en_US |
| dc.subject | Chi-Square Feature Selection | en_US |
| dc.subject | E-Mail Classification | en_US |
| dc.subject | Inverse Document Frequency | en_US |
| dc.subject | Nearest Neighborhood | en_US |
| dc.subject | Spam Filtering | en_US |
| dc.subject | Term Frequency | en_US |
| dc.title | Spam Filtering With KNN: Investigation of the Effect of K Value on Classification Performance | en_US |
| dc.title.alternative | KNN İle İstenmeyen E-Posta Filtreleme: K Değerinin Sınıflandırma Performansına Etkisinin Araştırılması | en_US |
| dc.type | Conference Object | en_US |
| dspace.entity.type | Publication | |
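The abstract describes a preprocessing stage of TF-IDF weighting followed by Chi-Square selection of the 500 best features. The record does not say which tokenizer, TF-IDF variant or Chi-Square formulation the authors used, so the following is only a minimal stdlib-Python sketch under stated assumptions: tokens are lowercase runs of letters, the TF-IDF weight is raw count × log(N / df), and the Chi-Square score comes from a 2×2 table of document presence/absence versus the spam class. All function names (`tokenize`, `tfidf`, `chi_square_scores`, `select_top_k`, `project`) are hypothetical, and the toy corpus is invented for illustration.

```python
import math
from collections import Counter

# Minimal sketch: the preprocessing, TF-IDF variant and chi-square form are
# assumptions, not taken from the paper. Helper names are hypothetical.

def tokenize(text):
    """Lowercase and split on any non-letter character (assumed preprocessing)."""
    return "".join(ch.lower() if ch.isalpha() else " " for ch in text).split()

def tfidf(docs_tokens):
    """Weight each term by raw count * log(N / df); one sparse dict per document."""
    n = len(docs_tokens)
    df = Counter(term for toks in docs_tokens for term in set(toks))
    idf = {term: math.log(n / d) for term, d in df.items()}
    return [{term: c * idf[term] for term, c in Counter(toks).items()}
            for toks in docs_tokens]

def chi_square_scores(docs_tokens, labels, positive="spam"):
    """Chi-square statistic of each term against the positive class, using a
    2x2 table of document presence/absence versus class membership."""
    n = len(docs_tokens)
    presence = [set(toks) for toks in docs_tokens]
    scores = {}
    for term in set().union(*presence):
        a = sum(1 for s, y in zip(presence, labels) if term in s and y == positive)
        b = sum(1 for s, y in zip(presence, labels) if term in s and y != positive)
        c = sum(1 for s, y in zip(presence, labels) if term not in s and y == positive)
        d = n - a - b - c  # documents without the term and outside the class
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[term] = n * (a * d - c * b) ** 2 / denom if denom else 0.0
    return scores

def select_top_k(scores, n_features=500):
    """Keep the n_features highest-scoring terms; ties broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:n_features]

def project(vectors, features):
    """Drop every term that is not among the selected features."""
    keep = set(features)
    return [{t: w for t, w in v.items() if t in keep} for v in vectors]

# Invented toy corpus; the paper keeps 500 features, here we keep 3.
docs = ["win free money now", "free prize win",
        "meeting agenda for monday", "lunch meeting tomorrow"]
labels = ["spam", "spam", "ham", "ham"]
toks = [tokenize(d) for d in docs]
selected = select_top_k(chi_square_scores(toks, labels), n_features=3)
print(selected)  # ['free', 'meeting', 'win']
reduced = project(tfidf(toks), selected)
```

For two classes the Chi-Square statistic is the same whichever class is treated as positive, so scoring against "spam" alone is enough; terms that occur in every document receive an IDF of 0 under this variant.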
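The classification step in the abstract runs KNN for k = 1, 3, 5, 7 and 9 and compares the F-measure. The sketch below illustrates that sweep on sparse weighted vectors (dicts of term to weight). Cosine similarity and plain majority voting are assumptions, since this record does not name the paper's distance metric or voting rule, and the helper names (`cosine`, `knn_predict`, `f_measure`, `sweep_k`) and toy vectors are hypothetical. With two classes, an odd k can never produce a tied vote, which is one reason to restrict the sweep to odd values.

```python
import math
from collections import Counter

# Minimal sketch: cosine similarity and simple majority voting are assumptions;
# the record does not state the paper's metric. Names are hypothetical.

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def knn_predict(train_vecs, train_labels, query, k):
    """Majority label among the k training vectors most similar to query."""
    ranked = sorted(range(len(train_vecs)),
                    key=lambda i: -cosine(query, train_vecs[i]))
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

def f_measure(y_true, y_pred, positive="spam"):
    """F1 score (harmonic mean of precision and recall) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def sweep_k(train_vecs, train_labels, test_vecs, test_labels, ks=(1, 3, 5, 7, 9)):
    """F-measure on the test set for each candidate k."""
    results = {}
    for k in ks:
        preds = [knn_predict(train_vecs, train_labels, q, k) for q in test_vecs]
        results[k] = f_measure(test_labels, preds)
    return results

# Invented toy vectors, already reduced to selected features.
train = [{"free": 1.0}, {"win": 1.0}, {"meeting": 1.0}]
train_labels = ["spam", "spam", "ham"]
test = [{"free": 1.0, "win": 0.5}, {"meeting": 1.0}]
test_labels = ["spam", "ham"]
print(sweep_k(train, train_labels, test, test_labels, ks=(1, 3)))
```

On this toy data k = 1 classifies both test vectors correctly (F = 1.0), while k = 3 outvotes the single ham neighbor and labels everything spam (F ≈ 0.67). This only shows how a larger k can hurt on small, well-separated data; it is not evidence for the paper's reported results.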
