Publication:
Spam Filtering With KNN: Investigation of the Effect of K Value on Classification Performance

dc.authorscopusid: 56589621700
dc.authorscopusid: 43261041200
dc.contributor.author: Şahin, D.O.
dc.contributor.author: Demirci, S.
dc.date.accessioned: 2025-12-11T00:22:38Z
dc.date.issued: 2020
dc.department: Ondokuz Mayıs Üniversitesi [en_US]
dc.department-temp: [Şahin] Durmuş Ozkan, Ondokuz Mayis Üniversitesi, Samsun, Turkey; [Demirci] Sercan, Ondokuz Mayis Üniversitesi, Samsun, Turkey [en_US]
dc.description.abstract: This study aims to filter spam e-mails using machine learning and text mining techniques. The K-Nearest Neighbor (KNN) algorithm, a machine learning technique, is used. KNN is an easy-to-use, high-performance classification algorithm, but its main problem is choosing the value of k in advance: the performance of the algorithm changes with the selected k. Three data sets are examined: Enron, Ling-Spam, and SMSSpam-Collection. First, basic text mining techniques and the term frequency-inverse document frequency (TF-IDF) term weighting method are applied to all data sets. Then, the 500 best features according to the chi-square feature selection method are selected and given to the KNN algorithm. Finally, extensive experiments are carried out with k set to 1, 3, 5, 7, and 9. For all three data sets, the best result is obtained when k is 1. The best F-measure values obtained on the Ling-Spam, Enron, and SMSSpam-Collection data sets are 0.9324, 0.9215, and 0.9196, respectively. © 2020 IEEE. [en_US]
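The abstract describes a pipeline of text preprocessing, TF-IDF weighting, chi-square selection of the 500 best features, and a KNN sweep over k in {1, 3, 5, 7, 9} scored by F-measure. The sketch below is a minimal, dependency-free illustration of that kind of pipeline; it is not the authors' implementation, and several details are assumptions: lowercase alphanumeric tokenization stands in for the unspecified "basic text mining techniques", chi-square is scored on term presence in a 2x2 term/class table (the paper may score TF-IDF weights instead), idf is taken as log(N/df), cosine similarity serves as the KNN similarity measure, and F-measure is F1 with spam as the positive class. All function names and the toy corpus are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed preprocessing: lowercase and keep runs of letters/digits only
    return re.findall(r"[a-z0-9]+", text.lower())


def chi2_select(docs, labels, n_features):
    # Chi-square score per term from a 2x2 table of document presence vs. class
    # (label 1 = spam, 0 = ham); returns the n_features highest-scoring terms
    n, n_spam = len(docs), sum(labels)
    df_all = Counter(t for d in docs for t in set(d))
    df_spam = Counter(t for d, y in zip(docs, labels) if y == 1 for t in set(d))
    scores = {}
    for t, df in df_all.items():
        spam_with = df_spam[t]                    # spam docs containing t
        ham_with = df - spam_with                 # ham docs containing t
        spam_without = n_spam - spam_with         # spam docs without t
        ham_without = (n - n_spam) - ham_with     # ham docs without t
        denom = ((spam_with + spam_without) * (ham_with + ham_without)
                 * (spam_with + ham_with) * (spam_without + ham_without))
        diff = spam_with * ham_without - ham_with * spam_without
        scores[t] = n * diff ** 2 / denom if denom else 0.0
    ranked = sorted(scores, key=lambda t: (-scores[t], t))  # ties broken alphabetically
    return set(ranked[:n_features])


def fit_idf(docs):
    # idf(t) = log(N / df(t)), computed on the training documents only
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    return {t: math.log(n / c) for t, c in df.items()}


def tfidf(doc, idf, vocab):
    # Raw term count times idf, keeping only the selected features
    tf = Counter(t for t in doc if t in vocab)
    return {t: c * idf[t] for t, c in tf.items()}


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_predict(x, train_vecs, train_labels, k):
    # Majority vote over the k most similar training vectors
    # (sorted() is stable, so equally similar neighbours keep training order)
    ranked = sorted(range(len(train_vecs)), key=lambda i: -cosine(x, train_vecs[i]))
    return Counter(train_labels[i] for i in ranked[:k]).most_common(1)[0][0]


def f_measure(y_true, y_pred):
    # F1 with spam (label 1) as the positive class: 2TP / (2TP + FP + FN)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def sweep_k(train_texts, train_labels, test_texts, test_labels,
            ks=(1, 3, 5, 7, 9), n_features=500):
    # Select features and fit idf on the training split, then score each k
    train_docs = [tokenize(t) for t in train_texts]
    test_docs = [tokenize(t) for t in test_texts]
    vocab = chi2_select(train_docs, train_labels, n_features)
    idf = fit_idf(train_docs)
    train_vecs = [tfidf(d, idf, vocab) for d in train_docs]
    test_vecs = [tfidf(d, idf, vocab) for d in test_docs]
    return {k: f_measure(test_labels,
                         [knn_predict(x, train_vecs, train_labels, k) for x in test_vecs])
            for k in ks}


# Hypothetical toy corpus (1 = spam, 0 = ham), for illustration only
train = ["win free money now", "free prize claim now", "meeting agenda for monday",
         "lunch with the team monday", "claim your free cash prize",
         "project report draft attached"]
y = [1, 1, 0, 0, 1, 0]
test = ["free cash now", "monday project meeting"]
print(sweep_k(train, y, test, [1, 0], ks=(1, 3), n_features=10))  # {1: 1.0, 3: 1.0}
```

On this six-message toy corpus both k = 1 and k = 3 classify the two test messages correctly, so the sweep prints `{1: 1.0, 3: 1.0}`; the toy data says nothing about the reported Enron, Ling-Spam, or SMSSpam-Collection scores. Substituting a real labeled corpus and the default `n_features=500` mirrors the shape of the described experiment, not its exact numbers.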
dc.identifier.doi: 10.1109/SIU49456.2020.9302516
dc.identifier.isbn: 9781728172064
dc.identifier.scopus: 2-s2.0-85100295203
dc.identifier.scopusquality: N/A
dc.identifier.uri: https://doi.org/10.1109/SIU49456.2020.9302516
dc.identifier.uri: https://hdl.handle.net/20.500.12712/36256
dc.identifier.wosquality: N/A
dc.language.iso: tr [en_US]
dc.publisher: Institute of Electrical and Electronics Engineers Inc. [en_US]
dc.relation.ispartof: 28th Signal Processing and Communications Applications Conference, SIU 2020 -- 2020-10-05 through 2020-10-07 -- Gaziantep -- 166413 [en_US]
dc.relation.publicationcategory: Conference Item - International - Institutional Academic Staff [en_US]
dc.rights: info:eu-repo/semantics/closedAccess [en_US]
dc.subject: Chi-Square Feature Selection [en_US]
dc.subject: E-Mail Classification [en_US]
dc.subject: Inverse Document Frequency [en_US]
dc.subject: Nearest Neighborhood [en_US]
dc.subject: Spam Filtering [en_US]
dc.subject: Term Frequency [en_US]
dc.title: Spam Filtering With KNN: Investigation of the Effect of K Value on Classification Performance [en_US]
dc.title.alternative: KNN İle İstenmeyen E-Posta Filtreleme: K Değerinin Sınıflandırma Performansına Etkisinin Araştırılması [en_US]
dc.type: Conference Object [en_US]
dspace.entity.type: Publication
