Publication:
Spam Filtering With KNN: Investigation of the Effect of K Value on Classification Performance

Abstract

This study aims to filter spam e-mails using machine learning and text mining techniques. The K-Nearest Neighbor (KNN) algorithm, a machine learning technique, is used. KNN is an easy-to-use, high-performance classification algorithm, but its main problem is choosing the value of k in advance, because the algorithm's performance changes with the selected k. Three data sets are examined: Enron, Ling-Spam and SMSSpam-Collection. First, basic text mining techniques and the term frequency-inverse document frequency (TF-IDF) term weighting method are applied to all data sets. Then, the best 500 attributes are selected with the Chi-Square feature selection method and given to the KNN algorithm. Finally, extensive experiments are carried out with k set to 1, 3, 5, 7 and 9. On all three data sets, the most successful result is obtained when k is 1. The best F-measure values obtained on the Ling-Spam, Enron and SMSSpam-Collection data sets are 0.9324, 0.9215 and 0.9196, respectively. © 2020 IEEE.
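
The pipeline described in the abstract (TF-IDF weighting, Chi-Square feature selection, KNN with k = 1, 3, 5, 7, 9, scored by F-measure) can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation. The toy corpus, all function names, the smoothed IDF formula, the 2x2 presence-based Chi-Square statistic, and the use of cosine similarity as the neighbor measure are assumptions made here; the abstract does not specify them. The sketch keeps the top 10 terms rather than 500 because the toy vocabulary is tiny.

```python
import math
from collections import Counter

def fit_idf(docs):
    """Inverse document frequency from tokenized training documents (one common variant)."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    return {t: math.log(n / df[t]) + 1.0 for t in df}

def chi_square_top(docs, labels, top_n):
    """Keep the top_n terms by the 2x2 chi-square statistic (term presence vs. class)."""
    n, n_spam = len(docs), sum(labels)
    scores = {}
    for term in {t for d in docs for t in d}:
        a = sum(1 for d, y in zip(docs, labels) if y == 1 and term in d)  # spam with term
        b = sum(1 for d, y in zip(docs, labels) if y == 0 and term in d)  # ham with term
        c, e = n_spam - a, (n - n_spam) - b  # spam / ham without term
        denom = (a + b) * (c + e) * (a + c) * (b + e)
        scores[term] = n * (a * e - b * c) ** 2 / denom if denom else 0.0
    return set(sorted(scores, key=lambda t: (-scores[t], t))[:top_n])

def vectorize(doc, idf, vocab):
    """Sparse TF-IDF vector restricted to the selected vocabulary."""
    return {t: c * idf[t] for t, c in Counter(doc).items() if t in vocab}

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def knn_predict(x, train_vecs, train_labels, k):
    """Majority vote (1 = spam, 0 = ham) among the k most similar training vectors."""
    ranked = sorted(zip(train_vecs, train_labels),
                    key=lambda p: cosine(x, p[0]), reverse=True)
    votes = sum(y for _, y in ranked[:k])
    return 1 if 2 * votes > k else 0

def f_measure(y_true, y_pred):
    """F1 score for the spam class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# Toy corpus invented for this sketch; it is NOT drawn from Enron, Ling-Spam or SMSSpam-Collection.
train = [("win free prize now", 1), ("free cash offer click now", 1),
         ("claim your free prize today", 1), ("cheap loans click here", 1),
         ("urgent offer win cash", 1), ("meeting moved to monday", 0),
         ("see you at lunch today", 0), ("project report attached", 0),
         ("call me about the meeting", 0), ("lunch on monday with the team", 0)]
test = [("free prize click now", 1), ("team meeting report on monday", 0),
        ("win cash today", 1), ("see you at the meeting", 0)]

train_docs = [text.split() for text, _ in train]
train_labels = [y for _, y in train]
idf = fit_idf(train_docs)
vocab = chi_square_top(train_docs, train_labels, top_n=10)  # the paper keeps 500
train_vecs = [vectorize(d, idf, vocab) for d in train_docs]

results = {}
for k in (1, 3, 5, 7, 9):
    preds = [knn_predict(vectorize(text.split(), idf, vocab), train_vecs, train_labels, k)
             for text, _ in test]
    results[k] = f_measure([y for _, y in test], preds)
    print(f"k={k}: F-measure={results[k]:.4f}")
```

Only odd values of k are tried, which avoids tied votes in a two-class problem. Scores on a toy corpus of this size say nothing about the reported results; the sketch only shows how the stages connect.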

WoS Q

N/A

Scopus Q

N/A

Source

28th Signal Processing and Communications Applications Conference, SIU 2020 -- 2020-10-05 through 2020-10-07 -- Gaziantep -- 166413
