Spam Filtering With KNN: Investigation of the Effect of K Value on Classification Performance

Şahin, D.O.; Demirci, S.

doi:10.1109/SIU49456.2020.9302516

Date

2020

Authors

Şahin, D.O.

Demirci, S.

Publisher

Institute of Electrical and Electronics Engineers Inc.

Abstract

This study aims to filter spam e-mails using machine learning and text mining techniques. The K-Nearest Neighbor (KNN) algorithm, a machine learning technique, is used. KNN is an easy-to-use, high-performance classification algorithm, but its main problem is choosing the value of k in advance, because the performance of the algorithm changes with the selected k value. Three data sets are examined: Enron, Ling-Spam and SMSSpam-Collection. First, basic text mining techniques and the term frequency-inverse document frequency (TF-IDF) term weighting method are applied to all data sets. Then, using the Chi-Square feature selection method, the best 500 attributes are selected and given to the KNN algorithm. Finally, extensive experiments are carried out by setting k to 1, 3, 5, 7 and 9. In all three data sets, the most successful result is obtained when k is 1. The best F-measure values obtained on the Ling-Spam, Enron and SMSSpam-Collection data sets are 0.9324, 0.9215 and 0.9196, respectively. © 2020 IEEE.
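The pipeline described in the abstract (TF-IDF weighting, Chi-Square feature selection, then KNN evaluated over several k values by F-measure) can be sketched roughly as follows. This is a minimal standard-library illustration, not the authors' implementation. The function names, the whitespace tokenizer, the `log(N/df) + 1` IDF variant, the use of cosine similarity and the toy messages are all assumptions made for the example; the abstract does not specify them. The toy run keeps 10 terms where the study keeps 500.

```python
import math
from collections import Counter

def tokens(doc):
    # Assumption: lowercase + whitespace split stands in for "basic text mining".
    return doc.lower().split()

def chi_square_top_terms(docs, labels, m):
    """Rank terms by the 2x2 chi-square statistic (term presence vs. class)."""
    n, n_spam = len(docs), sum(labels)
    present, present_spam = Counter(), Counter()
    for doc, y in zip(docs, labels):
        for t in set(tokens(doc)):
            present[t] += 1
            present_spam[t] += y
    scores = {}
    for t, n_t in present.items():
        a = present_spam[t]        # spam documents containing the term
        b = n_t - a                # ham documents containing the term
        c = n_spam - a             # spam documents without the term
        d = (n - n_spam) - b       # ham documents without the term
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[t] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return set(sorted(scores, key=lambda t: (-scores[t], t))[:m])

def fit_idf(train_docs):
    """IDF from training documents only (assumed variant: log(N/df) + 1)."""
    n = len(train_docs)
    df = Counter(t for doc in train_docs for t in set(tokens(doc)))
    return {t: math.log(n / c) + 1.0 for t, c in df.items()}

def vectorise(doc, idf, keep=None):
    """L2-normalised sparse TF-IDF vector, optionally restricted to `keep`."""
    counts = Counter(t for t in tokens(doc)
                     if t in idf and (keep is None or t in keep))
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}

def cosine(a, b):
    # Vectors are already unit length, so the dot product is the cosine.
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def knn_predict(train_vecs, train_labels, vec, k):
    """Majority vote among the k most similar training vectors (1 = spam)."""
    ranked = sorted(zip(train_vecs, train_labels),
                    key=lambda pair: -cosine(vec, pair[0]))
    votes = sum(y for _, y in ranked[:k])
    return 1 if 2 * votes > k else 0

def f_measure(y_true, y_pred):
    """F1 score for the spam class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# Hypothetical toy messages, invented for illustration only.
train_docs = [
    "win free cash prize now", "free prize claim your cash",
    "cheap loans win money now", "claim free money offer today",
    "urgent offer win cash prize", "meeting agenda for monday morning",
    "please review the project report", "lunch with the project team today",
    "monday report attached for review", "agenda for the team meeting",
]
train_labels = [1] * 5 + [0] * 5
test_docs = ["free cash offer now", "project meeting on monday",
             "claim your prize", "team lunch report"]
test_labels = [1, 0, 1, 0]

keep = chi_square_top_terms(train_docs, train_labels, m=10)
idf = fit_idf(train_docs)
train_vecs = [vectorise(doc, idf, keep) for doc in train_docs]
for k in (1, 3, 5, 7, 9):
    preds = [knn_predict(train_vecs, train_labels, vectorise(doc, idf, keep), k)
             for doc in test_docs]
    print(f"k={k}  F-measure={f_measure(test_labels, preds):.4f}")
```

Odd k values, as used in the study, avoid tied votes in a two-class problem. Scores from a toy set of this size say nothing about the reported results; the sketch only shows where the k value enters the pipeline.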

Keywords

Chi-Square Feature Selection, E-Mail Classification, Inverse Document Frequency, Nearest Neighborhood, Spam Filtering, Term Frequency

WoS Q

N/A

Scopus Q

N/A

Source

28th Signal Processing and Communications Applications Conference, SIU 2020 -- 2020-10-05 through 2020-10-07 -- Gaziantep -- 166413

URI

https://doi.org/10.1109/SIU49456.2020.9302516
https://hdl.handle.net/20.500.12712/36256

Collections

Scopus Indexed Publications Collection
