Telsoc
Published on Telsoc (https://telsoc.org)

Phishing Message Detection Based on Keyword Matching

Keng-Theen Tham [1]

Multimedia University

Kok-Why Ng [2]

Multimedia University

Su-Cheng Haw [3]

Multimedia University, Malaysia


JTDE - Vol 11, No 3 - September 2023 [4]

Abstract

This paper proposes using Naïve Bayes-based algorithms for phishing detection, specifically in spam emails. It compares a probability-based and a frequency-based approach and investigates the impact of imbalanced datasets and of stemming, a natural language processing (NLP) technique. Results show that both algorithms perform similarly in spam detection, so the choice between them depends on factors such as efficiency and scalability. Accuracy is influenced by the dataset configuration and by stemming. With imbalanced datasets, the algorithms detect majority-class emails more accurately but struggle to classify minority-class emails. In contrast, balanced datasets yield high overall accuracy for both spam and ham (legitimate) email identification. Stemming has a minor impact on algorithm performance and occasionally decreases accuracy because it groups distinct words together. Balancing the dataset is crucial for improving algorithm performance and achieving accurate spam email detection. Hence, both probability-based and frequency-based Naïve Bayes algorithms are effective for phishing detection when used with balanced datasets. The frequency-based approach, with a balanced dataset and stemming, achieves a balance between recall and precision, while the probability-based approach, with a balanced dataset and no stemming, prioritises overall accuracy.
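
The abstract does not spell out either algorithm, so the following is only a minimal sketch of a generic word-count Naïve Bayes spam classifier with Laplace (add-one) smoothing, not the authors' probability-based or frequency-based method. The function names, the whitespace tokeniser and the four toy messages are all assumptions made for illustration; no stemming or dataset balancing is applied.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokeniser: lowercase and split on whitespace only.
    return text.lower().split()


def train(messages):
    """Count words per class from (text, label) pairs; label is "spam" or "ham"."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in messages:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the label with the higher log-posterior score."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        total_words = sum(word_counts[label].values())
        # Class prior; assumes each class has at least one training message.
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            # Laplace smoothing avoids zero probabilities for unseen pairs.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        scores[label] = score
    return max(scores, key=scores.get)


# Invented toy data, balanced between the two classes (2 spam, 2 ham).
toy_messages = [
    ("verify your account now", "spam"),
    ("urgent verify your password", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]
model = train(toy_messages)
print(classify("please verify your account", model))
print(classify("agenda for monday lunch", model))
```

In this sketch the class prior is estimated from the number of training messages per class, which is one place where an imbalanced dataset would tilt predictions towards the majority class.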
Article PDF: 776-tham-article-v11n3pp105-119.pdf [7]

Copyright notice:

Copyright is held by the Authors subject to the Journal Copyright notice. [8]

Cite this article as:

Keng-Theen Tham, Kok-Why Ng, Su-Cheng Haw. 2023. Phishing Message Detection Based on Keyword Matching. JTDE, Vol 11, No 3, Article 776. http://doi.org/10.18080/JTDE.v11n3.776 [9]. Published by Telecommunications Association Inc. ABN 34 732 327 053. https://telsoc.org [10]



Source URL: https://telsoc.org/journal/jtde-v11-n3/a776

Links
[1] https://telsoc.org/journal/author/keng-theen-tham
[2] https://telsoc.org/journal/author/kok-why-ng
[3] https://telsoc.org/journal/author/su-cheng-haw
[4] https://telsoc.org/journal/jtde-v11-n3
[5] https://www.addtoany.com/share#url=https%3A%2F%2Ftelsoc.org%2Fjournal%2Fjtde-v11-n3%2Fa776&title=Phishing%20Message%20Detection%20Based%20on%20Keyword%20Matching
[6] https://telsoc.org/print/4136?rate=meM3qVVJurovfC-lbmFK9fPo4Eo9MvNYUqENCdd3vO0
[7] https://telsoc.org/sites/default/files/journal_article/776-tham-article-v11n3pp105-119.pdf
[8] https://telsoc.org/copyright
[9] http://doi.org/10.18080/jtde.v11n3.776
[10] https://telsoc.org