Ham vs Spam: How to Identify and Classify Spam E-mail | HackerNoon
Briefly

In the modern communication landscape, spam emails pose a significant challenge, accounting for 45% of daily email traffic. This paper analyzes how well several machine learning models, such as K-Nearest Neighbors, Logistic Regression, Support Vector Machines, and Naïve Bayes, classify emails as spam or ham. Comparing their accuracy, precision, recall, and F1 score identifies the most effective classification model. The study uses a Kaggle dataset of 2551 ham and 501 spam emails. Accurate spam identification matters because it protects user productivity and business resources.
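All four metrics can be computed from a confusion matrix. The sketch below is a minimal illustration, not the authors' code. It assumes spam is the positive class and reports an undefined ratio (for example, precision when nothing is flagged as spam) as 0.0. The `Confusion` class and `metrics` function are hypothetical names. Applied to the dataset's 2551 ham / 501 spam split, a trivial classifier that labels every email as ham already reaches about 83.6% accuracy while catching no spam. This shows why precision, recall, and F1 are reported alongside accuracy.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion-matrix counts, assuming spam is the positive class."""
    tp: int  # spam correctly flagged as spam
    fp: int  # ham wrongly flagged as spam
    fn: int  # spam that slipped through as ham
    tn: int  # ham correctly kept as ham


def safe_div(num: float, den: float) -> float:
    # Assumed convention: an undefined ratio (0/0) is reported as 0.0
    return num / den if den else 0.0


def metrics(c: Confusion) -> dict[str, float]:
    total = c.tp + c.fp + c.fn + c.tn
    precision = safe_div(c.tp, c.tp + c.fp)  # share of flagged emails that are spam
    recall = safe_div(c.tp, c.tp + c.fn)     # share of spam that gets flagged
    f1 = safe_div(2 * precision * recall, precision + recall)  # harmonic mean
    return {
        "accuracy": safe_div(c.tp + c.tn, total),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical "always predict ham" classifier on the 2551 ham / 501 spam split
always_ham = Confusion(tp=0, fp=0, fn=501, tn=2551)
print(metrics(always_ham))  # accuracy ≈ 0.836; precision, recall and F1 are all 0.0
```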
In 2023, 347.3 billion emails were sent every day, and spam made up 45% of all email traffic.
This paper investigates the use of various machine learning techniques to classify emails as either spam or ham, focusing on effectiveness and performance metrics.
KNN with K=3 was selected as the baseline model. Logistic Regression, Naïve Bayes, and SVM models were tested to determine the best approach.
The performance of each model is evaluated based on accuracy, precision, recall, and F1 score to ascertain the most suitable spam classification method.
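The K=3 nearest-neighbour baseline can be sketched from scratch in Python, which keeps the example free of dependencies. The sketch assumes a lowercase bag-of-words representation and cosine similarity. The article does not state which features or distance measure the authors used, and the `KNNClassifier` class and toy emails below are hypothetical. With two classes, an odd K such as 3 rules out tied votes.

```python
import math
from collections import Counter


def tokenize(text: str) -> Counter:
    # Assumed features: lowercase whitespace tokens counted as a bag of words
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class KNNClassifier:
    """Hypothetical minimal k-nearest-neighbour spam/ham classifier."""

    def __init__(self, k: int = 3):
        self.k = k
        self.examples: list[tuple[Counter, str]] = []

    def fit(self, texts: list[str], labels: list[str]) -> "KNNClassifier":
        # KNN is a lazy learner: "training" just stores the labelled vectors
        self.examples = [(tokenize(t), y) for t, y in zip(texts, labels)]
        return self

    def predict_one(self, text: str) -> str:
        query = tokenize(text)
        # Rank stored emails by similarity to the query and keep the k closest
        nearest = sorted(self.examples, key=lambda ex: cosine(query, ex[0]), reverse=True)[: self.k]
        # Majority vote among the k nearest labels
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]

    def predict(self, texts: list[str]) -> list[str]:
        return [self.predict_one(t) for t in texts]


# Toy training data, invented for illustration only
train_texts = [
    "win a free prize now",
    "claim your free cash prize",
    "cheap meds free shipping",
    "meeting moved to monday",
    "lunch tomorrow with the team",
    "please review the attached report",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

model = KNNClassifier(k=3).fit(train_texts, train_labels)
print(model.predict(["free prize inside claim now", "can we move the team meeting to tuesday"]))
# prints ['spam', 'ham']
```

In a library-based pipeline, the same baseline would typically be scikit-learn's `KNeighborsClassifier(n_neighbors=3)`. The comparison models (for example `LogisticRegression`, `MultinomialNB`, or `LinearSVC`) expose the same `fit`/`predict` interface, so they can be swapped in and scored on accuracy, precision, recall, and F1 on a held-out split.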