Ochism/InfoSecProject

Spam Classifier Outlook Add-in

This application is a spam email classifier that protects its users from potentially harmful phishing or spam emails. It is implemented as a Microsoft Outlook Add-in that runs whenever the user receives new mail. The Add-in classifies the mail as either SPAM or NOT SPAM, prepends the classification and its confidence to the email's subject, then moves the email into the appropriate folder (Inbox or WatsonSpam).

Requirements

Getting Started

Open the SpamClassifier.sln Visual Studio solution file on a Windows computer. This launches Visual Studio, where the project can be built. All Add-in code written by the team is located in SpamClassifier/ThisAddIn.cs.

NOTE: The solution file will not automatically install the IBM Watson Natural Language Classifier package. Install it by issuing the following command in the NuGet Package Manager Console:

PM> Install-Package IBM.WatsonDeveloperCloud.NaturalLanguageClassifier.v1

NOTE: There is a known issue with the version of the System.Net.Http library that the Natural Language Classifier package depends on. Please update it to the newest stable version through NuGet.

Components

Classifiers

Two classifiers were trained using IBM Watson's Natural Language Classifier service. Both were trained on an online corpus of 4327 emails, split into 80% training data and 20% testing data. One classifier classifies email subjects and the other classifies email bodies.
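As an illustration of the 80/20 split described above, here is a minimal Python sketch. It is not the code from the ClassifierCreation directory; the function name split_corpus, the fixed seed, and the placeholder corpus are assumptions made for this example only.

```python
import random

def split_corpus(emails, train_fraction=0.8, seed=0):
    """Shuffle a list of emails and split it into training and testing parts.

    Hypothetical helper: the project's actual splitting code may differ.
    """
    shuffled = list(emails)                # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for a repeatable split
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Placeholder corpus with the same size as the one mentioned in the text.
corpus = [f"email-{i}" for i in range(4327)]
train, test = split_corpus(corpus)
print(len(train), len(test))
```

Shuffling before the cut keeps the training and testing parts from inheriting any ordering present in the original corpus.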

Subject Classifier

  • 92.96% accuracy
  • 97.79% average confidence

Body Classifier

  • 94.77% accuracy
  • 95.55% average confidence
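The two figures reported for each classifier could be computed along the following lines. This is a sketch under assumptions: it takes "average confidence" to mean the mean confidence of the predicted class over all test emails, which the source does not define, and the evaluate function and sample data are hypothetical.

```python
def evaluate(results):
    """Compute accuracy and mean confidence.

    results: list of (true_label, predicted_label, confidence) tuples.
    Assumption: confidence is the classifier's score for its predicted label.
    """
    correct = sum(1 for true, pred, _ in results if true == pred)
    accuracy = correct / len(results)
    average_confidence = sum(conf for _, _, conf in results) / len(results)
    return accuracy, average_confidence

# Made-up results for illustration; these are not the project's test data.
sample = [
    ("SPAM", "SPAM", 0.99),
    ("NOT SPAM", "NOT SPAM", 0.97),
    ("SPAM", "NOT SPAM", 0.80),
    ("NOT SPAM", "NOT SPAM", 0.96),
]
accuracy, average_confidence = evaluate(sample)
print(f"accuracy={accuracy:.2%} average confidence={average_confidence:.2%}")
```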

The creation, training, and testing of these classifiers were done by Kurtis Kuszmaul. Code for these processes can be found in the ClassifierCreation directory.

Outlook Add-in

The Outlook Add-in runs in the background of Outlook and fires whenever new mail is received. It locates the new mail item, extracts its subject and body, then classifies those two text fields using the classifiers described above. The confidences of the two classifications are weighted and compared to determine a final classification and confidence level. The classification and confidence percentage are prepended to the subject, then the appropriate action is taken on the email.

Subject Class = Body Class

  • Classification done based on weighted sum of subject and body classifier confidence
  • Requires 85% confidence to keep classification

Subject Class != Body Class

  • Classification of the higher of the two weighted confidences taken
  • Requires 95% confidence to keep classification
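The two cases above can be sketched in Python as follows. This is not the logic from ThisAddIn.cs, and several details are assumptions the source does not state: the weight values, the fallback to NOT SPAM when a threshold is not met, checking the 95% threshold against the winning classifier's unweighted confidence, and the subject prefix format.

```python
SUBJECT_WEIGHT = 0.4          # hypothetical weight; the real value is not given
BODY_WEIGHT = 0.6             # hypothetical weight; the real value is not given
AGREE_THRESHOLD = 0.85        # from the text: classes agree
DISAGREE_THRESHOLD = 0.95     # from the text: classes disagree
FALLBACK_CLASS = "NOT SPAM"   # assumption: used when the threshold is not met

def combine(subject_class, subject_conf, body_class, body_conf):
    """Merge the subject and body classifications into one label and confidence."""
    if subject_class == body_class:
        # Agreement: weighted sum of both confidences.
        label = subject_class
        confidence = SUBJECT_WEIGHT * subject_conf + BODY_WEIGHT * body_conf
        threshold = AGREE_THRESHOLD
    else:
        # Disagreement: take the class with the higher weighted confidence.
        if SUBJECT_WEIGHT * subject_conf >= BODY_WEIGHT * body_conf:
            label, confidence = subject_class, subject_conf
        else:
            label, confidence = body_class, body_conf
        threshold = DISAGREE_THRESHOLD
    if confidence < threshold:
        label = FALLBACK_CLASS  # not confident enough to keep the classification
    return label, confidence

def route(subject, label, confidence):
    """Return the new subject line and the destination folder."""
    folder = "WatsonSpam" if label == "SPAM" else "Inbox"
    return f"[{label} {confidence:.1%}] {subject}", folder

label, confidence = combine("SPAM", 0.99, "SPAM", 0.95)
new_subject, folder = route("You won a prize", label, confidence)
print(new_subject, "->", folder)
```

The stricter threshold in the disagreement case reflects that only one classifier supports the chosen class, so more confidence is required before acting on it.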

The design and development of this Outlook Add-in were done by Gregory Ochs, Ethan Knez, and Kurtis Kuszmaul. Code for the add-in can be found in the SpamClassifier directory.

External Components (Not Developed by Team)

Contributors

  • Ethan Knez
  • Kurtis Kuszmaul
  • Gregory Ochs
