Home
Status: WIP
Replicating the fake news detection ML model from this paper, published in 2017.
Develop a machine learning program to identify when a news source may be producing fake news. We use a corpus of labeled real and fake news articles to build a classifier that decides whether an article is likely fake based on its content.
Text preprocessing uses the NLTK Python library:
- Tokenize the body and headline into sentences with the Punkt sentence tokenizer from NLTK (a minimal sketch follows this list)
- Tokenize sentences into words
- Lemmatize the word tokens
- Visualize the data with a word cloud
- Collect token frequencies for fake news and true news separately
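A minimal sketch of that pipeline, assuming the NLTK punkt and wordnet data packages are downloaded and that `fake_articles` and `true_articles` are hypothetical lists of raw document strings:

```python
import nltk
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.stem import WordNetLemmatizer
from collections import Counter
from wordcloud import WordCloud
import matplotlib.pyplot as plt

nltk.download("punkt")    # Punkt sentence tokenizer models
nltk.download("wordnet")  # WordNet data for the lemmatizer

lemmatizer = WordNetLemmatizer()

def preprocess(text):
    """Sentence-tokenize, word-tokenize, and lemmatize one document."""
    tokens = []
    for sentence in sent_tokenize(text):      # Punkt sentence tokenizer
        for word in word_tokenize(sentence):  # word tokenization
            if word.isalpha():
                tokens.append(lemmatizer.lemmatize(word.lower()))
    return tokens

# Separate token frequencies for the fake and true corpora
fake_counts = Counter(t for doc in fake_articles for t in preprocess(doc))
true_counts = Counter(t for doc in true_articles for t in preprocess(doc))

# Word-cloud visualization of the fake-news token frequencies
cloud = WordCloud(width=800, height=400).generate_from_frequencies(fake_counts)
plt.imshow(cloud, interpolation="bilinear")
plt.axis("off")
plt.show()
```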
Token selection thresholds:
- Titles: tokens with a frequency greater than 10 over the entire title dataset
- Body: tokens with a frequency greater than 200 over the entire body dataset (only tokens longer than 3 characters were kept)
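Applied to frequency counters like those in the sketch above, the thresholds amount to a simple filter; `title_counts` and `body_counts` are assumed `Counter` objects built the same way:

```python
# Thresholds from above: >10 for titles, >200 (and length > 3) for bodies
title_vocab = {tok for tok, n in title_counts.items() if n > 10}
body_vocab = {tok for tok, n in body_counts.items()
              if n > 200 and len(tok) > 3}
```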
Our average hypothesis model combines the hypotheses from Naive Bayes, Logistic Regression, and SVM by averaging the output probabilities of the three models.
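A sketch of the averaging step, using scikit-learn stand-ins for the three models; `X_train`, `y_train`, and `X_test` are assumed feature matrices and labels built from the vocabularies above:

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Scikit-learn stand-ins for the three base hypotheses
models = [
    MultinomialNB(),
    LogisticRegression(max_iter=1000),
    SVC(probability=True),  # probability=True enables predict_proba
]
for m in models:
    m.fit(X_train, y_train)

# Average hypothesis: mean of the three models' predicted probabilities
avg_proba = np.mean([m.predict_proba(X_test) for m in models], axis=0)
y_pred = avg_proba.argmax(axis=1)
```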
A one-layer neural network was trained on the 80 tokens identified as most causal for source classification.
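One reading of this is a network with a single hidden layer; a sketch with scikit-learn's MLPClassifier, where the hidden-layer width and the `X80_*` matrices (counts of the 80 selected tokens) are assumptions:

```python
from sklearn.neural_network import MLPClassifier

# X80_train / X80_test: (n_samples, 80) matrices over the 80 selected tokens
nn = MLPClassifier(hidden_layer_sizes=(80,), max_iter=500, random_state=0)
nn.fit(X80_train, y_train)
print("accuracy:", nn.score(X80_test, y_test))
```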
Lemmatization: the process of reducing inflected forms of a word while ensuring that the reduced form still belongs to the language. This reduced form, or root word, is called a lemma. For example, organizes, organized, and organizing are all forms of organize; here, organize is the lemma. Lemmatization is useful because it lets the inflected forms of a word be analyzed as a single item, and it helps normalize the text.
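The same example with NLTK's WordNetLemmatizer (the `pos="v"` hint tells WordNet to treat each word as a verb):

```python
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
for form in ["organizes", "organized", "organizing"]:
    print(lemmatizer.lemmatize(form, pos="v"))  # prints "organize" each time
```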
- Fake News Dataset
- Real News Dataset
Dataset features:
- title
- content
- publication
- label
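A sketch of how the two datasets could be loaded into a single frame with these columns; the file names are hypothetical and the use of pandas is an assumption:

```python
import pandas as pd

# Hypothetical file names; the real datasets are linked above
fake = pd.read_csv("fake_news.csv")
real = pd.read_csv("real_news.csv")
df = pd.concat([fake, real], ignore_index=True)
print(df[["title", "content", "publication", "label"]].head())
```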
Web-browser extension that marks articles as fake/true:
- Serialise the ML model with Pickle
- Build an API with Flask (see the sketch after this list)
- Build the extension's UI and call the API from it
- Zip and upload it to Mozilla add-ons
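A minimal sketch of the Pickle and Flask steps; the file name, route, and `vectorize` helper are hypothetical and must mirror however the model was trained:

```python
import pickle
from flask import Flask, jsonify, request

# Serialise once after training: pickle.dump(model, open("model.pkl", "wb"))
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    # The extension POSTs the article text as JSON
    text = request.get_json()["content"]
    features = vectorize(text)  # hypothetical helper matching training features
    proba = model.predict_proba([features])[0]
    # Class order depends on how the labels were encoded during training
    return jsonify({"fake_probability": float(proba[0])})

if __name__ == "__main__":
    app.run()
```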