VallariAg edited this page Jul 25, 2020 · 1 revision

Fake News Detection Model

Status: WIP

Description

Replicating the fake news detection ML model from this paper, published in 2017.

The goal is to develop a machine learning program that identifies when a news source may be producing fake news. We aim to use a corpus of labeled real and fake news articles to build a classifier that judges new content based on what it has learned from the corpus.

Work

Text Processing

Using the NLTK Python library.

  1. Split the body and headline into sentences with the Punkt sentence tokenizer from the NLTK NLP library
  2. Tokenize the sentences into words
  3. Lemmatize the tokens
  4. Visualize the data with Word Cloud
  5. Get tokens for fake news and true news:
    Titles: tokens with a frequency of more than 10 over the entire title dataset
    Body: tokens with a frequency of more than 200 over the entire dataset
    (we only kept tokens longer than 3 characters)
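
The steps above could be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the function names `tokenize_and_lemmatize` and `frequent_tokens` are made up for this example, the filter that drops non-alphabetic tokens is an assumption, and the NLTK part assumes the Punkt and WordNet data packages have already been fetched with `nltk.download()`.

```python
from collections import Counter


def tokenize_and_lemmatize(text):
    """Split text into sentences, then words, then lemmatize each word.

    Assumes NLTK is installed and its Punkt and WordNet data packages
    have already been downloaded.
    """
    # Imported lazily so the frequency filter below also works without NLTK.
    from nltk.stem import WordNetLemmatizer
    from nltk.tokenize import sent_tokenize, word_tokenize

    lemmatizer = WordNetLemmatizer()
    tokens = []
    for sentence in sent_tokenize(text):  # Punkt sentence tokenizer
        for word in word_tokenize(sentence):
            if word.isalpha():  # assumption: drop punctuation and numbers
                tokens.append(lemmatizer.lemmatize(word.lower()))
    return tokens


def frequent_tokens(token_lists, min_count, min_length=3):
    """Return tokens seen more than min_count times that are longer than min_length."""
    counts = Counter(tok for tokens in token_lists for tok in tokens)
    return {
        tok
        for tok, n in counts.items()
        if n > min_count and len(tok) > min_length
    }
```

With the thresholds listed above, titles would use `frequent_tokens(title_tokens, min_count=10)` and bodies `frequent_tokens(body_tokens, min_count=200)`, where `title_tokens` and `body_tokens` are hypothetical lists of token lists, one per article.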
    

Supervised Learning Algorithms

1. AVERAGE-HYPOTHESIS MODEL

Our average-hypothesis model combines the hypotheses obtained from Naive Bayes, Logistic Regression, and SVM by averaging the output probabilities of the three models.
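
As a rough sketch of the averaging step, assuming each model outputs a probability that an article is fake: the function name `average_hypothesis`, the 0.5 decision threshold, and the sample probabilities are illustrative assumptions, not values from the paper.

```python
def average_hypothesis(probabilities):
    """Average P(fake) across models and label the article using a 0.5 threshold."""
    if not probabilities:
        raise ValueError("need at least one model output")
    mean = sum(probabilities) / len(probabilities)
    return mean, ("fake" if mean >= 0.5 else "real")


# Hypothetical P(fake) outputs for one article from the three models.
p_naive_bayes, p_logistic, p_svm = 0.9, 0.6, 0.3
mean, label = average_hypothesis([p_naive_bayes, p_logistic, p_svm])
```

If the models are built with scikit-learn, `VotingClassifier` with `voting="soft"` performs this kind of probability averaging; an SVM needs `SVC(probability=True)` to expose probabilities there.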

2. NEURAL NETWORK

A one-layer neural network model was used on the 80 tokens identified as most indicative of a source's classification.
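
One possible reading of "one-layer" is a single dense layer with a sigmoid output over an 80-dimensional token feature vector. The sketch below follows that reading; the class name `OneLayerNet`, the zero initialization, the learning rate, and the cross-entropy training step are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class OneLayerNet:
    """Single dense layer with a sigmoid output: P(fake) = sigmoid(w . x + b)."""

    def __init__(self, n_features):
        self.weights = [0.0] * n_features
        self.bias = 0.0

    def predict_proba(self, x):
        z = sum(w * xi for w, xi in zip(self.weights, x)) + self.bias
        return sigmoid(z)

    def train_step(self, x, y, lr=0.1):
        """One stochastic gradient step on binary cross-entropy; y is 1 (fake) or 0 (real)."""
        error = self.predict_proba(x) - y  # gradient w.r.t. the pre-activation
        self.weights = [w - lr * error * xi for w, xi in zip(self.weights, x)]
        self.bias -= lr * error


# One input per selected token, e.g. its count in the article (assumption).
net = OneLayerNet(n_features=80)
```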

Lemmatization: Lemmatization is the process of reducing the inflected forms of a word to a common base form that is itself a valid word in the language. This base form, or root word, is called a lemma. For example, organizes, organized and organizing are all forms of organize; here, organize is the lemma. Lemmatization lets the inflected forms of a word be analyzed as a single item, and it also helps normalize the text.

Dataset

Dataset features:  
  - title
  - content
  - publication
  - label
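
A minimal sketch of loading such a dataset, assuming it is stored as a CSV file whose header row uses the four feature names above. The `Article` class, the `load_articles` function, and the sample row are made up for illustration; the real file format and label values may differ.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    content: str
    publication: str
    label: str


def load_articles(csv_file):
    """Read rows with title, content, publication and label columns."""
    reader = csv.DictReader(csv_file)
    return [
        Article(row["title"], row["content"], row["publication"], row["label"])
        for row in reader
    ]


# Made-up sample row, for illustration only.
sample = io.StringIO(
    "title,content,publication,label\n"
    "Example headline,Example body text.,Example Times,real\n"
)
articles = load_articles(sample)
```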

Web-browser extension

Web-browser extension that marks articles as fake/true.

  1. Serialise ML model with Pickle
  2. Make API with Flask
  3. Make extension's UI + use API
  4. Zip and upload it to Mozilla add-ons
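
Steps 1 and 2 might look roughly like the sketch below. The dictionary model, the `predict` helper, the `/predict` route, and the JSON field names are hypothetical stand-ins, since the page does not specify the API; `create_app` assumes Flask is installed.

```python
import pickle

# Hypothetical stand-in for the trained classifier: per-token weights and a bias.
model = {"weights": {"shocking": 2.0, "official": -1.5}, "bias": 0.0}


def predict(model, text):
    """Label text 'fake' when the summed token weights are positive."""
    tokens = text.lower().split()
    score = model["bias"] + sum(model["weights"].get(tok, 0.0) for tok in tokens)
    return "fake" if score > 0 else "real"


# Step 1: serialise the model to bytes that could be written to a file.
blob = pickle.dumps(model)
# Only unpickle data from a trusted source; pickle can execute arbitrary code.
restored = pickle.loads(blob)


def create_app(model):
    """Step 2: build a Flask app that exposes the model over HTTP."""
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict_route():
        payload = request.get_json(force=True)
        return jsonify({"label": predict(model, payload["text"])})

    return app
```

The extension's UI would then POST the article text to this endpoint and display the returned label.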