- Some fake articles make relatively frequent use of terms seemingly intended to inspire outrage, and the writing quality of such articles is generally considerably lower than that of standard news.
- Detects fake news articles by analyzing patterns in the writing of the articles.
- Built by fine-tuning BERT.
- Achieves an accuracy of 80% on the custom dataset.
- All the code required to get started.
- Clone this repo to your local machine using
https://github.com/abhilashreddys/Fake-News-Article.git
- Install these libraries/packages.
$ pip3 install pandas numpy scikit-learn bs4
$ pip3 install torch
$ pip3 install keras
$ pip3 install pytorch_pretrained_bert
$ pip3 install transformers
- Data is collected by scraping the websites of popular news publishing sources.
- The collected news articles are judged using the score, quality, and bias metrics collected from Politifact and Media Charts.
- Some basic preprocessing is also done on the text collected from scraping websites.
- Used BeautifulSoup for scraping articles from the web. Beautiful Soup is a Python library designed for quick-turnaround projects like screen scraping.
- Also used some custom-made functions for removing punctuation, etc.
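The punctuation-removal step mentioned above can be sketched with the standard library alone. This is only an illustration under stated assumptions: the helper names `remove_punctuation`, `normalize_whitespace`, and `preprocess` are hypothetical, and the repository's own functions may behave differently.

```python
import re
import string

# Hypothetical helper names; the repository's own functions may differ.
# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def remove_punctuation(text: str) -> str:
    """Delete every ASCII punctuation character from the text."""
    return text.translate(_PUNCT_TABLE)


def normalize_whitespace(text: str) -> str:
    """Collapse runs of whitespace (including newlines) into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def preprocess(text: str) -> str:
    """Strip punctuation, then tidy up the whitespace left behind."""
    return normalize_whitespace(remove_punctuation(text))


if __name__ == "__main__":
    print(preprocess("Breaking!!!  You won't   BELIEVE this...\n"))
```

Note that this sketch also removes apostrophes, so "won't" becomes "wont"; whether that is desirable depends on the tokenizer used downstream.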
scraping from websites listed in politifact_data.csv
$ python3 scrape_politifact.py
scraping from websites listed in Interactive Media Bias Chart - Ad Fontes Media.csv
$ python3 scrape_media.py
- Data after scraping and preprocessing: politifact_text.csv, pre_media.csv
- Trained by fine-tuning BERT, as described in BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.
- BERT stands for Bidirectional Encoder Representations from Transformers.
- BERT is designed to pretrain deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications.
import torch.nn as nn
from pytorch_pretrained_bert import BertModel


class BertBinaryClassifier(nn.Module):
    def __init__(self, dropout=0.1):
        super(BertBinaryClassifier, self).__init__()
        self.bert = BertModel.from_pretrained('bert-base-uncased')
        self.dropout = nn.Dropout(dropout)
        self.linear = nn.Linear(768, 1)  # 768 is the hidden size of bert-base
        self.sigmoid = nn.Sigmoid()

    def forward(self, tokens, masks=None):
        # Keep only the pooled [CLS] representation; drop the per-token outputs
        _, pooled_output = self.bert(tokens, attention_mask=masks, output_all_encoded_layers=False)
        dropout_output = self.dropout(pooled_output)
        linear_output = self.linear(dropout_output)
        proba = self.sigmoid(linear_output)  # probability in (0, 1)
        return proba
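The classifier above takes token ids plus an optional attention mask and returns a sigmoid probability. A minimal, framework-free sketch of the surrounding bookkeeping might look like the following. The names `pad_and_mask` and `to_label`, the 0.5 threshold, and the choice of which class is labeled 1 are assumptions for illustration, not taken from the repository.

```python
from typing import List, Tuple

# Assumption: id 0 is the [PAD] token, as in the bert-base-uncased vocabulary.
PAD_ID = 0


def pad_and_mask(token_ids: List[int], max_len: int) -> Tuple[List[int], List[int]]:
    """Truncate or right-pad token ids to max_len and build the attention mask.

    The mask holds 1 for real tokens and 0 for padding positions.
    """
    ids = token_ids[:max_len]
    mask = [1] * len(ids)
    padding = max_len - len(ids)
    return ids + [PAD_ID] * padding, mask + [0] * padding


def to_label(proba: float, threshold: float = 0.5) -> int:
    """Map a sigmoid probability to a binary label (1 if proba >= threshold)."""
    return int(proba >= threshold)


if __name__ == "__main__":
    ids, mask = pad_and_mask([101, 2023, 102], max_len=5)
    print(ids)   # [101, 2023, 102, 0, 0]
    print(mask)  # [1, 1, 1, 0, 0]
    print(to_label(0.8), to_label(0.2))  # 1 0
```

In practice the padded ids and masks would be converted to tensors and passed as `tokens` and `masks` to `forward`. Note that BERT accepts at most 512 tokens per input, so long articles need truncation.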
- Download here : Link
- Run inference.py and pass the URL of the article you want to test on the command line
$ python3 inference.py url
- Check the file paths used in the scripts and change them if required.
- If you face any problems with the script files, use the notebooks transfrom_spam.ipynb for training and fake_article.ipynb for inference.
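The `python3 inference.py url` usage above implies a single positional argument. One way such a script could read and sanity-check it is sketched below. This is not the actual contents of inference.py; `parse_args`, `is_http_url`, and `main` are hypothetical names, and the fetching and classification steps are left out.

```python
import argparse
from typing import List, Optional
from urllib.parse import urlparse


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the single positional argument: the article URL."""
    parser = argparse.ArgumentParser(description="Classify a news article given its URL.")
    parser.add_argument("url", help="URL of the article to test")
    return parser.parse_args(argv)


def is_http_url(url: str) -> bool:
    """Return True only for URLs with an http(s) scheme and a host part."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def main(argv: Optional[List[str]] = None) -> int:
    """Validate the URL; a real script would scrape and classify it here."""
    args = parse_args(argv)
    if not is_http_url(args.url):
        print(f"Not a valid http(s) URL: {args.url}")
        return 1
    print(f"Would fetch and classify: {args.url}")
    return 0
```

A script built this way would end with `raise SystemExit(main())` so that an invalid URL produces a non-zero exit code.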
- Trained for only 5 epochs; trying to use a better model with more data.
- For data: Politifact and Media Charts
- Keras: The Python Deep Learning library
- A library of state-of-the-art pretrained models for Natural Language Processing
- PyTorch deep learning framework
- PyTorch BERT usage example
- Attention Is All You Need
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
@article{Wolf2019HuggingFacesTS,
title={HuggingFace's Transformers: State-of-the-art Natural Language Processing},
author={Thomas Wolf and Lysandre Debut and Victor Sanh and Julien Chaumond and Clement Delangue and Anthony Moi and Pierric Cistac and Tim Rault and R{\'e}mi Louf and Morgan Funtowicz and Jamie Brew},
journal={ArXiv},
year={2019},
volume={abs/1910.03771}
}
@article{devlin2018bert,
title={BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding},
author={Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina},
journal={arXiv preprint arXiv:1810.04805},
year={2018}
}