Detecting-Malicious-URL-Using-Pyspark

Development Environment

Apache Spark 2.3.0
Jupyter Notebook

Datasets

The datasets used in this project were obtained manually from the following sources:

Phishing URLs

PhishTank - https://www.phishtank.com/developer_info.php
OpenPhish - https://openphish.com/

Spam URLs

jwSpamSpy - http://www.joewein.de/sw/blacklist.htm

Malware URLs

Benign URLs

Majestic - https://majestic.com/reports/majestic-million

Another Useful Source for Collecting Malicious URLs

https://zeltser.com/malicious-ip-blocklists/

The dataset.csv file used in this project combines the sources above. A data pre-processing program cleans and filters the data, so the dataset is already labelled and ready to use in the project.
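The pre-processing program itself is not shown here, so the following is only a minimal sketch of how such a combine, clean, and label step might look in plain Python. The function names (`normalize`, `build_dataset`, `to_csv`), the `url` and `label` column names, the label strings, and the cleaning rules (trim whitespace, add a missing scheme, lowercase the host, drop duplicates) are all assumptions made for illustration and may differ from what the project actually does.

```python
import csv
import io
from urllib.parse import urlsplit


def normalize(url):
    """Trim whitespace, add a scheme if missing, and lowercase the host.

    Returns None for blank or unparseable entries (hypothetical cleaning rule).
    """
    url = url.strip()
    if not url:
        return None
    if "://" not in url:
        url = "http://" + url
    try:
        parts = urlsplit(url)
    except ValueError:
        return None
    if not parts.netloc:
        return None
    return parts._replace(netloc=parts.netloc.lower()).geturl()


def build_dataset(sources):
    """Merge {label: iterable of raw URLs} into de-duplicated (url, label) rows.

    If the same URL appears under several labels, the first one seen wins.
    """
    seen = set()
    rows = []
    for label, urls in sources.items():
        for raw in urls:
            url = normalize(raw)
            if url is None or url in seen:
                continue
            seen.add(url)
            rows.append((url, label))
    return rows


def to_csv(rows):
    """Serialize the rows as CSV text with an assumed 'url,label' header."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["url", "label"])
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Placeholder URLs on reserved example domains, not real feed entries.
    sources = {
        "phishing": ["  Example.COM/login ", "http://example.com/login"],
        "benign": ["https://example.org/"],
    }
    print(to_csv(build_dataset(sources)))
```

Keeping the first label seen for a duplicate URL is one possible policy; a real pipeline might instead discard URLs that appear in both a malicious and a benign source.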

Latest commit: rlilojr, Initial Commit, Jul 24, 2018 (4a871bd), 19 commits

Name	Last commit message	Last commit date
.ipynb_checkpoints	Initial Commit	Jul 24, 2018
metastore_db	Initial Commit	Jul 24, 2018
README.md	Create README.md	May 27, 2018
dataset.csv	remodified	May 27, 2018
derby.log	Initial Commit	Jul 24, 2018
sample.txt	Initial Commit	Jul 24, 2018
source_code.ipynb	Initial Commit	Jul 24, 2018
