By RTFD team
Phishing has been around for many years, but the past year has shown just how important internet security is. Over a year ago the world stopped, and almost everybody and everything moved to the Internet. That motivated us to analyse the topic of phishing. Phishers usually use email or SMS messages to deceive users and push them into acting the way the attacker wants. In our research we tried to apply machine learning to the analysis of URLs in order to prevent such deception.
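To give a rough idea of what "analysis of the URL" can mean in practice, here is a minimal sketch that turns a URL into a handful of lexical features a classifier could consume. The function name and the chosen features are our illustrative assumptions, not the exact feature set used in the notebooks.

```python
from urllib.parse import urlparse


def lexical_features(url: str) -> dict:
    """Return a few simple lexical features of a URL (illustrative only)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""  # hostname is None for malformed URLs
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": "@" in url,  # "@" can hide the real host
        "uses_https": parsed.scheme == "https",
    }


features = lexical_features("http://example.com/login")
print(features)
```

Features like these could be collected into a table and passed to any standard classifier. Which features actually help is an empirical question that the notebooks explore.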
First of all, you need to clone or download this repository.
Conda environments allow multiple incompatible versions of the same package to coexist on your system. An environment is simply a directory containing a collection of mutually compatible packages.
There are many reasons why using environments is best practice, but the two we believe matter most for this project are:
- the ability to install and uninstall all the necessary libraries with one command;
- the certainty that all packages will be installed correctly with the least effort.
You can download Miniconda from Anaconda's official documentation. If you prefer to install all the necessary packages manually, then please jump to the following section: Using pip.
Assuming this Git repository is already cloned to your local machine and Miniconda is installed, the next step is to create a separate Conda environment specifically for this project. To do so, run the following command inside the project's directory:
conda env create --file env.yml
You may change the default environment name Hackathon2021 or the path to the environment location using the --name ENVIRONMENT and --prefix PATH flags respectively.
If everything goes according to plan, a prompt similar to the one shown below should appear. When conda asks you to proceed, type y:
The following NEW packages will be INSTALLED:
anyio conda-forge/win-64::anyio-3.1.0-py39hcbf5309_0
argon2-cffi conda-forge/win-64::argon2-cffi-20.1.0-py39hb82d6ee_2
async_generator conda-forge/noarch::async_generator-1.10-py_0
attrs conda-forge/noarch::attrs-21.2.0-pyhd8ed1ab_0
babel conda-forge/noarch::babel-2.9.1-pyh44b312d_0
...
Proceed ([y]/n)?
If the environment has been installed correctly, then a message similar to the following should appear in your console:
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
# $ conda activate Hackathon2021
#
# To deactivate an active environment, use
#
# $ conda deactivate
Note: the environment name (in our case it is "Hackathon2021") may differ if you changed it with the --name flag in the previous command.
To activate your environment:
conda activate Hackathon2021
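If you want to confirm from Python that the activation took effect, the following minimal sketch reads the CONDA_DEFAULT_ENV environment variable, which conda sets when an environment is activated. The helper name is hypothetical. For environments created with --prefix, the value is typically a path rather than a name.

```python
import os


def active_conda_env(environ=os.environ):
    """Return the active conda environment, or None if none is active."""
    # CONDA_DEFAULT_ENV is set by `conda activate`.
    return environ.get("CONDA_DEFAULT_ENV")


print("Active conda environment:", active_conda_env())
```

With the default setup from this README, the printed value would be expected to be Hackathon2021.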
If everything seems to work as expected, then you can jump to the following section: Opening JupyterLab Notebook Interface.
If you would still prefer to install the modules using pip, then use the following command:
pip install pandas numpy matplotlib flask scikit-learn urllib3 nltk seaborn Pillow joblib jupyterlab
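After a manual installation it can be useful to check that every module is actually importable. The sketch below is one way to do that with the standard library. The helper name and the module list are our assumptions, based on the pip command above.

```python
from importlib.util import find_spec

# Import names of the packages from the pip command. Two of them differ
# from their PyPI names: scikit-learn is imported as sklearn, Pillow as PIL.
REQUIRED_MODULES = [
    "pandas", "numpy", "matplotlib", "flask", "sklearn", "urllib3",
    "nltk", "seaborn", "PIL", "joblib", "jupyterlab",
]


def missing_modules(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
print("Missing modules:", ", ".join(missing) if missing else "none")
```

If any module is reported as missing, re-run the pip command for that package only.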
To open the JupyterLab Notebook Interface, use one of the following commands:
jupyter-lab
or
python -m jupyterlab
If there were no problems with installing or running the tools above, you are ready to go. You can start by opening the file Notebooks/Second_model.ipynb, where we have covered the basics of our analysis.
- CERT Polska: List of warnings against dangerous websites (Lista ostrzeżeń przed niebezpiecznymi stronami)
- List of websites of public entities (Wykaz stron internetowych podmiotów publicznych)
- Alexa Top 1 Million Sites
- URL categorization
- DMOZ URL Classification Dataset
- PhishTank
- Artists Against 419
- URL dataset (ISCX-URL2016)
- Kaggle Labeled Url Dataset
- PhishStorm - phishing / legitimate URL dataset
- Dataset of Malicious and Benign Webpages
- OpenSRS Domain Pricing
- OpenFish
- The Moz Top 500 Websites
- Malicious URL Filtering – A Big Data Application
- Phishing detection based Associative Classification data mining
- PhishScore: Hacking Phishers' Minds
- Detecting Malicious URLs Using Lexical Analysis
- PhishStorm: Detecting Phishing With Streaming Analytics
- Phishing Landscape 2020
- Poster design was inspired by r-lab-project
- Free SVG
- Pixel perfect, Freepik, Becris, bqlqn from Flaticon