Forseti is a prototype for binary classification (malware/goodware) targeting ELF/Linux binaries.
Forseti was authored by Lucas Galante under the supervision of Marcus Botacin, André Grégio and Paulo de Geus.
Forseti is motivated by the lack of didactic, academic tools for exploring Linux binaries.
The repository is organized as follows:
- code: Contains Forseti's Python scripts.
- data: Contains Forseti's configuration files, databases, and sample configuration files.
- paper: Contains a copy of our white-paper.
- tests: Contains Forseti's test-case files.
Forseti builds on a series of developments described in multiple publications:
- Forseti's feature extraction mechanisms are described in the course "Introdução à Engenharia Reversa de Aplicações Maliciosas em Ambientes Linux", published in the XIX SBSEG.
- Forseti's feature extraction capabilities were used to describe the landscape of Linux malware presented in the paper "Malicious Linux Binaries: A Landscape", published in the XVIII SBSEG.
- Forseti's classification capabilities are described in the paper "Forseti: Extração de características e classificação de binários ELF", published in the XIX SBSEG.
- Forseti's evaluation was presented in the paper "Machine Learning for Malware Detection: Beyond Accuracy Rates", published in the XIX SBSEG.
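Install the following dependencies to run Forseti (pickle is part of the Python standard library and needs no installation; the sklearn module is provided by the scikit-learn package):
pip install pyelftools
pip install configparser
pip install scikit-learn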
Forseti can be trained by providing it with a list of goodware and malware files:
python main.py -g goodware.txt -m malware.txt
The list should look like:
$> cat data/malware.txt
> tests/static_malicious.bin
> tests/upx.bin
> tests/fork.bin
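A list like the one above can be generated from a directory of samples. The following is a minimal sketch, not part of Forseti: the helper names `is_elf` and `write_sample_list` are hypothetical, and it assumes the list file holds one path per line. It keeps only files that start with the ELF magic number.

```python
import os

# Every ELF file starts with these four bytes.
ELF_MAGIC = b"\x7fELF"


def is_elf(path):
    """Return True when the file starts with the 4-byte ELF magic number."""
    try:
        with open(path, "rb") as handle:
            return handle.read(4) == ELF_MAGIC
    except OSError:
        # Unreadable or missing files are simply not listed.
        return False


def write_sample_list(sample_dir, list_path):
    """Write one ELF file path per line (assumed list format) and return the paths."""
    paths = []
    for name in sorted(os.listdir(sample_dir)):
        candidate = os.path.join(sample_dir, name)
        if os.path.isfile(candidate) and is_elf(candidate):
            paths.append(candidate)
    with open(list_path, "w") as handle:
        for path in paths:
            handle.write(path + "\n")
    return paths
```

Calling `write_sample_list` once on a directory of known malware and once on a directory of known goodware would produce the two lists passed to `-m` and `-g`.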
Additionally, you can provide a list of suspicious files to be classified:
python main.py -g goodware.txt -m malware.txt -s suspicious.txt
Or, automate everything using our script:
./run-forsite.sh
If you start Forseti using the previously presented parameters, it prints its progress to the screen.
Notice that Forseti: (i) first displays the feature vectors for all considered binaries; (ii) then displays all parameters used for the selected classifier; and (iii) finally displays the classification metrics for each fold.
If you want to see how Forseti extracts features, take a look at:
- static.py: Static feature extraction.
- dynamic.py: Dynamic feature extraction.
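To give an idea of what static feature extraction from an ELF binary involves, here is a minimal sketch that reads a few fields of the ELF header using only the standard library. It is not the code in static.py, which presumably relies on pyelftools; the function name `elf_header_features` and the chosen feature names are assumptions made for illustration.

```python
import struct


def elf_header_features(data):
    """Parse the ELF header from raw bytes and return a small feature dict."""
    if len(data) < 16 or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = data[4], data[5]  # 1/2 = 32/64-bit, 1/2 = little/big endian
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        raise ValueError("unsupported ELF class or byte order")
    endian = "<" if ei_data == 1 else ">"
    # Address-sized fields (entry point, header offsets) are 4 or 8 bytes wide.
    layout = endian + ("HHIIIIIHHHHHH" if ei_class == 1 else "HHIQQQIHHHHHH")
    if len(data) < 16 + struct.calcsize(layout):
        raise ValueError("truncated ELF header")
    (e_type, e_machine, _version, e_entry, _phoff, _shoff, _flags,
     _ehsize, _phentsize, e_phnum, _shentsize, e_shnum,
     _shstrndx) = struct.unpack_from(layout, data, 16)
    return {
        "is_64bit": int(ei_class == 2),
        "is_little_endian": int(ei_data == 1),
        "type": e_type,            # e.g. 2 = executable, 3 = shared object
        "machine": e_machine,      # e.g. 62 = x86-64
        "entry_point": e_entry,
        "program_headers": e_phnum,
        "section_headers": e_shnum,
    }
```

A file can be processed with `elf_header_features(open(path, "rb").read(64))`, since the header is at most 64 bytes long. The resulting dictionary can be flattened into a numeric feature vector.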
If you want to see how Forseti is trained, take a look at:
- kfold.py: K-fold training implementation.
- ml.py: Classifier implementations.
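The idea behind fold-based training can be sketched as follows. This is an illustrative, standard-library-only version and not the code in kfold.py; the names `kfold_indices` and `binary_metrics` are hypothetical. Each fold is held out once for testing while the remaining samples are used for training, and metrics are computed per fold.

```python
def kfold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    # The first (n_samples % k) folds get one extra sample.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def binary_metrics(expected, predicted, positive=1):
    """Compute accuracy, precision and recall for one fold."""
    pairs = list(zip(expected, predicted))
    tp = sum(1 for e, p in pairs if e == positive and p == positive)
    fp = sum(1 for e, p in pairs if e != positive and p == positive)
    fn = sum(1 for e, p in pairs if e == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
```

The sketch does not shuffle the samples, so in practice the input order should be randomized first; otherwise a fold may contain only goodware or only malware. scikit-learn offers an equivalent splitter as `sklearn.model_selection.KFold`.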
More specifically, you can add or replace classifiers by implementing a new class that inherits from the MachineLearning class, as Forseti does for its own classifiers. The currently implemented classifiers are:
class RandomForest(MachineLearning)
class Svm(MachineLearning)
class MLP(MachineLearning)
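Adding a classifier could look like the sketch below. The base class shown here is a hypothetical stand-in: the real MachineLearning class in ml.py may expose different method names and signatures, so `train` and `predict` are assumptions, as is the `NearestCentroid` example classifier itself.

```python
class MachineLearning:
    """Hypothetical stand-in for the base class in ml.py; the real interface may differ."""

    def train(self, vectors, labels):
        raise NotImplementedError

    def predict(self, vectors):
        raise NotImplementedError


class NearestCentroid(MachineLearning):
    """Assigns each sample the label of the closest class centroid."""

    def __init__(self):
        self.centroids = {}

    def train(self, vectors, labels):
        # Group the feature vectors by label, then average each column.
        grouped = {}
        for vector, label in zip(vectors, labels):
            grouped.setdefault(label, []).append(vector)
        self.centroids = {
            label: [sum(column) / len(rows) for column in zip(*rows)]
            for label, rows in grouped.items()
        }

    def predict(self, vectors):
        def squared_distance(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b))

        return [
            min(self.centroids,
                key=lambda label: squared_distance(vector, self.centroids[label]))
            for vector in vectors
        ]
```

For example, after `clf = NearestCentroid()` and `clf.train([[0, 0], [1, 1], [9, 9], [10, 10]], [0, 0, 1, 1])`, calling `clf.predict([[0.5, 0.5], [9.5, 9.5]])` returns `[0, 1]`. Check the actual base class in ml.py before adapting this pattern.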