High-resolution Image-based Malware Classification using Multiple Instance Learning

by Tim Peters, supervised by Hikmat Farhat

Overview

PyTorch implementation of our paper "High-resolution Image-based Malware Classification using Multiple Instance Learning":

Peters, T. & Farhat, H. (2023). High-resolution Image-based Malware Classification using Multiple Instance Learning. arXiv preprint arXiv:2311.12760.

Usage

/attention: Code for the attention-based MIL model

/baseline: Code for the baseline CNN model (non-MIL)

In each:

main.py: Trains the model with the Adam optimizer for 20 epochs and evaluates it on the test set. Also sets up Comet ML logging for metrics.

dataloader.py: Loads the malware samples as images and generates the bags. Parameters: lazy controls whether samples are pre-loaded into memory; test indicates that a test dataset is being loaded, in which case small images are also produced for logging to the Comet ML platform; adversarial and adversarial_type control adversarial enlargement.

model.py: The model implementation.

inference.py: Similar to main.py, but only measures inference speed. Includes GPU warm-up and GPU synchronization.

inference_dataloader.py: Similar to dataloader.py, but for inference only. Samples are not pre-loaded into memory.
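
To illustrate what "loads the malware samples as images and generates the bags" in dataloader.py can mean, here is a minimal sketch in plain Python. It is not the code from this repository. The function names, the fixed image width, and the non-overlapping square patches are all assumptions made for illustration, and the actual dataloader presumably works with PyTorch tensors rather than lists of bytes.

```python
def bytes_to_rows(data: bytes, width: int) -> list[bytes]:
    """Lay a byte stream out as greyscale image rows of `width` pixels.

    The final row is zero-padded (an assumption; other padding is possible).
    """
    if not data:
        raise ValueError("empty sample")
    padded = data + bytes(-len(data) % width)
    return [padded[i:i + width] for i in range(0, len(padded), width)]


def rows_to_bag(rows: list[bytes], patch: int) -> list[list[bytes]]:
    """Cut an image into non-overlapping patch x patch tiles.

    Each tile is one instance; the list of tiles is the bag for the sample.
    """
    width = len(rows[0])
    if width % patch != 0:
        raise ValueError("width must be a multiple of the patch size")
    # Pad the height with zero rows so it is a multiple of the patch size.
    rows = rows + [bytes(width)] * (-len(rows) % patch)
    bag = []
    for top in range(0, len(rows), patch):
        for left in range(0, width, patch):
            tile = [rows[r][left:left + patch] for r in range(top, top + patch)]
            bag.append(tile)
    return bag


if __name__ == "__main__":
    sample = bytes(range(10))              # stand-in for a malware binary
    bag = rows_to_bag(bytes_to_rows(sample, width=4), patch=2)
    print(len(bag), "instances in the bag")
```

Because the number of tiles grows with the file size, larger binaries simply yield larger bags instead of being resized to a fixed resolution.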

process_BIG2015_dataset.py: Processes .bytes (hex) files from the Microsoft Malware Classification Challenge (BIG 2015) into .bin (binary) files, removing question marks.
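
The conversion can be sketched roughly as follows. This is an illustrative sketch, not the repository's script. It assumes each line of a .bytes file starts with an address token followed by two-character hex byte values, with "??" marking unknown bytes. The names hex_dump_to_bytes and convert_file are hypothetical.

```python
from pathlib import Path


def hex_dump_to_bytes(text: str) -> bytes:
    """Convert the text of a .bytes hex dump into raw bytes.

    Assumed layout: the first token on each line is an address and is
    skipped; "??" tokens (unknown bytes) are dropped.
    """
    out = bytearray()
    for line in text.splitlines():
        tokens = line.split()
        for token in tokens[1:]:
            if token == "??":
                continue
            out.append(int(token, 16))
    return bytes(out)


def convert_file(src: Path, dst: Path) -> None:
    """Read a .bytes file and write the corresponding .bin file."""
    dst.write_bytes(hex_dump_to_bytes(src.read_text()))


if __name__ == "__main__":
    demo = "00401000 56 8D ?? 24\n00401010 ?? ?? FF 00\n"
    print(hex_dump_to_bytes(demo).hex(" "))
```

Dropping the "??" tokens shortens the output relative to the original file layout; replacing them with a fixed value would be an alternative design, but the description above states that question marks are removed.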
