by Tim Peters, supervised by Hikmat Farhat
PyTorch implementation of our paper "High-resolution Image-based Malware Classification using Multiple Instance Learning":
- Peters, T. & Farhat, H. (2023). High-resolution Image-based Malware Classification using Multiple Instance Learning. arXiv preprint arXiv:2311.12760.
/attention
: Code for the attention-based MIL model
/baseline
: Code for the baseline CNN model (non-MIL)
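To illustrate what an attention-based MIL model computes, here is a minimal, framework-free sketch of the (non-gated) attention pooling of Ilse et al. (2018, "Attention-based Deep Multiple Instance Learning"). Each instance embedding h_k gets a score w^T tanh(V h_k), the scores are softmaxed across the bag, and the bag embedding is the weighted sum of the instances. The function name, the parameters `V` and `w`, and the toy values are assumptions for illustration; the PyTorch model in /attention may use a different variant (for example, gated attention).

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Multiply matrix m (rows x cols) by vector v (length cols)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def attention_mil_pool(bag: Sequence[Sequence[float]], V: Matrix,
                       w: Sequence[float]) -> Tuple[Vector, Vector]:
    """Non-gated attention MIL pooling (hypothetical helper).

    bag: K instance embeddings h_k, each of length M.
    V:   L x M projection matrix (stands in for a learned parameter).
    w:   length-L scoring vector (stands in for a learned parameter).
    Returns (bag embedding z, attention weights a).
    """
    # Score each instance: s_k = w^T tanh(V h_k)
    scores = [sum(wi * math.tanh(x) for wi, x in zip(w, matvec(V, h))) for h in bag]
    # Softmax over the instances in the bag (subtract the max for stability)
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    a = [e / total for e in exps]
    # Bag embedding: attention-weighted sum of the instance embeddings
    dim = len(bag[0])
    z = [sum(a_k * h[j] for a_k, h in zip(a, bag)) for j in range(dim)]
    return z, a


if __name__ == "__main__":
    bag = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # three toy instance embeddings
    V = [[1.0, 0.0], [0.0, 1.0]]               # toy parameters
    w = [2.0, -1.0]
    z, a = attention_mil_pool(bag, V, w)
    print("attention:", [round(x, 3) for x in a])
    print("bag embedding:", [round(x, 3) for x in z])
```

The attention weights also indicate which instances (image regions) drove the bag-level prediction, which is one reason this pooling is popular for MIL.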
Each directory contains:
main.py
: Trains the model with the Adam optimizer for 20 epochs and evaluates it on the test set. Also sets up Comet ML logging for metrics.
dataloader.py
: Loads the malware samples as images and generates the bags. Parameters:
- lazy: controls pre-loading of samples into memory.
- test: indicates whether a test dataset is being loaded, so that small images can be made for logging to the Comet ML platform.
- adversarial & adversarial_type: control adversarial enlargement.
model.py
: The model implementation.
inference.py
: Similar to main.py, but only for measuring inference speed. Includes GPU warm-up and GPU synchronization.
inference_dataloader.py
: Similar to dataloader.py, but only for inference. Samples are not pre-loaded into memory.
process_BIG2015_dataset.py
: Processes .bytes (hex) files from the Microsoft Malware Classification Challenge (BIG 2015) into .bin (binary) files, removing question marks.
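For readers unfamiliar with the Adam optimizer used in main.py, a minimal, framework-free sketch of the Adam update (Kingma & Ba) follows. The default hyperparameters (lr=1e-3, betas 0.9/0.999, eps=1e-8) match PyTorch's `torch.optim.Adam` defaults; the settings actually used in main.py are not stated here, so treat these values, the `Adam` class, and the toy objective as assumptions.

```python
import math
from typing import List


class Adam:
    """Minimal Adam optimizer over a flat list of parameters (illustrative only)."""

    def __init__(self, n_params: int, lr: float = 1e-3, beta1: float = 0.9,
                 beta2: float = 0.999, eps: float = 1e-8) -> None:
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * n_params  # first-moment (mean) estimates
        self.v = [0.0] * n_params  # second-moment (uncentered variance) estimates
        self.t = 0                 # step counter used for bias correction

    def step(self, params: List[float], grads: List[float]) -> None:
        """Update params in place given their gradients."""
        self.t += 1
        for i, g in enumerate(grads):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            # Bias-corrected moment estimates (both start at zero)
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            params[i] -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)


if __name__ == "__main__":
    # Toy usage: minimize f(x) = (x - 3)^2 over 20 "epochs" of 100 steps each.
    x = [0.0]
    opt = Adam(n_params=1, lr=0.1)
    for epoch in range(20):
        for _ in range(100):
            opt.step(x, [2 * (x[0] - 3.0)])
    print(round(x[0], 3))
```

Because the first step normalizes the gradient by its own magnitude, the initial update is roughly `lr` in size regardless of the gradient's scale.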
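The README does not spell out how dataloader.py forms bags. One common approach, shown here as a hypothetical sketch, is to treat each byte as a grayscale pixel, lay the bytes out in rows of fixed width, and cut the resulting image into square tiles that serve as the instances of a bag. `bytes_to_bag`, `width`, `tile`, and the zero-padding policy are all assumptions, not the values or logic used in the repository.

```python
from typing import List

Tile = List[List[int]]  # one instance: tile x tile grayscale pixel values (0-255)


def bytes_to_bag(data: bytes, width: int = 256, tile: int = 64) -> List[Tile]:
    """Split a binary into square grayscale tiles (MIL instances).

    The last partial row and the image height are zero-padded so that every
    tile is complete; the real dataloader may handle edges differently.
    """
    if width % tile:
        raise ValueError("width must be a multiple of tile")
    # Interpret each byte as one pixel, laid out row by row.
    rows = [list(data[i:i + width]) for i in range(0, len(data), width)]
    if rows and len(rows[-1]) < width:
        rows[-1] += [0] * (width - len(rows[-1]))  # pad the last row
    # Pad the height up to a multiple of the tile size.
    while len(rows) % tile:
        rows.append([0] * width)
    bag: List[Tile] = []
    for top in range(0, len(rows), tile):
        for left in range(0, width, tile):
            bag.append([r[left:left + tile] for r in rows[top:top + tile]])
    return bag


if __name__ == "__main__":
    bag = bytes_to_bag(b"\x00\xff" * 5000, width=128, tile=32)
    print(len(bag))  # 12 instances, each 32 x 32
```

Tiling keeps the full resolution of large binaries instead of resizing them into a single small image, at the cost of variable-size bags.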
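The warm-up and synchronization in inference.py follow a standard benchmarking pattern, sketched below in framework-agnostic form. `time_inference` and its parameters are hypothetical. With PyTorch on a GPU, `sync` would typically be `torch.cuda.synchronize`, because CUDA kernels launch asynchronously and the host-side timer would otherwise stop before the device has finished.

```python
import statistics
import time
from typing import Callable, List, Optional


def time_inference(run: Callable[[], object], warmup: int = 10, iters: int = 50,
                   sync: Optional[Callable[[], None]] = None) -> List[float]:
    """Time repeated calls to `run`, excluding warm-up iterations.

    `sync` should block until all queued device work has finished.
    Returns the wall-clock duration of each timed call, in seconds.
    """
    # Warm-up: lazy initialization, caching and device setup happen here,
    # outside the timed region.
    for _ in range(warmup):
        run()
    if sync is not None:
        sync()
    times = []
    for _ in range(iters):
        start = time.perf_counter()
        run()
        if sync is not None:
            sync()  # wait for the device before stopping the clock
        times.append(time.perf_counter() - start)
    return times


if __name__ == "__main__":
    # Stand-in workload; in a real benchmark this would be a model forward pass.
    samples = time_inference(lambda: sum(i * i for i in range(100_000)), warmup=3, iters=10)
    print(f"median {statistics.median(samples) * 1e3:.2f} ms")
```

Reporting the median rather than the mean makes the measurement less sensitive to occasional scheduling hiccups.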
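Unlike dataloader.py, inference_dataloader.py does not pre-load samples. A hypothetical sketch of such on-demand loading follows: only file paths are kept in memory, and each sample is read from disk when it is indexed. `LazyBinaryDataset` and its arguments are assumptions; it implements the `__len__`/`__getitem__` protocol of a map-style `torch.utils.data.Dataset`, and a real version would also convert the bytes into a bag of image tiles.

```python
import os
import tempfile
from typing import Sequence, Tuple


class LazyBinaryDataset:
    """Map-style dataset that reads each sample from disk only when indexed."""

    def __init__(self, paths: Sequence[str], labels: Sequence[int]) -> None:
        if len(paths) != len(labels):
            raise ValueError("paths and labels must have the same length")
        self.paths = list(paths)  # only file paths are held in memory
        self.labels = list(labels)

    def __len__(self) -> int:
        return len(self.paths)

    def __getitem__(self, idx: int) -> Tuple[bytes, int]:
        with open(self.paths[idx], "rb") as f:  # read on demand
            return f.read(), self.labels[idx]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i, payload in enumerate([b"MZ\x90\x00", b"\x7fELF"]):
            p = os.path.join(tmp, f"sample{i}.bin")
            with open(p, "wb") as f:
                f.write(payload)
            paths.append(p)
        ds = LazyBinaryDataset(paths, labels=[0, 1])
        print(len(ds), ds[1])
```

Reading on demand keeps memory use flat, which suits inference benchmarks over many large binaries; the trade-off is disk I/O on every access.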
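A minimal sketch of the .bytes-to-.bin conversion, assuming the usual BIG 2015 .bytes layout: each line holds an address column followed by space-separated two-digit hex bytes, with `??` marking bytes that could not be read. `bytes_lines_to_bin` and `convert_file` are hypothetical names, and process_BIG2015_dataset.py may handle file discovery or edge cases differently.

```python
from typing import Iterable


def bytes_lines_to_bin(lines: Iterable[str]) -> bytes:
    """Convert lines of a BIG 2015 .bytes hex dump into raw bytes.

    The first token on each line (the address) is skipped and '??'
    placeholders are dropped.
    """
    out = bytearray()
    for line in lines:
        tokens = line.split()
        for tok in tokens[1:]:  # tokens[0] is the address column
            if tok == "??":
                continue  # drop question-mark placeholders
            out.append(int(tok, 16))
    return bytes(out)


def convert_file(src: str, dst: str) -> None:
    """Convert one .bytes file into a .bin file."""
    with open(src, "r", encoding="ascii") as f_in:
        data = bytes_lines_to_bin(f_in)
    with open(dst, "wb") as f_out:
        f_out.write(data)


if __name__ == "__main__":
    sample = [
        "00401000 56 8D 44 24 08 50 8B F1 E8 1C 1B 00 00 C7 06 08",
        "00401010 ?? ?? 4D 5A",
    ]
    print(bytes_lines_to_bin(sample).hex())  # 568d442408508bf1e81c1b0000c706084d5a
```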