This repository contains an online learning methodology for dynamic malware detection using process resource utilization metrics.
Dynamic malware detection using process resource utilization metrics, such as CPU usage and memory consumption, often relies on large labeled datasets, which makes it hard to adapt to evolving malware patterns. To address this, we design an online learning methodology focused on zero-day malware. Because our methodology incorporates temporal information and continuously updates its models, it can learn from emerging threats and detect zero-day malware even when only limited data is available.
The complete description of our methodology is available in the following publication:
Towards Online Malware Detection using Process Resource Utilization Metrics
This repository contains all code and instructions required to reproduce the findings of the above publication. If it is helpful or relevant to your work, you may cite it.
One can run the code using the following steps:
- Clone this repository and install the Python requirements by running `pip install -r requirements.txt`.
- Set the parameters in the file `properties.py`. The properties include the path to the malware CSVs from the VMs Performance Metrics Dataset of Abdelsalam et al. (available at this URL). The code also requires an active VirusTotal account, as its API key is used for retrieving malware information. Finally, the code writes its data outputs to multiple folders.
- Run the Python scripts one by one. The code is a collection of scripts that perform data manipulations one after the other; the prefixes a0, a1, a2, ... indicate the order in which to run them.
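
The last step can also be automated. The following is a minimal sketch, not part of the repository, that assumes the scripts sit in one folder and that their file names start with the a0, a1, a2, ... prefix. The helper names `ordered_scripts` and `run_all` are hypothetical. Sorting on the numeric part of the prefix keeps a10 after a9, which a plain alphabetical sort would not.

```python
import re
import subprocess
import sys
from pathlib import Path

# Assumption: script names begin with "a<number>", e.g. a0..., a1..., a10...
SCRIPT_PATTERN = re.compile(r"^a(\d+)")


def ordered_scripts(names):
    """Return the .py names that start with a<number>, sorted by that number."""
    numbered = []
    for name in names:
        match = SCRIPT_PATTERN.match(name)
        if match and name.endswith(".py"):
            numbered.append((int(match.group(1)), name))
    return [name for _, name in sorted(numbered)]


def run_all(folder="."):
    """Run every numbered script in `folder`, stopping at the first failure."""
    names = [path.name for path in Path(folder).iterdir() if path.is_file()]
    for name in ordered_scripts(names):
        print(f"Running {name}")
        # check=True raises CalledProcessError if a script exits with an error,
        # so later scripts never run on incomplete outputs.
        subprocess.run([sys.executable, name], cwd=folder, check=True)
```

Calling `run_all(".")` from the repository root would then execute the scripts in order, provided `properties.py` has already been filled in.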