SayHiRay/malware-detection

This repository contains a Master's course project for CS5242 at NUS. The project is an in-class Kaggle competition; details of the competition can be found on Kaggle.

The training and testing procedure is as follows:

  1. Run train.py. Three Keras models are trained, each on a different train/validation split and each for 50 epochs. Accuracy and AUC are reported after each epoch. Based on these reported metrics, we selected the 3 best models to use, one per train/validation split.
  2. Run test.py. The three models selected in the previous step are loaded. Each model makes predictions on the test set, giving 3 sets of predictions. The final result is the mean of these 3 predictions.
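
The averaging in step 2 can be sketched as follows. This is a minimal illustration, not the code in test.py: the function name `average_predictions` and the arrays `preds_a`, `preds_b`, `preds_c` are hypothetical stand-ins for the per-model outputs that the three loaded Keras models would produce on the test set.

```python
import numpy as np


def average_predictions(per_model_probs):
    """Return the element-wise mean of several models' predicted probabilities."""
    # Stack to shape (n_models, n_samples), then average over the model axis.
    stacked = np.stack([np.asarray(p, dtype=float) for p in per_model_probs], axis=0)
    return stacked.mean(axis=0)


# Hypothetical predicted probabilities for three test samples, one array per model.
# In the real pipeline these would come from each loaded model's predictions.
preds_a = np.array([0.9, 0.2, 0.6])
preds_b = np.array([0.8, 0.4, 0.5])
preds_c = np.array([0.7, 0.3, 0.7])

final = average_predictions([preds_a, preds_b, preds_c])
print(final)
```

Taking the mean gives every model equal weight, so a model trained on an unlucky split has less influence on the final prediction than it would on its own.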

Note that the model files of the 3 models used for our best submission are included with the code. They are loaded by test.py by default, so that our result can be easily replicated. Due to storage limits, the data files are not included in the repo, but they can be found on the Kaggle page of the competition.

Despite its simplicity, our final result ranked 7th out of the 68 teams. There is still room for improvement, such as better hyperparameter tuning, other architectures such as ResNet, and better ensemble techniques. For more information on this project, please refer to Report.pdf.

About

Malware detection using CNN
