This is the dataset used in "WAF-A-MoLE: Evading Web Application Firewalls through Adversarial Machine Learning".
If you use this dataset, please cite us:
@article{demetrio2020waf,
title={WAF-A-MoLE: Evading Web Application Firewalls through Adversarial Machine Learning},
author={Demetrio, Luca and Valenza, Andrea and Costa, Gabriele and Lagorio, Giovanni},
journal={Proceedings of the 35th Annual ACM Symposium on Applied Computing},
year={2020}
}
Since GitHub does not allow files larger than 25 MB, we split the dataset files into chunks.
In our paper, we used the full attacks.sql and sane.sql files.
To rebuild the whole dataset, use any command that concatenates the chunks in order, for example:
:~$ cat attacks.sql.* > attacks.sql
:~$ cat sane.sql.* > sane.sql
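Where a shell is not available, the same concatenation can be sketched in Python. This is a minimal sketch, not part of the original tooling: the helper name rebuild is hypothetical, and it assumes (as the cat commands above do) that the chunk suffixes sort lexicographically in their original order.

```python
import glob
import shutil


def rebuild(pattern: str, output: str) -> int:
    """Concatenate every file matching `pattern` into `output`, in sorted name order.

    Hypothetical helper; returns the number of chunks that were joined.
    """
    chunks = sorted(glob.glob(pattern))
    if not chunks:
        raise FileNotFoundError(f"no chunks match {pattern!r}")
    # Binary mode copies bytes verbatim, so newlines inside payloads are preserved.
    with open(output, "wb") as out:
        for chunk in chunks:
            with open(chunk, "rb") as part:
                shutil.copyfileobj(part, out)
    return len(chunks)


# Example, run from the directory that holds the chunks:
# rebuild("attacks.sql.*", "attacks.sql")
# rebuild("sane.sql.*", "sane.sql")
```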
You can also use each chunk on its own, but this is untested.
WARNING: a payload might contain \n, so do not split these files on \n or you will get incomplete queries. The correct way to get single samples is to use sqlparse.
>>> import sqlparse
>>> # Read the whole file, then split it into individual SQL statements:
>>> with open('attacks.sql', 'r') as f:
...     attacks = f.read()
...
>>> statements = sqlparse.split(attacks)
>>> statements
['select * from foo;', 'select * from bar;', ... ]