ILF: AI-based Fuzzer for Ethereum Smart Contracts <a href="https://www.sri.inf.ethz.ch/"><img width="100" alt="portfolio_view" align="right" src="http://safeai.ethz.ch/img/sri-logo.svg"></a>
=============================================================================================================
<p align="center">
    <img width="500" alt="portfolio_view" src="https://www.sri.inf.ethz.ch/assets/images/ilf-logo-1.png">
</p>

ILF is an <ins>**I**</ins>mitation <ins>**L**</ins>earning based <ins>**F**</ins>uzzer for smart contracts. The fuzzing policy, which is used to generate transactions, is represented by an ensemble of neural networks and is learned from thousands of high-quality sequences of transactions generated using symbolic execution. ILF can be used to fuzz any Ethereum smart contract and outputs the coverage and a vulnerability report.

ILF is developed at [SRI Lab, Department of Computer Science, ETH Zurich](https://www.sri.inf.ethz.ch/) as part of the [Machine Learning for Programming](https://www.sri.inf.ethz.ch/research/plml) and [Blockchain Security](https://www.sri.inf.ethz.ch/research/blockchain-security) projects. For more details, please refer to the [ILF CCS'19 paper](https://files.sri.inf.ethz.ch/website/papers/ccs19-ilf.pdf) and [slides](https://files.sri.inf.ethz.ch/website/slides/ccs19-ilf-slides.pdf).

## Setup

We provide a Dockerfile, which we recommend starting with. To build the image and run it:
```
$ docker build -t ilf .
$ docker run -it ilf
```

You can also follow the instructions in the Dockerfile to install ILF locally. If you experience build errors on Apple M chips, please refer to [#21](https://github.com/eth-sri/ilf/issues/21).

## Usage

### Fuzzing

To fuzz the example provided in the repo with ILF (the `imitation` fuzzing policy) using our pre-trained model in the `model` directory:
```
$ python3 -m ilf --proj ./example/crowdsale/ --contract Crowdsale --fuzzer imitation --model ./model/ --limit 2000
```
The `--fuzzer` argument can be replaced by:
* `random`: a uniformly random fuzzing policy.
* `symbolic`: a symbolic execution fuzzing policy based on depth-first search of block states. This is used for generating training sequences.
* `sym_plus`: an augmentation of `symbolic` which can revisit encountered block states.
* `mix`: a fuzzing policy that randomly chooses `imitation` or `symbolic` for generating each transaction.
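To compare the policies on the same contract, one could simply run them one after another. The bash sketch below does that using only the flags shown above; `fuzz_with_all_policies` and the `ILF_DRY_RUN` variable are our own hypothetical names, not part of ILF. It passes `--model` only to `imitation` and `mix`, on the assumption that the other policies do not use the trained networks.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with ILF): run each fuzzing policy on one contract.
# Assumes it is run from the ILF repository root with the pre-trained model in ./model/.
# Set ILF_DRY_RUN=1 to print the commands instead of executing them.
fuzz_with_all_policies() {
  local proj="$1" contract="$2" limit="${3:-2000}"
  local run="${ILF_DRY_RUN:+echo}"   # expands to "echo" only in dry-run mode
  local fuzzer model_args
  for fuzzer in imitation random symbolic sym_plus mix; do
    model_args=()
    # Assumption: only the policies that use the learned networks need --model.
    case "$fuzzer" in
      imitation|mix) model_args=(--model ./model/) ;;
    esac
    $run python3 -m ilf --proj "$proj" --contract "$contract" \
      --fuzzer "$fuzzer" "${model_args[@]}" --limit "$limit"
  done
}
```

For example, `ILF_DRY_RUN=1 fuzz_with_all_policies ./example/crowdsale/ Crowdsale 2000` prints the five commands without running them; drop the variable to actually fuzz.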

For fuzzing new contracts, one needs to provide a Truffle project (formatted as the example in `example/crowdsale`). Then the script `script/extract.py` should be called to extract deployment transactions of the contracts. For the example contract, the script runs as follows:
```
$ rm example/crowdsale/transactions.json
$ python3 script/extract.py --proj example/crowdsale/ --port 8545
```
Note that you need to kill any existing `ganache-cli` processes listening on the same port before calling this script.
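One way to free the port is sketched below, assuming `lsof` is installed (it usually is on Linux and macOS). `free_port` is a hypothetical helper, not something provided by ILF.

```shell
#!/usr/bin/env bash
# Hypothetical helper: stop processes (e.g. a leftover ganache-cli) listening on a TCP port.
# Assumes `lsof` is available; the default 8545 matches the --port used above.
free_port() {
  local port="${1:-8545}"
  local pids
  # -t: print PIDs only; -iTCP:<port> -sTCP:LISTEN: only listening TCP sockets on that port.
  pids=$(lsof -t -iTCP:"$port" -sTCP:LISTEN 2>/dev/null || true)
  if [ -z "$pids" ]; then
    echo "port $port is free"
    return 0
  fi
  echo "stopping processes on port $port:" $pids
  # Unquoted on purpose so that each PID becomes a separate argument; sends SIGTERM.
  kill $pids
}
```

Usage: `free_port 8545 && python3 script/extract.py --proj example/crowdsale/ --port 8545`.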

### Training

For training, one needs to run `symbolic` on a set of training contracts to produce a dataset in a training directory. Usually tens of thousands of contracts are used for training. For demonstration purposes, we show how to produce a small training dataset from our example contract to the `train_data` directory:
```
$ mkdir train_data
$ python3 -m ilf --proj ./example/crowdsale/ --contract Crowdsale --limit 2000 --fuzzer symbolic --dataset_dump_path ./train_data/crowdsale.data
```
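For a realistic dataset, the same command is repeated over many projects. The bash sketch below loops over a list of projects; the list-file format, the `dump_training_data` name, the `ILF_DRY_RUN` variable, and the `<project>_<contract>.data` naming scheme are our own assumptions, as is the premise that ILF reads every `.data` file in the training directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: dump symbolic-execution training data for many Truffle projects.
# Assumes a list file with one "<project_dir> <contract_name>" pair per line;
# blank lines and lines starting with '#' are skipped.
# Set ILF_DRY_RUN=1 to print the commands instead of executing them.
dump_training_data() {
  local list="$1" out_dir="${2:-./train_data}" limit="${3:-2000}"
  local run="${ILF_DRY_RUN:+echo}"
  local proj contract name
  mkdir -p "$out_dir"
  # Read the list on fd 3 so the fuzzer cannot swallow it through stdin;
  # the "|| [ -n ... ]" also handles a last line without a trailing newline.
  while read -r proj contract <&3 || [ -n "$proj" ]; do
    case "$proj" in
      ''|'#'*) continue ;;
    esac
    # Prefix with the project directory name so equal contract names do not collide.
    name="$(basename "$proj")_${contract}"
    $run python3 -m ilf --proj "$proj" --contract "$contract" \
      --limit "$limit" --fuzzer symbolic \
      --dataset_dump_path "${out_dir}/${name}.data"
  done 3< "$list"
}
```

For example, with a file `projects.txt` containing the line `./example/crowdsale/ Crowdsale`, running `dump_training_data projects.txt ./train_data` writes `./train_data/crowdsale_Crowdsale.data`.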

Run the scripts to select seed integer values and amount values from the training dataset, and put them into `ilf/fuzzers/imitation/int_values.py` and `ilf/fuzzers/imitation/amounts.py`, respectively:
```
$ python3 script/get_int_values.py --train_dir ./train_data
$ python3 script/get_amounts.py --train_dir ./train_data
```

Then the following command performs neural network training and outputs the trained networks in the `new_model` directory:
```
$ mkdir new_model
$ python3 -m ilf --fuzzer imitation --train_dir ./train_data --model ./new_model
```
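The training steps above can be chained into one helper that finishes by fuzzing the example with the freshly trained networks. This is a hypothetical bash sketch (`train_and_fuzz` and `ILF_DRY_RUN` are our own names). It assumes it runs from the repository root and that the two seed-value scripts update `int_values.py` and `amounts.py` themselves; if they only print the values, run the steps one at a time and copy the values over by hand before training.

```shell
#!/usr/bin/env bash
# Hypothetical helper: seed-value extraction -> training -> fuzzing with the new model.
# Assumes the seed-value scripts write int_values.py / amounts.py themselves.
# Set ILF_DRY_RUN=1 to print the commands instead of executing them.
train_and_fuzz() {
  local train_dir="${1:-./train_data}" model_dir="${2:-./new_model}"
  local run="${ILF_DRY_RUN:+echo}"
  mkdir -p "$model_dir"
  # Stop at the first failing step.
  $run python3 script/get_int_values.py --train_dir "$train_dir" || return 1
  $run python3 script/get_amounts.py --train_dir "$train_dir" || return 1
  $run python3 -m ilf --fuzzer imitation --train_dir "$train_dir" --model "$model_dir" || return 1
  $run python3 -m ilf --proj ./example/crowdsale/ --contract Crowdsale \
    --fuzzer imitation --model "$model_dir" --limit 2000
}
```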

### Automatically Constructing Truffle Projects

For evaluation and training purposes, one might want to automatically construct Truffle projects from a large set of contracts. To achieve this, one can write a script that automatically produces the files required by Truffle projects, following the format in `example/crowdsale`. The compressed file `truffle_scripts.tar.gz` contains the scripts we used. Those scripts might not run directly, but they can give you a high-level idea of how things work.

## Citing ILF
```
@inproceedings{He:2019:LFS:3319535.3363230,
 author = {He, Jingxuan and Balunovi\'{c}, Mislav and Ambroladze, Nodar and Tsankov, Petar and Vechev, Martin},
 title = {Learning to Fuzz from Symbolic Execution with Application to Smart Contracts},
 booktitle = {Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security},
 series = {CCS '19},
 year = {2019},
 isbn = {978-1-4503-6747-9},
 location = {London, United Kingdom},
 pages = {531--548},
 numpages = {18},
 url = {http://doi.acm.org/10.1145/3319535.3363230},
 doi = {10.1145/3319535.3363230},
 acmid = {3363230},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {fuzzing, imitation learning, smart contracts, symbolic execution},
} 
```

## Contributors
* [Jingxuan He](https://www.sri.inf.ethz.ch/people/jingxuan)
* [Mislav Balunović](https://www.sri.inf.ethz.ch/people/mislav)
* Nodar Ambroladze
* [Petar Tsankov](https://www.sri.inf.ethz.ch/people/petar)
* [Martin Vechev](https://www.sri.inf.ethz.ch/people/martin)
* Anton Permenev

## License and Copyright
* Copyright (c) 2019 [Secure, Reliable, and Intelligent Systems Lab (SRI), ETH Zurich](https://www.sri.inf.ethz.ch/)
* Licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0)