Note: you need to clone this repo with the `--recursive` flag because it contains submodules, e.g., `git clone git@github.com:ouerum/neural-FEBI.git --recursive`.
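If the repository was already cloned without `--recursive`, running `git submodule update --init --recursive` inside it fetches the missing submodules. The helper below is an illustrative sketch, not part of the repository, and its function names are hypothetical. It parses the output of `git submodule status`, where a leading `-` marks a submodule that has not been initialized.

```python
import subprocess
from typing import List


def uninitialized_submodules(status_output: str) -> List[str]:
    """Return paths of submodules that `git submodule status` marks with '-'.

    Each output line looks like "<prefix><sha1> <path> [(<describe>)]";
    a leading '-' means the submodule has not been initialized.
    """
    missing = []
    for line in status_output.splitlines():
        if line.startswith("-"):
            # Drop the '-' prefix, split off the SHA-1, keep the rest as the path
            # (uninitialized submodules have no trailing describe output).
            fields = line[1:].split(maxsplit=1)
            if len(fields) == 2:
                missing.append(fields[1].strip())
    return missing


def check_repo(repo_dir: str = ".") -> List[str]:
    """Run `git submodule status` in repo_dir and return uninitialized paths."""
    result = subprocess.run(
        ["git", "submodule", "status"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return uninitialized_submodules(result.stdout)
```

For example, `check_repo("neural-FEBI")` returns the submodule paths that still need `git submodule update --init --recursive`. An empty list means every submodule is checked out.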
- Build the Solidity compiler from source (https://docs.soliditylang.org/)
- Python 3.8
- Use the project in `scripts/contractCrawler` to download the Solidity code from Etherscan.
- Compile the instrumented Solidity compilers (`instrumentedSolc/solc-0.4.25` and `instrumentedSolc/solc-0.5.17`).
- Use `scripts/buildGroundTruth/compileContracts.py` to compile the Solidity contracts into EVM bytecode, together with annotations describing how to construct function boundaries.
- Use `scripts/buildGroundTruth/getGroundTruthBatch.py` to extract the ground truth from these annotations.
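`compileContracts.py` produces EVM bytecode. In EVM bytecode every jump target, including the entry of an internal function reached by a jump, must be a `JUMPDEST` (opcode `0x5b`) instruction. The sketch below is illustrative and not taken from the repository. It performs a linear sweep over hex-encoded bytecode and skips the 1 to 32 immediate bytes of `PUSH1`..`PUSH32`, so that `0x5b` bytes inside push data are not mistaken for instructions. It assumes plain hex input (optionally `0x`-prefixed) and does not strip the Solidity metadata trailer appended to compiled code.

```python
from typing import List, Tuple

JUMPDEST = 0x5B
PUSH1, PUSH32 = 0x60, 0x7F


def disassemble(bytecode_hex: str) -> List[Tuple[int, int]]:
    """Split bytecode into (offset, opcode) pairs, skipping PUSH immediates."""
    if bytecode_hex.startswith("0x"):
        bytecode_hex = bytecode_hex[2:]
    code = bytes.fromhex(bytecode_hex)
    instrs = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        instrs.append((pc, op))
        if PUSH1 <= op <= PUSH32:
            # PUSHn carries n = op - 0x5f immediate bytes after the opcode.
            pc += 1 + (op - PUSH1 + 1)
        else:
            pc += 1
    return instrs


def jumpdest_offsets(bytecode_hex: str) -> List[int]:
    """Offsets of real JUMPDEST instructions (not 0x5b bytes inside PUSH data)."""
    return [pc for pc, op in disassemble(bytecode_hex) if op == JUMPDEST]
```

For example, `jumpdest_offsets("605b5b00")` returns `[2]`. The first `0x5b` is the operand of `PUSH1`, and only the second one is a real `JUMPDEST`.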
- Make sure the settings in `FSI/utils/config.py` are set.
- Use `FSI/main.py` to train the model that identifies function entries in EVM bytecode.
- Once training is finished, use `FSI/predict_operator.py` to predict the function entries of stripped EVM bytecode.
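After `FSI/predict_operator.py` produces predicted function entries, they can be compared against the ground truth from `getGroundTruthBatch.py`. The sketch below assumes both are available as collections of bytecode offsets, because the actual file formats are not described here. It computes precision, recall, and F1, and it is not the repository's evaluation code.

```python
from typing import Dict, Iterable


def entry_scores(predicted: Iterable[int], truth: Iterable[int]) -> Dict[str, float]:
    """Precision, recall and F1 of predicted function-entry offsets."""
    pred, gold = set(predicted), set(truth)
    tp = len(pred & gold)  # entries found in both sets
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}
```

For predicted entries `[10, 42, 97]` and ground truth `[10, 42, 150, 200]`, two entries match. Precision is therefore 2/3, recall is 1/2, and F1 is 4/7 (about 0.571).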
- Make sure the settings in `FBD/fbdconfig.py` are set.
- Use `FBD/batch_analysis.py` to detect function boundaries.
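Once the function entries are known, a function boundary can be viewed as the set of basic blocks that belong to each function. The following is a deliberately simplified reachability sketch and not the algorithm in `FBD/batch_analysis.py`. It assumes a CFG keyed by basic-block start offsets in which internal-call return edges are already resolved, so each call site links to its continuation block. It treats a jump into another function entry as a call and does not follow it. Because blocks may be shared between functions, a block can appear in more than one body.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def function_bodies(cfg: Dict[int, List[int]], entries: Iterable[int]) -> Dict[int, Set[int]]:
    """Map each function entry to the basic blocks reachable from it.

    Traversal stops at other function entries, which are treated as calls
    into a different function rather than part of this body.
    """
    entry_set = set(entries)
    bodies = {}
    for entry in entry_set:
        seen = {entry}
        queue = deque([entry])
        while queue:  # breadth-first search over intra-procedural edges
            block = queue.popleft()
            for succ in cfg.get(block, []):
                if succ in entry_set or succ in seen:
                    continue
                seen.add(succ)
                queue.append(succ)
        bodies[entry] = seen
    return bodies
```

With `cfg = {0: [4, 20], 4: [8], 8: [], 20: [24], 24: []}` and entries `[0, 20]`, the function at 0 owns blocks `{0, 4, 8}` and the function at 20 owns `{20, 24}`.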
The dataset is available at https://drive.google.com/drive/folders/1LsVEmKFALN9t2trWivPgEZI41lzeqEdu?usp=sharing
The dataset contains two parts:
- `dataset/etherscan.zip`: the Solidity contracts crawled from Etherscan.
- `dataset/ground-truth.zip`: the ground truth of function boundaries for the given instrumented solc compilers and contracts.
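After downloading the two archives, you can get a quick view of their contents by counting members by file extension. The sketch below uses only Python's standard `zipfile` module, and its helper names are hypothetical. The internal layout of `etherscan.zip` and `ground-truth.zip` is not documented here, so the counts serve only as a sanity check.

```python
import zipfile
from collections import Counter
from typing import Dict, Iterable


def count_by_extension(names: Iterable[str]) -> Dict[str, int]:
    """Count archive member names by lowercase extension ('' if none)."""
    counts = Counter()
    for name in names:
        if name.endswith("/"):
            continue  # directory entries carry no file content
        base = name.rsplit("/", 1)[-1]
        ext = base.rsplit(".", 1)[-1].lower() if "." in base else ""
        counts[ext] += 1
    return dict(counts)


def summarize_zip(path: str) -> Dict[str, int]:
    """Open a zip archive and count its members by extension."""
    with zipfile.ZipFile(path) as archive:
        return count_by_extension(archive.namelist())
```

For example, `summarize_zip("dataset/etherscan.zip")` should report mostly `sol` entries, since the archive holds crawled Solidity contracts.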
- He, J., Li, S., Wang, X., Cheung, S.C., Zhao, G. and Yang, J., 2023. Neural-FEBI: Accurate function identification in Ethereum Virtual Machine bytecode. Journal of Systems and Software. https://doi.org/10.1016/j.jss.2023.111627