This section is based on our setup experience with Red Hat Enterprise Linux 7.5 and CentOS Linux release 7.5.1804. Please run on a Linux OS. It is strongly recommended to run with at least one GPU.
Before setting up the project, please make sure you have the following prerequisite installations and setups in place.
Ensure you have Anaconda3 installed. If not, install Python 3.7 from Anaconda with the following steps:
- Install the list of dependencies described here.
- Download the installer here. For example, you can use the `wget` command: `wget https://repo.anaconda.com/archive/Anaconda3-2021.05-Linux-x86_64.sh`, then type `chmod +x Anaconda3-2021.05-Linux-x86_64.sh` and run `bash Anaconda3-2021.05-Linux-x86_64.sh` to complete the installation.
- You may need to add the Anaconda directory to the `PATH` environment variable (e.g., you can add `export PATH="/path_to_anaconda/anaconda3/bin:$PATH"` to the `.bashrc` file). A consolidated sketch of these steps follows this list.
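Put together, the Anaconda installation amounts to the short sketch below; the installer version is the one referenced above, and the install prefix in the `PATH` line is a placeholder you should adjust to your system:

```bash
# Download and run the Anaconda3 installer referenced above
wget https://repo.anaconda.com/archive/Anaconda3-2021.05-Linux-x86_64.sh
chmod +x Anaconda3-2021.05-Linux-x86_64.sh
bash Anaconda3-2021.05-Linux-x86_64.sh   # follow the interactive prompts

# Optionally add Anaconda to your PATH (adjust /path_to_anaconda to your install location)
echo 'export PATH="/path_to_anaconda/anaconda3/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
```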
- If you are using an HPC cluster, run the following command to enable Python 3.7 with CUDA: `module load cuda/9.2 anaconda3/5.0.1-cuda92`
- If you are using a local machine and already have Anaconda set up, run `conda env create -f evil_env_gpu.yml`. Upon completion, activate it using `conda activate evil_env`. Alternatively to our environment file, you can run `conda install pytorch torchvision cudatoolkit=11.1 -c pytorch -c nvidia` followed by `pip3 install -r requirements_gpu.txt --user`. After this, you can move to the Install Natural Language tools section (a condensed sketch of the local setup follows this list).
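On a local GPU machine, the environment setup above condenses into a few commands; the environment name `evil_env` and the file names are the ones referenced in the steps, so nothing here is new beyond combining them:

```bash
# Option A: recreate the provided GPU environment and activate it
conda env create -f evil_env_gpu.yml
conda activate evil_env

# Option B: build an equivalent environment manually instead of using the .yml file
# conda install pytorch torchvision cudatoolkit=11.1 -c pytorch -c nvidia
# pip3 install -r requirements_gpu.txt --user
```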
- Move to the EVIL main directory.
- It is recommended that you use a virtual environment (a Conda environment) for the dependency setup. If you do not wish to do so, simply run `pip3 install -r requirements.txt --user`.
- Import our saved conda environment using the command `conda env create -f evil_env.yml` and activate it using `source activate evil_env` or `conda activate evil_env`.
- Alternatively, you can create an Anaconda Python 3.7 virtual environment using the command `conda create -n yourenvname python=3.7 anaconda`. Activate the environment by typing `source activate yourenvname`.
- Run `pip3 install -r requirements.txt --user` to install the dependencies.
- Install the nltk tokenizers and corpora: run `python -m nltk.downloader`, then type `d` (Download), type `all` at the Identifier prompt, and type `q` at the end of the installation (a non-interactive alternative is sketched after this list).
- Install the spaCy language model with the following command: `python -m spacy download en_core_web_lg`
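If you prefer a non-interactive setup of the natural language tools, the nltk corpora can be fetched directly from the command line; the sketch below assumes the standard command-line form of `nltk.downloader` and mirrors the interactive steps above:

```bash
# Download all nltk tokenizers and corpora without the interactive prompt
python -m nltk.downloader all

# Install the spaCy English language model used by the project
python -m spacy download en_core_web_lg
```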
This section briefly describes how to replicate the experiments mentioned in the paper. If you are using an Anaconda environment, please ensure that your conda environment is activated before running any of the bash commands below.
To launch the fine-tuning and evaluation processes of CodeBERT, the basic command template is as follows:
`bash CodeBERT_Launch.sh [DEVICE] [DATASET] [PREPROCESSING]`
Device Options:
- `0`: Local machine
- `1`: HPC with a SLURM scheduler
- `2`: HPC with a TORQUE scheduler
Dataset Options:
- Python Encoder Dataset
- Assembly Decoder Dataset
Preprocessing Options:
- Raw corpus counts
- Preprocessing without the Intent Parser (IP)
- Preprocessing with the Intent Parser (IP)
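For example, a hypothetical local run might look like the command below; the literal values accepted for `[DATASET]` and `[PREPROCESSING]` are defined in `CodeBERT_Launch.sh`, so the `encoder` and `2` arguments here are only illustrative assumptions:

```bash
# Hypothetical invocation: device 0 = local machine; the dataset and
# preprocessing arguments are assumed values -- check CodeBERT_Launch.sh
# for the exact options it accepts.
bash CodeBERT_Launch.sh 0 encoder 2
```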
Local machine:
- From the EVIL home directory, run `bash CodeBERT_Launch.sh 0 [DATASET] [PREPROCESSING]`
HPC with a SLURM scheduler:
- Navigate to `EVIL/model/fine_tune.slurm` and add your GPU queue name under the TODO comment.
- From the EVIL home directory, run `bash CodeBERT_Launch.sh 1 [DATASET] [PREPROCESSING]`
- When the job is complete, from the EVIL home directory, run `bash evaluate.sh`
- Note: if your cluster jobs do not connect to the internet, you may want to run the bash script on the head node using the local machine option, `bash CodeBERT_Launch.sh 0 [DATASET] [PREPROCESSING]`, to download the models, and terminate it before it reaches the training portion (you'll see a progress bar right before training starts).
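Once the SLURM job is submitted, it can be tracked with the standard SLURM client tools; the log files are named with the job id (see the notes at the end of this section), so the wildcard below is only a convenience:

```bash
# Check the status of your submitted SLURM jobs
squeue -u "$USER"

# Job logs are written under model/job_logs/, named with the job id
ls model/job_logs/
tail -f model/job_logs/*<job_id>*   # replace <job_id> with your actual job id
```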
HPC with a TORQUE scheduler:
- Navigate to `EVIL/model/fine_tune.pbs` and add your GPU queue name under the TODO comment.
- From the EVIL home directory, run `bash CodeBERT_Launch.sh 2 [DATASET] [PREPROCESSING]`
- When the job is complete, from the EVIL home directory, run `bash evaluate.sh`
- Note: if your cluster jobs do not connect to the internet, you may want to run the bash script on the head node using the local machine option, `bash CodeBERT_Launch.sh 0 [DATASET] [PREPROCESSING]`, to download the models, and terminate it before it reaches the training portion (you'll see a progress bar right before training starts).
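The equivalent check on a TORQUE cluster uses the standard `qstat` client, with the logs again landing in `model/job_logs/`:

```bash
# Check the status of your submitted TORQUE jobs
qstat -u "$USER"

# Job logs are written under model/job_logs/, named with the job id
ls model/job_logs/
```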
The final evaluation results will appear on your console if you are running on your local machine, and in the specified logging output directory if a job was submitted. The predicted output will be generated in the subdirectory `model/eval/[encoder/decoder]_test_output.json`.
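To take a quick look at the predictions, any JSON pretty-printer works; the sketch below assumes you ran the encoder dataset, so adjust the file name to `decoder_test_output.json` otherwise:

```bash
# Pretty-print the first part of the generated predictions (encoder dataset assumed)
python -m json.tool model/eval/encoder_test_output.json | head -n 40
```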
To launch the training and evaluation of the Seq2Seq model mentioned in the paper, also ensure that the conda environment is active. The basic command template is as follows:
`bash Seq2Seq_Launch.sh [DATASET] [PREPROCESSING]`
The dataset and preprocessing options are the same as those of CodeBERT.
The final evaluation results will appear on your console if you are running on your local machine, and in the specified logging output directory `seq2seq/logs`. The predicted output will be generated in the subdirectory `seq2seq/archive/id-[timestamp]/answer_[encoder/decoder].txt`.
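Since the archive directory name includes a run timestamp, a glob is a convenient way to locate the generated answers after a Seq2Seq run:

```bash
# List the archived Seq2Seq runs and peek at the generated answers
ls seq2seq/archive/
head seq2seq/archive/id-*/answer_*.txt
```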
Additional notes:
- Run `bash utils/test_split.sh` for details on the different preprocessing options.
- If you chose to submit a job, the logs will be stored in `model/job_logs/`, named with the job id.
- Run `bash utils/test_split.sh [DATASET] 0` for raw corpus token counts.