The project structure is as follows:
- `data`: Contains the dataset in JSON format.
- `src`: Contains the source code for the project.
- `requirements.txt`: Contains the required Python packages for the project.
- `README.md`: Contains the project documentation.
The refined Bench4BL dataset used for this project is provided in the `data` directory in JSON format. Each entry describes one bug report and contains the following fields:
- `bug_id`: Unique identifier for the bug.
- `bug_title`: Title of the bug.
- `bug_description`: Description of the bug.
- `project`: Project to which the bug belongs.
- `sub_project`: Sub-project to which the bug belongs.
- `version`: Version of the project.
- `fixed_version`: Version in which the bug was fixed.
- `fixed_files`: Files in which the bug was fixed, as a JSON array.
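As a quick sanity check on the fields listed above, here is a minimal sketch that loads a bug-report file and reports which fields a record lacks. It assumes the file holds a JSON array of objects, which this README does not confirm. The helper names `load_bug_reports` and `missing_fields` and the sample record are made up for illustration.

```python
import json
from pathlib import Path

# Field names copied from the list above.
EXPECTED_FIELDS = {
    "bug_id", "bug_title", "bug_description", "project",
    "sub_project", "version", "fixed_version", "fixed_files",
}


def missing_fields(record: dict) -> set:
    """Return the expected field names that are absent from one record."""
    return EXPECTED_FIELDS - set(record)


def load_bug_reports(path) -> list:
    """Load bug reports, ASSUMING the file is a JSON array of objects."""
    with Path(path).open(encoding="utf-8") as handle:
        records = json.load(handle)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of bug reports")
    return records


# Illustrative, made-up record with most fields left out.
sample = {"bug_id": "1", "bug_title": "Example title"}
print(sorted(missing_fields(sample)))
```

To check a real file, call `load_bug_reports` with a path inside the `data` directory and pass each record to `missing_fields`.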
The following are the pre-requisites for the project:
- Python 3.10
- Elasticsearch
- NVIDIA CUDA enabled GPU
- Required Python Packages
We recommend using a virtual environment to install the packages and run the application. Learn to use a virtual environment here.
To install Python 3.10 on Windows:
- Download Python 3.10:
  - Visit python.org/downloads.
  - Download the Windows installer (`Windows Installer (64-bit)` recommended).
  - Run the installer.
  - Check the box to add Python to PATH during installation.
- Verify Installation:
  - Open Command Prompt.
  - Type `python --version`.
  - You should see `Python 3.10.x`.
To install Python 3.10 on Linux (Debian/Ubuntu):
- Install Python 3.10:
  - Open Terminal.
  - Run the following commands:
    - `sudo apt update`
    - `sudo apt install python3.10`
- Verify Installation:
  - Type `python3.10 --version`.
  - You should see `Python 3.10.x`.
PyTorch with CUDA 11.3 support is required for the project.
Use the following command to install PyTorch with CUDA support:
`pip install torch==1.10.0+cu113 torchvision==0.11.0+cu113 torchaudio==0.10.0+cu113 torchtext==0.11.0 -f https://download.pytorch.org/whl/cu113/torch_stable.html`
Verify the installation by running the following command:
`python -c "import torch; print(torch.cuda.is_available())"`
You should see `True` if PyTorch is installed correctly with CUDA support.
If you do not have a CUDA-enabled GPU, install the CPU version of PyTorch. Learn more about PyTorch with CUDA support here.
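The CUDA check above can also be done from inside a script, so that the same code runs on machines with and without a GPU. The following is a minimal sketch. The helper name `pick_device` is hypothetical, and whether the project's own code accepts a device setting is not stated in this README.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch can see a CUDA GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed here, so no GPU can be used through it.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(f"Using device: {pick_device()}")
```

A result of `cpu` on a machine that has an NVIDIA GPU usually points to a CPU-only PyTorch build or a driver problem.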
To install Elasticsearch on Windows:
- Download Elasticsearch:
  - Visit elastic.co/downloads/elasticsearch.
  - Download the ZIP package for Windows.
- Extract and Start Elasticsearch:
  - Extract the downloaded ZIP file.
  - Navigate to the extracted directory.
  - Run `bin\elasticsearch.bat` in Command Prompt.
- Verify Installation:
  - Open a web browser.
  - Go to http://localhost:9200.
  - Check for a JSON response indicating Elasticsearch is running.
To install Elasticsearch on Linux (Debian/Ubuntu):
- Download and Install Elasticsearch:
  - Open Terminal.
  - Run the following commands, replacing `<version>` with the Elasticsearch version you want to install:
    - `wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-<version>-amd64.deb`
    - `sudo dpkg -i elasticsearch-<version>-amd64.deb`
- Start Elasticsearch Service:
  - Run:
    - `sudo systemctl start elasticsearch`
    - `sudo systemctl enable elasticsearch`
- Verify Installation:
  - Open a web browser.
  - Go to http://localhost:9200.
  - Ensure Elasticsearch is running by checking for a JSON response.
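The browser check can also be scripted, which helps on a headless machine. The sketch below uses only the Python standard library. The function names are hypothetical, and it assumes an unauthenticated HTTP endpoint at `http://localhost:9200`. Recent Elasticsearch releases enable TLS and authentication by default, in which case this plain request fails and the function returns `None`.

```python
import json
import urllib.error
import urllib.request


def describe_cluster(body: str) -> str:
    """Summarise the JSON document that Elasticsearch serves at its root URL."""
    info = json.loads(body)
    version = info.get("version", {}).get("number", "unknown")
    return f"cluster={info.get('cluster_name', 'unknown')} version={version}"


def check_elasticsearch(url: str = "http://localhost:9200", timeout: float = 5.0):
    """Return a summary string, or None if the node cannot be reached."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return describe_cluster(response.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, HTTP error (for example 401), or a non-JSON body.
        return None
```

Calling `check_elasticsearch()` after starting the service should return a short summary string. A return value of `None` means the node did not answer as expected at that URL.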
To install the required Python packages:
- Navigate to Project Directory:
  - Open terminal/command prompt.
  - Use `cd` to move to the directory containing `requirements.txt`.
- Install Packages:
  - Run `pip install -r requirements.txt`.
- Create Index:
  - Run `src/IR/Indexer/Index_Creator.py` to create an index in Elasticsearch. The configuration for the index is provided in `IR_Config.yaml`.
  - Extract the source files from the Git projects per version and index them in Elasticsearch using `Indexer.py`. The GitHub repositories are listed in the Bench4BL repository.
  - The default port for Elasticsearch is 9200.
- Train or download the models from the following link:
Run the command below to localize the bugs:
python src --br-path /path/to/input/data --kw-model-dir /path/to/keyword/model --ce-model-dir /path/to/cross-encoder/model --L 10 --topK_rerank 50 --topN 10
- `--br-path`: Path to the input data in json format. The format of the json file should follow the format of the dataset provided in the `data` directory.
- `--kw-model-dir`: Path to the keyword model.
- `--ce-model-dir`: Path to the cross-encoder model.
- `--L`: Length of the keywords.
- `--topK_rerank`: Number of top retrieved candidates to rerank.
- `--topN`: Number of top outputs to return.
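To make the option types concrete, here is a minimal `argparse` sketch that accepts the same flags. It illustrates the command-line interface described above and is not the parser shipped in `src`, which may differ. Treating the three model and data paths as required and using the example values (10, 50, 10) as defaults are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the flags documented above (illustrative only)."""
    parser = argparse.ArgumentParser(prog="python src")
    parser.add_argument("--br-path", required=True,
                        help="Path to the input data in JSON format.")
    parser.add_argument("--kw-model-dir", required=True,
                        help="Path to the keyword model.")
    parser.add_argument("--ce-model-dir", required=True,
                        help="Path to the cross-encoder model.")
    # Defaults below are ASSUMED from the example command, not confirmed.
    parser.add_argument("--L", type=int, default=10,
                        help="Length of the keywords.")
    parser.add_argument("--topK_rerank", type=int, default=50,
                        help="Value passed for the reranking step.")
    parser.add_argument("--topN", type=int, default=10,
                        help="Number of top outputs to return.")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args([
    "--br-path", "/path/to/input/data",
    "--kw-model-dir", "/path/to/keyword/model",
    "--ce-model-dir", "/path/to/cross-encoder/model",
])
print(args.br_path, args.L, args.topK_rerank, args.topN)
```

Note that `argparse` turns the dashes in `--br-path` into underscores, so the value is read as `args.br_path`, while `--topK_rerank` keeps its name unchanged.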