This project is a companion website of the paper "Towards Accurate Recommendations of Merge Conflicts Resolution Strategies", submitted to the IST journal.

gems-uff/mestre

Mestre Project

This project contains the artifacts used in the paper "Towards Accurate Recommendations of Merge Conflicts Resolution Strategies", published in the IST journal. A pre-print is available here.

In this paper, we propose MESTRE, a merge conflict resolution strategy recommender.

The complementary material to the paper can be found in the "complementary" folder.

The dataset can be obtained through the steps outlined below.

The scripts used to reproduce the study can be found in the "scripts" folder.

Dataset

There are two options for accessing the dataset used in this paper. You can either collect the data yourself (which takes a long time) or download the dataset files directly.

Collect by yourself

We assume you have access to the conflicts database used in this paper. The database information can be configured in the scripts/database.py file.

Run the scripts in the following order:

| Script | Input | Output | Description |
| --- | --- | --- | --- |
| extract_initial_dataset.py | Conflicts database | ./data/INITIAL_DATASET.csv | Extracts a csv with conflicting chunks and some descriptive attributes. |
| concatenation_relabel.py | ./data/INITIAL_DATASET.csv, Conflicts database | ./data/LABELLED_DATASET.csv | Relabels the developerdecision of each chunk that used the Concatenation strategy. |
| clone_projects.py | ./data/INITIAL_DATASET.csv | Repos folder | Clones all projects into the ./repos folder. |
| collect_chunk_authors.py | ./data/INITIAL_DATASET.csv, Repos folder | ./data/chunk_authors.csv | Extracts a csv with information about all authors that contributed to a conflicting chunk. Detailed information can be found in this link. |
| collect_attributes.py | ./data/INITIAL_DATASET.csv, Repos folder | ./data/collected_attributes1.csv | Extracts a csv with collected attributes from the conflicting chunks. Extracted attributes are described in this link. |
| execute_mac_tool.py | ./data/INITIAL_DATASET.csv, Repos folder | Two csv files for each analyzed repo, ./data/macTool_output.csv | Executes a modified version of the macTool to extract merge attributes. More info in this link. |
| collect_merge_type.py | ./data/macTool_output.csv, Repos folder | ./data/merge_types_data.csv | Extracts the merge commit message for each chunk merge commit, the merge branch message indicator, and the boolean attribute regarding the existence of multiple developers on each branch of the merge. More info in this link. |
| collect_attributes_db.py | ./data/INITIAL_DATASET.csv, Conflicts database, Repos folder | ./data/collected_attributes2.csv | Extracts a csv with collected attributes from the conflicting chunks that can be calculated from the data in the database. Extracted attributes are described in this link. |
| extract_author_self_conflict.py | ./data/chunk_authors.csv | ./data/authors_self_conflicts.csv | Extracts a csv with the calculated self_conflict_perc metric for each conflicting chunk. |
| assemble_dataset.py | ./data/collected_attributes1.csv, ./data/collected_attributes2.csv, ./data/authors_self_conflicts.csv, ./data/merge_types_data.csv, ./data/macTool_output.csv | ./data/dataset.csv | Combines all collected data from the previous scripts into a single csv. |
| select_projects.py | ./data/LABELLED_DATASET.csv, ./data/number_conflicting_chunks.csv, ./data/dataset.csv | ./data/selected_dataset.csv, ./data/SELECTED_LABELLED_DATASET.csv, ./data/projects_intersection.csv | Extracts only the conflicting chunks that satisfy the criteria contained in the script (currently chunks from projects that have at least 1,000 conflicting chunks and that are not implicit forks of other selected projects). |
| github_api_data_preprocess.py | ./data/number_conflicting_chunks.csv, ./data/number_chunks__updated_repos.csv, ./data/projects_data_from_github_api.csv | ./data/api_data.csv | Joins the data about projects (collected from the GitHub API) with the number of chunks per project (extracted from Ghiotto's database) and the new owner/names of the projects, as well as the projects not found by the API. |
| transform_boolean_attributes.py | ./data/selected_dataset.csv | ./data/selected_dataset2.csv | Transforms the language construct column in each conflicting chunk into a boolean attribute. |
| process_projects_dataset.py | ./data/selected_dataset2.csv, ./data/chunk_authors.csv | Two csv files (training/test) for each selected repository put into ./data/projects, ./data/dataset-training.csv, ./data/dataset-test.csv | Splits the dataset into training/validation (80%) and test (20%) parts. Creates the boolean attribute for authors in each selected project. Details can be viewed in this link. |
| discretize_dataset.py | ./data/dataset-training.csv, ./data/dataset-test.csv, ./data/projects/{project}-training.csv, ./data/projects/{project}-test.csv | Two csv files (training/test) for each selected repository put into ./data/projects/discretized_log2 and ./data/projects/discretized_log10, ./data/dataset-training_log2.csv, ./data/dataset-training_log10.csv, ./data/dataset-test_log2.csv, ./data/dataset-test_log10.csv | Discretizes categorical attributes from the dataset using log2 and log10 functions. |
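
Because each script consumes files produced by earlier ones, the order above matters. The following is a minimal sketch of a runner that executes the scripts in that order. It is not part of the repository: the names `PIPELINE`, `build_commands`, and `run_pipeline` are hypothetical, and it assumes the scripts live in the "scripts" folder, take no command-line arguments, and that the database has already been configured in scripts/database.py. The working directory each script expects is also an assumption, so adjust it to match your checkout.

```python
import subprocess
import sys
from pathlib import Path

# Script order copied from the table above.
PIPELINE = [
    "extract_initial_dataset.py",
    "concatenation_relabel.py",
    "clone_projects.py",
    "collect_chunk_authors.py",
    "collect_attributes.py",
    "execute_mac_tool.py",
    "collect_merge_type.py",
    "collect_attributes_db.py",
    "extract_author_self_conflict.py",
    "assemble_dataset.py",
    "select_projects.py",
    "github_api_data_preprocess.py",
    "transform_boolean_attributes.py",
    "process_projects_dataset.py",
    "discretize_dataset.py",
]


def build_commands(scripts_dir="scripts", python=sys.executable):
    """Return one command (argv list) per script, in pipeline order.

    Assumption: each script is invoked without arguments.
    """
    return [[python, str(Path(scripts_dir) / name)] for name in PIPELINE]


def run_pipeline(scripts_dir="scripts", dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in build_commands(scripts_dir):
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError at the first failing script,
            # so later steps never run on incomplete inputs.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run by default: only prints the commands that would be executed.
    run_pipeline(dry_run=True)
```

Starting with a dry run is a deliberate choice here, since several steps (cloning all projects, running the macTool) take a long time.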

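To illustrate what a log-based discretization can look like, here is a small sketch. It is only one plausible reading of "using log2 and log10 functions" and is not taken from discretize_dataset.py: the function name `log_bin` and the binning rule (0 stays in bin 0, a positive count `v` goes to bin `floor(log(v)) + 1`) are assumptions made for illustration, and the input is assumed to be a non-negative integer count.

```python
import math

# Dedicated log functions avoid rounding issues that math.log(x, base) can show
# at exact powers of the base.
LOG_FUNCS = {2: math.log2, 10: math.log10}


def log_bin(value, base=2):
    """Map a non-negative integer count to a logarithmic bin (assumed rule).

    0 -> bin 0; v > 0 -> floor(log_base(v)) + 1, so each bin covers one
    order of magnitude in the chosen base.
    """
    if value < 0:
        raise ValueError("value must be non-negative")
    if value == 0:
        return 0
    return math.floor(LOG_FUNCS[base](value)) + 1


if __name__ == "__main__":
    counts = [0, 1, 7, 8, 999, 1000]
    print([log_bin(c, base=2) for c in counts])
    print([log_bin(c, base=10) for c in counts])
```

Under this rule, log10 produces coarser bins than log2, which is presumably why the pipeline writes both variants to separate files.
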
Download dataset

Execute the script download_dataset_files.py. All data files will be put into the ./data folder.

Authors

Paulo Elias

Heleno de S. Campos Junior

Eduardo Ogasawara

Leonardo Gresta Paulino Murta
