This project contains the artifacts used in the paper "Towards Accurate Recommendations of Merge Conflicts Resolution Strategies", published in the Information and Software Technology (IST) journal. A pre-print is available here.
In this paper, we propose MESTRE, a merge conflict resolution strategy recommender.
The complementary material for the paper can be found in the "complementary" folder.
The dataset can be obtained through the steps outlined below.
The scripts used to reproduce the study can be found in the "scripts" folder.
There are two options for accessing the dataset used in this paper. You can either collect the data yourself (which takes a long time) or download the dataset files directly.
We assume you have access to the conflicts database used in this paper. The database connection settings can be configured in the scripts/database.py file.
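The concrete contents of scripts/database.py are not reproduced here; as a rough sketch, assuming a PostgreSQL database accessed through psycopg2, the configuration might look like the following (all connection values are placeholders, and the actual script may use a different driver or layout):

```python
# Hypothetical sketch of scripts/database.py; the real file may differ.
# Assumes a PostgreSQL conflicts database accessed through psycopg2.
import psycopg2

# Placeholder connection settings; replace with your own values.
DB_CONFIG = {
    "host": "localhost",
    "port": 5432,
    "user": "mestre",
    "password": "changeme",
    "dbname": "conflicts",
}

def get_connection():
    """Open a connection to the conflicts database."""
    return psycopg2.connect(**DB_CONFIG)
```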
Run the scripts in the following order (a driver sketch that automates the sequence is shown after the table):
Script | Input | Output | Description |
---|---|---|---|
extract_initial_dataset.py | Conflicts database | ./data/INITIAL_DATASET.csv | Extracts a csv with conflicting chunks and some descriptive attributes. |
concatenation_relabel.py | ./data/INITIAL_DATASET.csv, Conflicts database | ./data/LABELLED_DATASET.csv | Relabels the developerdecision of each chunk that used the Concatenation strategy. |
clone_projects.py | ./data/INITIAL_DATASET.csv | Repos folder | Clones all projects into the ./repos folder. |
collect_chunk_authors.py | ./data/INITIAL_DATASET.csv, Repos folder | ./data/chunk_authors.csv | Extracts a csv with information about all authors who contributed to a conflicting chunk. Detailed information can be found at this link. |
collect_attributes.py | ./data/INITIAL_DATASET.csv, Repos folder | ./data/collected_attributes1.csv | Extracts a csv with collected attributes from the conflicting chunks. The extracted attributes are described at this link. |
execute_mac_tool.py | ./data/INITIAL_DATASET.csv, Repos folder | Two csv files for each analyzed repo, ./data/macTool_output.csv | Executes a modified version of the macTool to extract merge attributes. More info at this link. |
collect_merge_type.py | ./data/macTool_output.csv, Repos folder | ./data/merge_types_data.csv | Extracts, for each chunk's merge commit, the merge commit message, the merge branch message indicator, and a boolean attribute indicating whether multiple developers worked on each branch of the merge. More info at this link. |
collect_attributes_db.py | ./data/INITIAL_DATASET.csv, Conflicts database, Repos folder | ./data/collected_attributes2.csv | Extracts a csv with attributes of the conflicting chunks that can be calculated from the data in the database. The extracted attributes are described at this link. |
extract_author_self_conflict.py | ./data/chunk_authors.csv | ./data/authors_self_conflicts.csv | Extracts a csv with the calculated self_conflict_perc metric for each conflicting chunk. |
assemble_dataset.py | ./data/collected_attributes1.csv, ./data/collected_attributes2.csv, ./data/authors_self_conflicts.csv, ./data/merge_types_data.csv, ./data/macTool_output.csv | ./data/dataset.csv | Combines all collected data from the previous scripts into a single csv. |
select_projects.py | ./data/LABELLED_DATASET.csv, ./data/number_conflicting_chunks.csv, ./data/dataset.csv | ./data/selected_dataset.csv, ./data/SELECTED_LABELLED_DATASET.csv, ./data/projects_intersection.csv | Extracts only the conflicting chunks that satisfy the criteria contained in the script (currently, chunks from projects that have at least 1,000 conflicting chunks and that are not implicit forks of other selected projects; a selection sketch is shown after the table). |
github_api_data_preprocess.py | ./data/number_conflicting_chunks.csv, ./data/number_chunks__updated_repos.csv, ./data/projects_data_from_github_api.csv | ./data/api_data.csv | Joins the data about projects (collected from the GitHub API) with the number of chunks per project (extracted from Ghiotto's database) and the new owner/names of renamed projects, as well as the projects not found by the API. |
transform_boolean_attributes.py | ./data/selected_dataset.csv | ./data/selected_dataset2.csv | Transforms the language construct column of each conflicting chunk into boolean attributes. |
process_projects_dataset.py | ./data/selected_dataset2.csv, ./data/chunk_authors.csv | Two csv files (training/test) for each selected repository, placed in ./data/projects, ./data/dataset-training.csv, ./data/dataset-test.csv | Splits the dataset into training/validation (80%) and test (20%) parts. Creates the boolean attribute for authors in each selected project. Details can be viewed at this link. |
discretize_dataset.py | ./data/dataset-training.csv, ./data/dataset-test.csv, ./data/projects/{project}-training.csv, ./data/projects/{project}-test.csv | Two csv files (training/test) for each selected repository, placed in ./data/projects/discretized_log2 and ./data/projects/discretized_log10, ./data/dataset-training_log2.csv, ./data/dataset-training_log10.csv, ./data/dataset-test_log2.csv, ./data/dataset-test_log10.csv | Discretizes numerical attributes from the dataset using log2 and log10 functions (see the log-binning sketch after the table). |
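Since the scripts above must run in a strict order, a small driver can automate the sequence. This is a minimal sketch, not part of the original artifact; it assumes each script is invoked from the repository root:

```python
# Hypothetical driver that runs the collection scripts in the
# order listed in the table above.
import subprocess
import sys

PIPELINE = [
    "extract_initial_dataset.py",
    "concatenation_relabel.py",
    "clone_projects.py",
    "collect_chunk_authors.py",
    "collect_attributes.py",
    "execute_mac_tool.py",
    "collect_merge_type.py",
    "collect_attributes_db.py",
    "extract_author_self_conflict.py",
    "assemble_dataset.py",
    "select_projects.py",
    "github_api_data_preprocess.py",
    "transform_boolean_attributes.py",
    "process_projects_dataset.py",
    "discretize_dataset.py",
]

for script in PIPELINE:
    print(f"Running scripts/{script} ...")
    result = subprocess.run([sys.executable, f"scripts/{script}"])
    # Stop immediately if any step fails, since later steps
    # depend on the outputs of earlier ones.
    if result.returncode != 0:
        sys.exit(f"{script} failed with exit code {result.returncode}")
```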
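To illustrate the main selection criterion applied by select_projects.py, the sketch below keeps only chunks from projects with at least 1,000 conflicting chunks. The column name "project" is an assumption, and the implicit-fork filtering is omitted:

```python
# Illustrative selection sketch; the column name "project" is hypothetical,
# and select_projects.py applies additional criteria (e.g. fork filtering).
import pandas as pd

df = pd.read_csv("./data/dataset.csv")

# Count conflicting chunks per project, aligned row by row with df.
counts = df.groupby("project")["project"].transform("size")

# Keep only chunks from projects with at least 1,000 conflicting chunks.
selected = df[counts >= 1000]
selected.to_csv("./data/selected_dataset.csv", index=False)
```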
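As an illustration of the log-based discretization in the final step, the sketch below bins a numeric column by the integer part of its log2 (log10 is analogous). The column name "chunk_size" and the binning details are assumptions; the actual rules in discretize_dataset.py may differ:

```python
# Illustrative log2 discretization; the attribute name and binning
# rules are hypothetical.
import numpy as np
import pandas as pd

df = pd.DataFrame({"chunk_size": [0, 1, 3, 8, 150, 4000]})

def log2_bin(value: float) -> int:
    # Reserve bin 0 for non-positive values; otherwise bin by
    # the integer part of log2, shifted so bins start at 1.
    return 0 if value <= 0 else int(np.floor(np.log2(value))) + 1

df["chunk_size_log2"] = df["chunk_size"].apply(log2_bin)
print(df)
```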
Alternatively, to download the dataset files directly, execute the script download_dataset_files.py. All data files will be placed in the ./data folder.
Paulo Elias
Heleno de S. Campos Junior
Eduardo Ogasawara
Leonardo Gresta Paulino Murta