Mining Constraints from Reference Process Models for Detecting Best-Practice Violations in Event Logs
This repository contains the implementation, data, evaluation scripts, and results as described in the manuscript Mining Constraints from Reference Process Models for Detecting Best-Practice Violations in Event Logs.
Set up a virtual environment and then run the following command in the project root folder:
pip install .
Create a `data` folder on the same level as the project root folder, i.e., in the same parent directory. The folder structure should look like this:
├── data
│   ├── eval      <- Optional, if you want to use the evaluation.py script to run the full evaluation.
│   ├── logs      <- Event logs you want to use.
│   ├── interim   <- Intermediate data that has been preprocessed.
│   ├── output    <- Folder to save results in.
│   └── raw       <- The raw dataset should be placed in this folder.
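If you prefer to create this layout programmatically, a minimal sketch (run from the directory that should contain `data`) is:

```python
from pathlib import Path

# Create the data folder structure described above.
for sub in ("eval", "logs", "interim", "output", "raw"):
    Path("data", sub).mkdir(parents=True, exist_ok=True)
```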
The full process model dataset (a subset of which was used in our experiments) can be downloaded here: dataset. Place it into the folder `./data/raw` such that the models are in `./data/raw/sap_sam_2022/models`. Note that for the evaluation experiments we created a filtered version of the dataset, which is included in `project_root/eval_data`.
After installing the package, you need to download the spaCy language model for English as well as the NLTK WordNet and stopwords corpora:
python -m spacy download en_core_web_sm
python -m nltk.downloader stopwords
python -m nltk.downloader wordnet
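To verify that these language resources are available, a minimal check (assuming the downloads above completed and use the default data locations) is:

```python
import spacy
from nltk.corpus import stopwords, wordnet

# Loading the English spaCy model raises OSError if the download is missing.
nlp = spacy.load("en_core_web_sm")

# Accessing the NLTK corpora raises LookupError if the downloads are missing.
print(len(stopwords.words("english")), "stopwords loaded")
print(wordnet.synsets("order")[0].name())
```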
- Place the event logs you want to use in the `data/logs` folder.
- Adapt the `CURRENT_LOG_FILE` variable in `main.py` to the name of the log file you want to use.
- Make sure the name of the model collection you want to use is set in the `MODEL_COLLECTION` variable in `main.py` (see the sketch below).
- Run the `main.py` script.
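As an illustration, the two variables in `main.py` might be set as follows; the values below are placeholders rather than files shipped with the repository:

```python
# In main.py: point the script at your event log and model collection.
CURRENT_LOG_FILE = "my_event_log.xes"            # placeholder; a log placed in data/logs
MODEL_COLLECTION = "semantic_sap_sam_filtered"   # placeholder; a model collection in data/raw
```

Then run the script from the project root, e.g. `python main.py`.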
The results reported in the paper can be obtained using the provided Python notebook (`notebooks/evaluation_results.ipynb`).
It also contains additional results that we could not include in the paper due to space reasons.
Proceed as explained in this section to reproduce the results from scratch.
As explained above, the data used must be stored in `data/raw`. We provide the data used in the experiments in `project_root/eval_data`.
- Model collection: move the entire `semantic_sap_sam_filtered` folder from `project_root/eval_data` to `data/raw` to use it (see the sketch after this list).
- Noisy logs: move the `semantic_sap_sam_filtered_noisy.pkl` file from `project_root/eval_data` to `data/eval` to use it.
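A minimal Python sketch of these two moves, assuming it is run from a location where both `eval_data` and `data` are reachable via the relative paths shown:

```python
import shutil
from pathlib import Path

eval_data = Path("eval_data")

# Make sure the target folders from the structure above exist.
Path("data/raw").mkdir(parents=True, exist_ok=True)
Path("data/eval").mkdir(parents=True, exist_ok=True)

# Model collection: moved into data/raw.
shutil.move(str(eval_data / "semantic_sap_sam_filtered"), "data/raw")

# Noisy logs: moved into data/eval.
shutil.move(str(eval_data / "semantic_sap_sam_filtered_noisy.pkl"), "data/eval")
```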
The experiments can be run using the `evaluate.py` script. The many configuration options can be defined in the `eval_configurations` dictionary in the script, which already contains some example configurations.
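Purely as an illustration of the structure (the keys and values below are hypothetical; the actual options are shown by the example configurations already present in `evaluate.py`), an entry could look like:

```python
# Hypothetical sketch of an eval_configurations entry; the real option names
# are defined by the example configurations in evaluate.py.
eval_configurations = {
    "noisy_logs_run": {                                           # hypothetical name
        "log_collection": "semantic_sap_sam_filtered_noisy.pkl",  # from data/eval
        "model_collection": "semantic_sap_sam_filtered",          # from data/raw
    },
}
```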