Mining Constraints from Reference Process Models for Detecting Best-Practice Violations in Event Logs
This repository contains the implementation, data, evaluation scripts, and results as described in the manuscript Mining Constraints from Reference Process Models for Detecting Best-Practice Violations in Event Logs.
Set up a virtual environment and then run the following command in the project root folder:
pip install .
Create a `data` folder on the same level as the project root folder, i.e., in the same parent directory. The folder structure should look like this:
├── data
│   ├── eval      <- Optional, if you want to use the evaluation.py script to run the full evaluation.
│   ├── logs      <- Event logs you want to use.
│   ├── interim   <- Intermediate data that has been preprocessed.
│   ├── output    <- Folder to save results in.
│   └── raw       <- The raw dataset should be placed in this folder.
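If you prefer to create this layout programmatically, a minimal sketch (run from the directory that should contain `data`) is:

```python
from pathlib import Path

# Create the data folder structure described above.
for sub in ("eval", "logs", "interim", "output", "raw"):
    Path("data", sub).mkdir(parents=True, exist_ok=True)
```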
The full process model dataset (a subset of which was used in our experiments) can be downloaded here: dataset. Place it into the folder `./data/raw` such that the models are in `./data/raw/sap_sam_2022/models`. Note that for the evaluation experiments we created a filtered version of the dataset, which is included in `project_root/eval_data`.
After installing the package, you need to download the spaCy language model for English as well as the NLTK WordNet and stopwords corpora:
python -m spacy download en_core_web_sm
python -m nltk.downloader stopwords
python -m nltk.downloader wordnet
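To verify that these language resources are available, a minimal check (assuming the downloads above completed and use the default data locations) is:

```python
import spacy
from nltk.corpus import stopwords, wordnet

# Loading the English spaCy model raises OSError if the download is missing.
nlp = spacy.load("en_core_web_sm")

# Accessing the NLTK corpora raises LookupError if the downloads are missing.
print(len(stopwords.words("english")), "stopwords loaded")
print(wordnet.synsets("order")[0].name())
```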
- Place the event logs you want to use in the `data/logs` folder.
- Adapt the `CURRENT_LOG_FILE` variable in `main.py` to the name of the log file you want to use.
- Make sure the name of the model collection you want to use is set in the `MODEL_COLLECTION` variable in `main.py` (see the sketch below).
- Run the `main.py` script.
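As an illustration, the two variables in `main.py` might be set as follows; the values below are placeholders rather than files shipped with the repository:

```python
# In main.py: point the script at your event log and model collection.
CURRENT_LOG_FILE = "my_event_log.xes"            # placeholder; a log placed in data/logs
MODEL_COLLECTION = "semantic_sap_sam_filtered"   # placeholder; a model collection in data/raw
```

Then run the script from the project root, e.g. `python main.py`.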
The results reported in the paper can be obtained using the provided Python notebook (`notebooks/evaluation_results.ipynb`).
It also contains additional results that we could not include in the paper due to space reasons.
Proceed as explained in this section to reproduce the results from scratch.
As explained above, the data used must be stored in `data/raw`. We provide the data used in the experiments in `project_root/eval_data`.
- Model collection: move the entire `semantic_sap_sam_filtered` folder from `project_root/eval_data` to `data/raw` to use it (see the sketch after this list).
- Noisy logs: move the `semantic_sap_sam_filtered_noisy.pkl` file from `project_root/eval_data` to `data/eval` to use it.
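A minimal Python sketch of these two moves, assuming it is run from a location where both `eval_data` and `data` are reachable via the relative paths shown:

```python
import shutil
from pathlib import Path

eval_data = Path("eval_data")

# Make sure the target folders from the structure above exist.
Path("data/raw").mkdir(parents=True, exist_ok=True)
Path("data/eval").mkdir(parents=True, exist_ok=True)

# Model collection: moved into data/raw.
shutil.move(str(eval_data / "semantic_sap_sam_filtered"), "data/raw")

# Noisy logs: moved into data/eval.
shutil.move(str(eval_data / "semantic_sap_sam_filtered_noisy.pkl"), "data/eval")
```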
The experiments can be run using the `evaluate.py` script. The many configuration options can be defined in the `eval_configurations` dictionary in the script, which already contains some example configurations.
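Purely as an illustration of the structure (the keys and values below are hypothetical; the actual options are shown by the example configurations already present in `evaluate.py`), an entry could look like:

```python
# Hypothetical sketch of an eval_configurations entry; the real option names
# are defined by the example configurations in evaluate.py.
eval_configurations = {
    "noisy_logs_run": {                                           # hypothetical name
        "log_collection": "semantic_sap_sam_filtered_noisy.pkl",  # from data/eval
        "model_collection": "semantic_sap_sam_filtered",          # from data/raw
    },
}
```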