Code for CERTA (Computing ER explanations with TriAngles), an algorithm for computing saliency and counterfactual explanations for Entity Resolution models.
To install CERTA locally, run:
pip install .
Wrap the model whose predictions need to be explained using the ERModel interface. The get_model utility method loads an existing model, if one is available, or trains a new one on the data in the provided dataset. For example, for a DeepMatcher model:
from certa.models.utils import get_model
model = get_model('dm', '/path/where/to/save', '/path/to/dataset', 'modelname')
Define a prediction function wrapping the model.predict() method.
def predict_fn(x, **kwargs):
    return model.predict(x, **kwargs)
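The wrapper pattern above can be tried without training anything. The following is a minimal sketch using a hypothetical stand-in model: StubModel, its threshold parameter, and the nomatch_score / match_score / label keys are assumptions made for illustration and are not taken from the CERTA API. It only shows how keyword arguments pass through the wrapper to predict().

```python
class StubModel:
    """Hypothetical stand-in for a wrapped ER model (not part of CERTA)."""

    def predict(self, pairs, threshold=0.5, **kwargs):
        results = []
        for left, right in pairs:
            # Toy similarity: fraction of shared attributes with equal values.
            shared = [key for key in left if key in right]
            same = sum(1 for key in shared if left[key] == right[key])
            match_score = same / len(shared) if shared else 0.0
            results.append({
                'nomatch_score': 1.0 - match_score,
                'match_score': match_score,
                'label': int(match_score >= threshold),
            })
        return results


stub_model = StubModel()


def stub_predict_fn(x, **kwargs):
    # Same shape as predict_fn above: forward everything to predict().
    return stub_model.predict(x, **kwargs)


example_pairs = [({'name': 'a', 'city': 'x'}, {'name': 'a', 'city': 'y'})]
default_result = stub_predict_fn(example_pairs)
strict_result = stub_predict_fn(example_pairs, threshold=0.6)
```

Because the wrapper forwards **kwargs unchanged, a caller can tune model-specific options (here the made-up threshold) without the wrapper needing to know about them.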
Create a CertaExplainer. CERTA needs access to the data sources lsource and rsource.
import pandas as pd
from certa.explain import CertaExplainer
lsource = pd.read_csv('/path/to/dataset/tableA.csv')
rsource = pd.read_csv('/path/to/dataset/tableB.csv')
certa_explainer = CertaExplainer(lsource, rsource)
To generate the prediction for the pair formed by the first record of each data source, do the following:
import numpy as np
from certa.local_explain import get_original_prediction
l_tuple = lsource.iloc[0]
r_tuple = rsource.iloc[0]
prediction = get_original_prediction(l_tuple, r_tuple, predict_fn)
class_to_explain = np.argmax(prediction)
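The np.argmax call picks the index of the highest score in the prediction. As a rough illustration in plain Python, assuming the prediction is a two-element sequence ordered as (no-match score, match score), which is an assumption made here and not stated above:

```python
def argmax(scores):
    """Return the index of the largest score; ties go to the first index."""
    return max(range(len(scores)), key=lambda i: scores[i])


# Made-up scores, assumed to be ordered as (no-match, match).
example_prediction = [0.13, 0.87]
example_class = argmax(example_prediction)
```

Under that assumed ordering, index 1 would mean the pair is predicted to be a match and index 0 a non-match.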
To explain the prediction using CERTA:
saliency, summary, cfs, triangles, lattices = certa_explainer.explain(l_tuple, r_tuple, predict_fn)
CERTA returns:
- the saliency explanation within the saliency pd.DataFrame
- a summary containing the set of attributes with the highest probability of being sufficient to flip the original prediction
- the generated counterfactual explanations within the cfs pd.DataFrame
- the list of open triangles (in the form of tuples of record ids) used to generate the explanations
- the lattices explored while generating the explanations
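A common next step is to rank attributes by their saliency score. The sketch below works on a plain attribute-to-score mapping; the rank_attributes helper, the attribute names, and the scores are all made up for illustration, and the layout of the saliency DataFrame (assumed here to be one row with one column per attribute) should be checked against the actual output.

```python
def rank_attributes(scores, top_k=3):
    """Return the top_k (attribute, score) pairs, highest score first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# Hypothetical saliency scores; real attribute names depend on the dataset.
example_scores = {
    'ltable_name': 0.8,
    'rtable_name': 0.6,
    'ltable_price': 0.1,
    'rtable_price': 0.3,
}
top_attributes = rank_attributes(example_scores, top_k=2)
```

If the saliency DataFrame does have a single row, saliency.iloc[0].to_dict() would produce a mapping that can be passed to this helper.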
Examples of using CERTA can be found in the following notebooks:
If you extend or use this work, please cite the paper:
@article{teofili2022effective,
  title={Effective Explanations for Entity Resolution Models},
  author={Teofili, Tommaso and Firmani, Donatella and Koudas, Nick and Martello, Vincenzo and Merialdo, Paolo and Srivastava, Divesh},
  journal={arXiv preprint arXiv:2203.12978},
  year={2022}
}