CERTA - Computing Entity Resolution explanations with TriAngles

CERTA

Code for CERTA (Computing ER explanations with TriAngles), an algorithm for computing saliency and counterfactual explanations for Entity Resolution models.

Installation

To install CERTA locally, run:

pip install .

Usage

Wrap the model whose predictions need to be explained using the ERModel interface. The get_model utility method loads an existing model if one is available; otherwise it trains a new one using the data in the provided dataset. For example, for a DeepMatcher model, use:

from certa.models.utils import get_model

model = get_model('dm', '/path/where/to/save', '/path/to/dataset', 'modelname')

Define a prediction function wrapping the model.predict() method.

def predict_fn(x, **kwargs):
    return model.predict(x, **kwargs)
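
The wrapper only has to forward its input and any keyword arguments to the model. A minimal sketch of that contract follows, using a hypothetical `StubModel` in place of a real ERModel. The stub, its made-up scores, and the `threshold` keyword are illustrative assumptions and are not part of CERTA:

```python
class StubModel:
    """Hypothetical stand-in for a wrapped ER model (not a CERTA class)."""

    def predict(self, x, **kwargs):
        # Return one (nomatch_score, match_score) pair per input record pair.
        # The scores are invented for illustration only.
        threshold = kwargs.get("threshold", 0.5)
        return [(1.0 - threshold, threshold) for _ in x]


model = StubModel()


def predict_fn(x, **kwargs):
    # Forward the input and any keyword arguments unchanged to the model.
    return model.predict(x, **kwargs)


scores = predict_fn(["pair-1", "pair-2"], threshold=0.8)
```

Keeping the wrapper this thin means the same function can be passed to CERTA regardless of which model type sits behind it.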

Create a CertaExplainer. CERTA needs access to the data sources lsource and rsource.

import pandas as pd
from certa.explain import CertaExplainer

lsource = pd.read_csv('/path/to/dataset/tableA.csv')
rsource = pd.read_csv('/path/to/dataset/tableB.csv')
certa_explainer = CertaExplainer(lsource, rsource)

To generate the prediction for the pair formed by the first record of each data source, do the following:

import numpy as np
from certa.local_explain import get_original_prediction

l_tuple = lsource.iloc[0]
r_tuple = rsource.iloc[0]
prediction = get_original_prediction(l_tuple, r_tuple, predict_fn)
class_to_explain = np.argmax(prediction)
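
`np.argmax` returns the index of the highest score, which becomes the class to explain. A small sketch, assuming the prediction is a two-element array of the form `[nomatch_score, match_score]`. That layout and the values are assumptions made here for illustration, so check what `get_original_prediction` returns for your model:

```python
import numpy as np

# Assumed layout: [nomatch_score, match_score]; the values are made up.
prediction = np.array([0.13, 0.87])

# Index of the highest score. Under the assumed layout,
# 0 would mean "non-match" and 1 would mean "match".
class_to_explain = int(np.argmax(prediction))
print(class_to_explain)
```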

To explain the prediction using CERTA:

saliency, summary, cfs, triangles, lattices = certa_explainer.explain(l_tuple, r_tuple, predict_fn)

CERTA returns:

  • the saliency explanation within the saliency pd.DataFrame
  • a summary containing the set of attributes with the highest probability of being sufficient to flip the original prediction
  • the generated counterfactual explanations within the cfs pd.DataFrame
  • the list of open triangles (in the form of tuples of record ids) used to generate the explanations
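
When reading the saliency explanation, it can help to rank attributes by score. The sketch below assumes that `saliency` is a single-row DataFrame with one column per attribute. The column names and values are invented for illustration, and the layout CERTA actually returns may differ:

```python
import pandas as pd

# Hypothetical saliency scores; names and values are made up for this sketch.
saliency = pd.DataFrame([{
    "ltable_name": 0.62,
    "ltable_price": 0.10,
    "rtable_name": 0.55,
    "rtable_price": 0.05,
}])

# Take the single row as a Series and sort attributes from most to least salient.
ranked = saliency.iloc[0].sort_values(ascending=False)

# Keep the names of the two highest-scoring attributes.
top_attributes = ranked.head(2).index.tolist()
print(top_attributes)
```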

Examples

Examples of using CERTA can be found in the following notebooks:

Citing CERTA

If you extend or use this work, please cite the paper:

@article{teofili2022effective,
  title={Effective Explanations for Entity Resolution Models},
  author={Teofili, Tommaso and Firmani, Donatella and Koudas, Nick and Martello, Vincenzo and Merialdo, Paolo and Srivastava, Divesh},
  journal={arXiv preprint arXiv:2203.12978},
  year={2022}
}