Explaining Top-K Matching

Shapley Values for Explanation in Two-sided Matching Applications

Abstract

In this paper, we initiate research in explaining matchings. In particular, we consider the large-scale two-sided matching applications where preferences of the users are specified as (ranking) functions over a set of attributes and matching recommendations are derived as top-k. We consider multiple natural explanation questions, concerning the users of these systems. Observing the competitive nature of these environments, we propose multiple Shapley-based approaches for explanation. Besides exact algorithms, we propose a sampling-based approximation algorithm with provable guarantees to overcome the combinatorial complexity of the exact Shapley computation. Our extensive experiments on real-world and synthetic data sets validate the usefulness of our proposal and confirm the efficiency and accuracy of our algorithms.

Publication to Cite

Paper currently under review.

Getting Started

These instructions will get you a copy of the project up and running on your local machine, and allow you to rerun the experiments from the paper.

Python 3.10

Using the Queries and Methods

First import the experiments module. The parameters are defined as follow.

tuples - an array of n arrays of d integers
evalFunc - a function on array of d integers where inputs can potentially be "None" producing a positive value
evalFuncs - if a query requires more than one evalFunc, they are passed in as an array of functions
k - the number of individuals in a given top-k
j - the target tuple index on which to perform the query "Why was $t_j$ in the top k of the evaluator?"
d - the total number of shapley values to compute
unWrapFunction - assuming the shapley values are being calculated over groups of attributes, this function takes an array of group indexes, and returns the full list of associated attributes
m - for the approximate method, this is the number of permutations sampled. note that it will evaluate once for each index in the permutation
bruteForce - if bruteForce has been run on the dataset, include bruteForce to receive the Standard Deviation and Average Difference between the methods

Additionally, the SHAP methods will use a model parameter. To generate a model, first import ModelGenerator from model.py, and then call the following builder methods.

database - the same as tuples before
eval_func - the same as evalFunc before
eval_funcs - the same as evalFuncs before
k - the same as k before
target - the same as j before
attributes - the same as d before
unwrap_func - the same as unWrapFunc before

Finally, once all data has been given to the model, the following methods must be called to prepare the data for specific queries:

For SQ-Single, call the function "setup_top_k" to construct the top-k for the queried individual's evaluation function.
For SQ-Multiple, call the function "setup_top_ks" to construct the top-k for all individuals' evaluation functions.

Running the Experiments

Real World Dataset - Varying M on Brute Force, Approximate, and SHAP. Call python3 candidates_set_experiment.py
Synthetic Dataset - Varying M on Brute Force, Approximate, and SHAP. Call python3 varying_m_experiment.py [DISTRIBUTIONS] where distributions are space separated values in AZL (Anti-correlated Zipfian Linear), CZL (Correlated Zipfian Linear), IZL (Independent Zipfian Linear), AZNL (Anti-correlated Zipfian Non-Linear), CZNL (Correlated Zipfian Non-Linear), IZNL (Independent Zipfian Non-Linear).
Synthetic Dataset - Varying D on Brute Force, Approximate, and SHAP. Call python3 varying_d_experiment.py [DISTRIBUTIONS] where distributions are space separated values in AZL (Anti-correlated Zipfian Linear), CZL (Correlated Zipfian Linear), IZL (Independent Zipfian Linear), AZNL (Anti-correlated Zipfian Non-Linear), CZNL (Correlated Zipfian Non-Linear), IZNL (Independent Zipfian Non-Linear).
Case Study - Call python3 case_study_experiment.py
Top Attribute Experiment - Determining how often the top value of the method is the same as brute force for Approximate, Weights, and Attribute Score. Call python3 top_attribute_experiment.py [DISTRIBUTIONS] where distributions are space separated values in AZL (Anti-correlated Zipfian Linear), CZL (Correlated Zipfian Linear), IZL (Independent Zipfian Linear), AZNL (Anti-correlated Zipfian Non-Linear), CZNL (Correlated Zipfian Non-Linear), IZNL (Independent Zipfian Non-Linear), .

Name		Name	Last commit message	Last commit date
Latest commit History 278 Commits
__pycache__		__pycache__
data		data
explain_US		explain_US
results		results
results2		results2
results3		results3
results4		results4
results5		results5
1000-8-samples.dill		1000-8-samples.dill
100x100-5-samples		100x100-5-samples
Appendix.pdf		Appendix.pdf
Candidate-Function-Weights.dill		Candidate-Function-Weights.dill
Candidates-Dataset.dill		Candidates-Dataset.dill
Candidates-Functions.dill		Candidates-Functions.dill
Candidates-Values.pickle		Candidates-Values.pickle
Candidates22ApproximateReslts.dill		Candidates22ApproximateReslts.dill
ExperimentD-NLResult.dill		ExperimentD-NLResult.dill
ExperimentDResults.dill		ExperimentDResults.dill
ExperimentM-NLResult.dill		ExperimentM-NLResult.dill
ExperimentMResults.dill		ExperimentMResults.dill
ExperimentMResults7.dill		ExperimentMResults7.dill
ExperimentMResults8.dill		ExperimentMResults8.dill
GenerateCandidatesFunctions.py		GenerateCandidatesFunctions.py
MultipleSamplesExperiment2		MultipleSamplesExperiment2
RE-Dataset.dill		RE-Dataset.dill
RE-Function-Weights.dill		RE-Function-Weights.dill
README.md		README.md
RemoveAttributeExperiments.dill		RemoveAttributeExperiments.dill
Removing-Functions-Linear.dill		Removing-Functions-Linear.dill
Removing-Functions-Nonlinear.dill		Removing-Functions-Nonlinear.dill
Running-Example-Function-Weights.dill		Running-Example-Function-Weights.dill
Running-Example-Functions.dill		Running-Example-Functions.dill
Running-Example-Results.pickle		Running-Example-Results.pickle
Running-Example.pickle		Running-Example.pickle
SyntheticAZL-Clean.dill		SyntheticAZL-Clean.dill
SyntheticAZNL-Clean.dill		SyntheticAZNL-Clean.dill
SyntheticCZL-Clean.dill		SyntheticCZL-Clean.dill
SyntheticCZNL-Clean.dill		SyntheticCZNL-Clean.dill
SyntheticDAZL.dill		SyntheticDAZL.dill
SyntheticDAZNL.dill		SyntheticDAZNL.dill
SyntheticDCZL.dill		SyntheticDCZL.dill
SyntheticDCZNL.dill		SyntheticDCZNL.dill
SyntheticDIZL.dill		SyntheticDIZL.dill
SyntheticDIZNL.dill		SyntheticDIZNL.dill
SyntheticIZL-Clean.dill		SyntheticIZL-Clean.dill
SyntheticIZNL-Clean.dill		SyntheticIZNL-Clean.dill
SyntheticMAZL-Final.dill		SyntheticMAZL-Final.dill
SyntheticMAZL.dill		SyntheticMAZL.dill
SyntheticMAZNL-Final.dill		SyntheticMAZNL-Final.dill
SyntheticMAZNL.dill		SyntheticMAZNL.dill
SyntheticMCZL-Final.dill		SyntheticMCZL-Final.dill
SyntheticMCZL.dill		SyntheticMCZL.dill
SyntheticMCZNL-Final.dill		SyntheticMCZNL-Final.dill
SyntheticMCZNL.dill		SyntheticMCZNL.dill
SyntheticMIZL-Final.dill		SyntheticMIZL-Final.dill
SyntheticMIZL.dill		SyntheticMIZL.dill
SyntheticMIZNL-Final.dill		SyntheticMIZNL-Final.dill
SyntheticMIZNL.dill		SyntheticMIZNL.dill
UpdatedExperimentDResults.dill		UpdatedExperimentDResults.dill
UpdatedExperimentDResults2.dill		UpdatedExperimentDResults2.dill
UpdatedExperimentDResults3.dill		UpdatedExperimentDResults3.dill
UpdatedExperimentDResults4.dill		UpdatedExperimentDResults4.dill
UpdatedExperimentMResults.dill		UpdatedExperimentMResults.dill
Varying-D.dill		Varying-D.dill
VaryingMCandidates-Clean.dill		VaryingMCandidates-Clean.dill
VaryingMCandidates.dill		VaryingMCandidates.dill
VaryingMCandidatesFinal		VaryingMCandidatesFinal
VaryingMCandidatesRevised.dill		VaryingMCandidatesRevised.dill
Varying_samples_SHAP.py		Varying_samples_SHAP.py
approximate.py		approximate.py
bruteforce.py		bruteforce.py
candidates_set_experiment.py		candidates_set_experiment.py
case_study_experiment.py		case_study_experiment.py
datagen.py		datagen.py
experiments.py		experiments.py
experiments.py.save		experiments.py.save
generatesamples.py		generatesamples.py
model.py		model.py
printVariables.py		printVariables.py
printVariablesErrorCandidates.py		printVariablesErrorCandidates.py
running_example_experiment.py		running_example_experiment.py
shap_admissions.py		shap_admissions.py
shap_bipartite.py		shap_bipartite.py
shaprunner.py		shaprunner.py
top_attribute_experiment.py		top_attribute_experiment.py
topk.py		topk.py
utility.py		utility.py
varying_d_experiment.py		varying_d_experiment.py
varying_m_experiment.py		varying_m_experiment.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Explaining Top-K Matching

Shapley Values for Explanation in Two-sided Matching Applications

Abstract

Publication to Cite

Getting Started

Using the Queries and Methods

Running the Experiments

About

Releases

Packages

Contributors 3

Languages

UIC-InDeXLab/ExplainMatchTopK

Folders and files

Latest commit

History

Repository files navigation

Explaining Top-K Matching

Shapley Values for Explanation in Two-sided Matching Applications

Abstract

Publication to Cite

Getting Started

Using the Queries and Methods

Running the Experiments

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages