Fairness Metrics #5093

ArjunSubramonian · 2021-04-02T23:54:43Z

Changes proposed in this pull request:

Created the fairness module and added four fairness metrics: Independence, Separation, Sufficiency, and Demographic Parity Without Ground Truth.

AkshitaB

@ArjunSubramonian I've left some comments. The main one is about making the metrics work across batches. When we run the trainer, we accumulate partial information for every batch, compute the actual metric value at the end of each epoch, and then reset. We want something similar here. Let me know if that makes sense, or if you would like to discuss it further.

AkshitaB · 2021-04-08T21:12:07Z

allennlp/fairness/fairness_metrics.py

+"""
+Fairness metrics are based on:
+1) Barocas, S.; Hardt, M.; and Narayanan, A. 2019. Fairness and machine learning. fairmlbook.org.
+2) Zhang, B. H.; Lemoine, B.; and Mitchell, M. 2018. Mitigating unwanted biases with adversarial learning.


Minor comment: it would be nice to have links to the papers too, similar to this, in markdown format. Then anyone reading the documentation can simply click on them.

Also, we like semantic scholar links. 😄

AkshitaB · 2021-04-08T21:49:51Z

allennlp/fairness/fairness_metrics.py

+
+        predicted_labels : `torch.Tensor`, required.
+            A tensor of predicted integer class labels of shape (batch_size, ...). Represented as C.
+        protected_variable_labels : `torch.Tensor`, required.


So, if I understand correctly, `protected_variable_labels` will correspond to a single protected attribute, such as race, right? If we want to measure independence with respect to several protected categories, say race, gender, and sexuality, do we set up a separate Independence metric for each of them?

Yes, that's correct

AkshitaB · 2021-04-08T21:56:51Z

allennlp/fairness/fairness_metrics.py

+                .histc(bins=num_classes, min=0, max=num_classes - 1)
+                / predicted_labels.nelement()
+            )
+            kl_divs[a] = kl_divergence(C_given_a_dist, C_dist)


These metrics do not support the distributed setting currently. This is fine, but we should add a check similar to this so that users know.

You can also take a look at this and this for examples of metrics working in distributed settings.

Also, this will compute the distributions correctly only for a single batch, right? We want the distributions to be computed over all batches. (Again, you can take a look at other metrics for how this works.) Ideally, `__call__` accumulates the distributions in some way, and `get_metric()` computes the final KL divergence. (I'm not sure how efficient this will be, though.)
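To make the accumulate-then-compute pattern concrete, here is a minimal sketch in plain Python (no torch, so it is not the PR's implementation). The class name `AccumulatingIndependence` and its constructor arguments are hypothetical; only the `__call__` / `get_metric()` / reset split comes from the comment above. It assumes the metric is the KL divergence of P(C | A = a) from P(C), as in the diff excerpt.

```python
import math
from collections import Counter
from typing import Dict, List


class AccumulatingIndependence:
    """Hypothetical stand-in for an Independence-style metric (plain Python)."""

    def __init__(self, num_classes: int, num_protected_variable_labels: int) -> None:
        self._num_classes = num_classes
        self._num_protected = num_protected_variable_labels
        self.reset()

    def __call__(self, predicted_labels: List[int], protected_variable_labels: List[int]) -> None:
        # Per batch: only update running counts, never compute the metric here.
        for c, a in zip(predicted_labels, protected_variable_labels):
            self._joint_counts[(a, c)] += 1
            self._class_counts[c] += 1
            self._protected_counts[a] += 1
            self._total += 1

    def get_metric(self, reset: bool = False) -> Dict[int, float]:
        # Per epoch: turn the accumulated counts into distributions and
        # compute KL(P(C | A = a) || P(C)) for every protected label seen.
        kl_divs: Dict[int, float] = {}
        for a in range(self._num_protected):
            n_a = self._protected_counts[a]
            if n_a == 0:
                continue  # no examples for this protected label yet
            kl = 0.0
            for c in range(self._num_classes):
                joint = self._joint_counts[(a, c)]
                if joint == 0:
                    continue  # a zero-probability term contributes 0
                p_c_given_a = joint / n_a
                p_c = self._class_counts[c] / self._total
                kl += p_c_given_a * math.log(p_c_given_a / p_c)
            kl_divs[a] = kl
        if reset:
            self.reset()
        return kl_divs

    def reset(self) -> None:
        self._joint_counts: Counter = Counter()
        self._class_counts: Counter = Counter()
        self._protected_counts: Counter = Counter()
        self._total = 0


# Two batches, then one metric computation for the whole "epoch".
metric = AccumulatingIndependence(num_classes=2, num_protected_variable_labels=2)
metric([0, 1], [0, 0])  # batch 1: protected label 0
metric([0, 1], [1, 1])  # batch 2: protected label 1
epoch_value = metric.get_metric(reset=True)
```

Because only integer counts are stored between batches, the result does not depend on how the data is split into batches, which is the property being asked for.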

AkshitaB · 2021-04-08T22:01:25Z

allennlp/fairness/fairness_metrics.py

+@Metric.register("independence")
+class Independence(Metric):
+    """
+    Independence. Assumes integer labels, with


It might be helpful to add a small example of how the output can be interpreted in a real-world context. E.g., a KL divergence of nearly 0 for some protected variable implies independence, which is useful in a certain context.
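For reference, a sketch of the quantity behind that interpretation, assuming the `kl_divergence(C_given_a_dist, C_dist)` call in the diff follows the usual argument order (the first argument is the distribution the expectation is taken over). Here $C$ is the predicted label and $A$ the protected variable; the lowercase $c$ and $a$ for individual values are notation introduced for this note.

```latex
D_{\mathrm{KL}}\big(P(C \mid A = a) \,\|\, P(C)\big)
  = \sum_{c} P(C = c \mid A = a)\,
    \log \frac{P(C = c \mid A = a)}{P(C = c)}
```

This is zero exactly when $P(C \mid A = a) = P(C)$ and grows as the two distributions diverge, so values near 0 for every $a$ suggest that predictions are roughly independent of the protected attribute.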


epwalsh · 2021-04-14T23:39:39Z

allennlp/fairness/fairness_metrics.py

+
+            Note: all tensors are expected to be on the same device.


You could format it like this to make it look better on the docs site:

Suggested change

-            Note: all tensors are expected to be on the same device.
+        !!! Note
+            All tensors are expected to be on the same device.

Woah, this is so cool :D I just changed it everywhere

epwalsh · 2021-04-15T00:00:30Z

allennlp/fairness/fairness_metrics.py

+It is provably impossible to satisfy any two of Independence, Separation, and Sufficiency simultaneously,
+except in degenerate cases.


That's interesting. Do you have a reference for that?

Added a link in the docstring!

AkshitaB · 2021-04-16T01:31:33Z

allennlp/fairness/fairness_metrics.py

+        pmi_terms = {}
+        prob_y = torch.zeros(self._num_classes).to(device)
+        torch.div(self._y_counts, self._total_predictions, out=prob_y)
+        for x in range(self._num_protected_variable_labels):


I think it's possible to vectorize this part if joint_counts_by_protected_variable_label and _protected_variable_label_counts are initialized as tensors instead of dicts (they can still be filled using the for loop like now) - with a bit of reshaping.

Great suggestion :) I just did this for all the metrics.

* Added three definitions of fairness
* Updated CHANGELOG
* Added DemographicParityWithoutGroundTruth and finished tests
* finished refactoring Independence, Separation, and Sufficiency to accumulate
* added distributed functionality to Independence, Sufficiency, and Separation
* Finished aggregate and distributed functionality for DemographicParityWithoutGroundTruth
* fixed GPU and doc issues
* fixed GPU and doc issues
* fixed GPU and doc issues
* fixed GPU issues
* fixed GPU issues
* added init file
* fixed typo
* minor docstring changes
* minor changes to docstring
* Added simple explanations of fairness metrics to docstrings
* Further vectorized all metric implementations
* Fixed device issue

Co-authored-by: Arjun Subramonian <arjuns@Arjuns-MacBook-Pro.local>
Co-authored-by: Akshita Bhagia <akshita23bhagia@gmail.com>
Co-authored-by: Dirk Groeneveld <dirkg@allenai.org>

@matt-gardner

* Formatting * New activation functions * Makes position embeddings optional in the transformer embeddings * Adds T5 * Various fixes to make this start up * Share weights * Adds one test that passes, and one test that fails * use min_value_of_dtype in apply_mask * fixes, add beam search * encoder fixes * fix * fix beam search * fix tests * rename to just 'T5' * fix initialization from pretrained * add Model, DatasetReader, and Predictor * remove useless dataset reader * move high-level peices to allennlp-models * revert predictor changes * remove unneeded hidden_size * remove stray comment * bool masks * CHANGELOG * fix test file name * revert other change * revert other change * Distributed training with gradient accumulation (#5100) * Fixes distributed training with gradient accumulation * Fix in case we don't do anything in a batch group * Test for the problematic condition * Formatting * More formatting * Changelog * Fix another test * Fix even more tests * Fixes one more test * I can fix these tests all day. * Add link to gallery and demo in README (#5103) * Add link to gallery in README * Update README.md * try emojis Is this overkill? 
* Adding a metadata field to the basic classifier (#5104) * Adding metadata parameter to BasicClassifier * Fix * Updating the changelog * reformatting * updating parameter type * fixing import Co-authored-by: Dirk Groeneveld <dirkg@allenai.org> * additional W&B params (#5114) * additional W&B params * add wandb_kwargs * fix * fix docs * Add eval_mode argument to pretrained transformer embedder (#5111) * Add eval_mode argument to pretrained transformer embedder * Edit changelog entry * Lint * Update allennlp/modules/token_embedders/pretrained_transformer_embedder.py * Apply suggestions from code review Co-authored-by: Evan Pete Walsh <epwalsh10@gmail.com> Co-authored-by: Evan Pete Walsh <petew@allenai.org> * specify 'truncation' to avoid transformers warning (#5120) * specify 'truncation' to avoid transformers warning * Update docs * Remove `stride` param * Update CHANGELOG.md Co-authored-by: Dirk Groeneveld <dirkg@allenai.org> * Predicting with a dataset reader on a multitask model (#5115) * Create a way to use allennlp predict with a dataset and a multitask model * Fix type ignoration * Changelog * Fix to the predictor * fix bug with interleaving dataset reader (#5122) * fix bug with interleaving dataset reader * more tests * Update allennlp/data/dataset_readers/interleaving_dataset_reader.py * Update allennlp/data/dataset_readers/interleaving_dataset_reader.py * remove jsonpickle from dependencies (#5121) Co-authored-by: Dirk Groeneveld <dirkg@allenai.org> * Update docstring for basic_classifier (#5124) * improve error message from Registrable class (#5125) Co-authored-by: Akshita Bhagia <akshita23bhagia@gmail.com> * Prepare for release v2.3.0 * fix docs CI * Take the number of runs in the test for distributed metrics (#5127) * Take the number of runs in the test for distributed metrics * Changelog * Add influence functions to interpret module (#4988) * creating a new functionality to fields and instances to support outputing instnaces to json files * creating 
tests for the new functionality * fixing docs * Delete __init__.py * Delete influence_interpreter.py * Delete use_if.py * Delete simple_influence_test.py * fixing docs * finishing up SimpleInfluence * passing lint * passing format * making small progress in coding * Delete fast_influence.py Submit to the wrong branch * Delete faiss_utils.py wrong branch * Delete gpt2_bug.py not sure why it's included * Delete text_class.py not sure why it's included * adding test file * adding testing files * deleted unwanted files * deleted unwanted files and rearrange test files * small bug * adjust function call to save instance in json * Update allennlp/interpret/influence_interpreters/influence_interpreter.py Co-authored-by: Evan Pete Walsh <epwalsh10@gmail.com> * Update allennlp/interpret/influence_interpreters/influence_interpreter.py Co-authored-by: Evan Pete Walsh <epwalsh10@gmail.com> * Update allennlp/interpret/influence_interpreters/influence_interpreter.py Co-authored-by: Evan Pete Walsh <epwalsh10@gmail.com> * move some documentation of parameters to base class * delete one comment * delete one deprecated abstract method * changing interface * formatting * formatting err * passing mypy * passing mypy * passing mypy * passing mypy * passing integration test * passing integration test * adding a new option to the do-all function * modifying the callable function to the interface * update API, fixes * doc fixes * add `from_path` and `from_archive` methods * fix docs, improve logging * add test * address @matt-gardner's comments * fixes to documentation * update docs Co-authored-by: Evan Pete Walsh <epwalsh10@gmail.com> Co-authored-by: Evan Pete Walsh <petew@allenai.org> * Update CONTRIBUTING.md (#5133) * Update CONTRIBUTING.md * updated changelog Co-authored-by: Akshita Bhagia <akshita23bhagia@gmail.com> Co-authored-by: Arjun Subramonian <arjuns@ip-192-168-0-106.us-west-2.compute.internal> * fix #5132 (#5134) * fix * Prepare for release v2.3.1 * Fairness Metrics (#5093) 
* Added three definitions of fairness * Updated CHANGELOG * Added DemographicParityWithoutGroundTruth and finished tests * finished refactoring Independence, Separation, and Sufficiency to accumulate * added distributed functionality to Independence, Sufficiency, and Separation * Finished aggregate and distributed functionality for DemographicParityWithoutGroundTruth * fixed GPU and doc issues * fixed GPU and doc issues * fixed GPU and doc issues * fixed GPU issues * fixed GPU issues * added init file * fixed typo * minor docstring changes * minor changes to docstring * Added simple explanations of fairness metrics to docstrings * Further vectorized all metric implementations * Fixed device issue Co-authored-by: Arjun Subramonian <arjuns@Arjuns-MacBook-Pro.local> Co-authored-by: Akshita Bhagia <akshita23bhagia@gmail.com> Co-authored-by: Dirk Groeneveld <dirkg@allenai.org> * fix cached_path for hub downloads (#5141) * fix cached_path for hub downloads * fix test name * fix type hint * Update allennlp/common/file_utils.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * fix * fix Co-authored-by: epwalsh <epwalsh10@gmail.com> Co-authored-by: Evan Pete Walsh <petew@allenai.org> Co-authored-by: Jacob Morrison <jacob1morrison@gmail.com> Co-authored-by: Nelson Liu <nelson-liu@users.noreply.github.com> Co-authored-by: Akshita Bhagia <akshita23bhagia@gmail.com> Co-authored-by: Leo Liu <zeyuliu2@uw.edu> Co-authored-by: ArjunSubramonian <arjun.subramonian@gmail.com> Co-authored-by: Arjun Subramonian <arjuns@ip-192-168-0-106.us-west-2.compute.internal> Co-authored-by: Arjun Subramonian <arjuns@Arjuns-MacBook-Pro.local> Co-authored-by: Lysandre Debut <lysandre@huggingface.co>


Arjun Subramonian added 2 commits April 2, 2021 16:50

Added three definitions of fairness

e552116

Updated CHANGELOG

99a2140

ArjunSubramonian requested review from dirkgr, schmmd, AkshitaB and epwalsh April 2, 2021 23:54

ArjunSubramonian self-assigned this Apr 2, 2021

Arjun Subramonian and others added 2 commits April 8, 2021 09:01

Added DemographicParityWithoutGroundTruth and finished tests

37373c8

Merge branch 'main' into arjuns/fairness-metrics

e9275d4

AkshitaB suggested changes Apr 8, 2021

Arjun Subramonian and others added 13 commits April 11, 2021 01:47

finished refactoring Independence, Separation, and Sufficiency to accumulate

c62d45f

added distributed functionality to Independence, Sufficiency, and Separation

6f857c4

Finished aggregate and distributed functionality for DemographicParityWithoutGroundTruth

55dc79e

fixed GPU and doc issues

22825a6

fixed GPU and doc issues

728068c

fixed GPU and doc issues

bc74e6d

fixed GPU issues

261889b

fixed GPU issues

6e9422d

added init file

8e921f4

Merge branch 'main' into arjuns/fairness-metrics

29416ae

fixed typo

35aa2eb

Merge branch 'arjuns/fairness-metrics' of https://github.com/allenai/allennlp into arjuns/fairness-metrics

7fa5945

minor docstring changes

27d78e9

epwalsh reviewed Apr 14, 2021

epwalsh reviewed Apr 15, 2021

minor changes to docstring

80b3dc5

ArjunSubramonian requested review from AkshitaB and epwalsh April 15, 2021 04:07

Added simple explanations of fairness metrics to docstrings

6eb212a

AkshitaB reviewed Apr 16, 2021

Further vectorized all metric implementations

f015422

ArjunSubramonian requested a review from AkshitaB April 16, 2021 03:39

Arjun Subramonian and others added 3 commits April 15, 2021 20:46

Fixed device issue

63b5803

Merge branch 'main' into arjuns/fairness-metrics

102a9dc

Merge branch 'main' into arjuns/fairness-metrics

c7c5a59

AkshitaB approved these changes Apr 20, 2021

Merge branch 'main' into arjuns/fairness-metrics

9e32b39

AkshitaB merged commit f877fdc into main Apr 20, 2021

AkshitaB deleted the arjuns/fairness-metrics branch April 20, 2021 22:34

seraphinatarrant mentioned this pull request Nov 5, 2021

Fix for Issue with unbounded KL divergence in Fairness Metrics #5458

Closed
