ENH add metric frame #298

Merged
merged 21 commits into from
Mar 7, 2023

lazarust
Contributor

Closes #278

@lazarust lazarust changed the title from "Enh add metric frame" to "ENH add metric frame" Feb 12, 2023
@lazarust lazarust marked this pull request as ready for review February 12, 2023 21:22
@lazarust
Contributor Author

@BenjaminBossan I feel I should write a test for this, but I'm unsure where it should go. I was thinking of skops/card/tests/test_card.py inside the TestTableSection class. Thoughts? Is there somewhere else it would be better?

Collaborator

@BenjaminBossan BenjaminBossan left a comment

I didn't do a thorough review yet, just high level.

> I feel I should write a test for this, but I'm unsure where it should go. I was thinking of skops/card/tests/test_card.py inside the TestTableSection class. Thoughts? Is there somewhere else it would be better?

I think it would be best to add a completely new TestAddMetricFrame. You can take inspiration from other tests though.

@@ -11,6 +11,7 @@
from typing import Any, Iterator, Literal, Protocol, Sequence, Union

import joblib
from fairlearn.metrics import MetricFrame
Collaborator

We should not import fairlearn at the top level, since it would mean that it's a required dependency. It should be imported inside the corresponding method, similar to how you import pandas there.

However, you should not import them directly, but use the helper function we introduced, like here:

plt = import_or_raise("matplotlib.pyplot", "permutation importance")
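To illustrate the pattern being asked for, here is a minimal sketch of a lazy-import helper. It is not the actual skops implementation: the body, the error message, and the `describe_optional_dependency` caller are assumptions made for illustration, and only the call shape `import_or_raise(module_name, feature_name)` is taken from the line above.

```python
import importlib
from types import ModuleType


def import_or_raise(module_name: str, feature_name: str) -> ModuleType:
    """Import ``module_name`` lazily, or fail with a hint naming the feature.

    Hypothetical re-implementation, for illustration only.
    """
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        package = module_name.split(".")[0]
        raise ModuleNotFoundError(
            f"{feature_name} requires {package} to be installed. "
            f"Try: pip install {package}"
        ) from exc


def describe_optional_dependency() -> str:
    # The import happens inside the function body, so importing the
    # surrounding module never requires the optional package.
    # "json" is used here only because it is always available.
    json = import_or_raise("json", "the JSON demo")
    return json.dumps({"ok": True})
```

With this shape, a missing optional package only surfaces as an error when the user calls the method that needs it.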

self, metrics: dict, y_true, y_pred, sensitive_features, pivot=True
) -> Card:
"""
Add a metric frame table to the model card.
Collaborator

About the name add_metric_frame, I'm just wondering out loud if we can find a better name. For users who don't know fairlearn, the name could be very confusing, and the docstring "Add a metric frame table to the model card" doesn't add much.

@BenjaminBossan
Collaborator

@lazarust I see that there are some black issues. If you set up your dev environment with pre-commit hooks as described here, they should be caught before committing.

@lazarust
Contributor Author

@BenjaminBossan Thanks for the tip! Sorry, I missed that from the Contribution Guide. I'm currently working on making the tests. I'll ping you whenever it's ready again!

@lazarust
Contributor Author

@BenjaminBossan This is ready for you to take a look at!

Collaborator

@BenjaminBossan BenjaminBossan left a comment

Thanks a lot for this. I have a couple of code comments, but before considering those, I would like to discuss a bigger design decision.

Right now, the user more or less passes the arguments for MetricFrame to the add_fairlearn_metric_frame method, which takes care of creating the MetricFrame instance. I would propose to change this so that the user has to create the MetricFrame instance themselves, then pass it to add_fairlearn_metric_frame ("inversion of control"), which does not need to construct it but just takes care of creating and adding the table.

This approach is similar to add_permutation_importances, which takes the computed permutation importances as input, instead of computing them inside the method.

Why do I think this could be better? Here are a few reasons:

With the current implementation, the user loses control over the instantiation of MetricFrame. Therefore, if they want to use something like control_features or sample_params, it's not possible. Of course, we could add more parameters to the signature of add_fairlearn_metric_frame, but it only gets bigger and bigger that way, and we have to keep it up to date when fairlearn changes.

Another advantage of having the user pass the instance is that we don't need to import fairlearn inside the method. If a user creates that object, they have already imported fairlearn.

One more advantage is that if a user has a custom MetricFrame class, they can pass that to the method, whereas right now, it's impossible to use. For testing, we could even create a mock object instead of using a real MetricFrame, and then skops would have no dependency on fairlearn at all! But I think adding a test dependency is fine here.

A disadvantage of my proposal is that users have to do a little bit of extra work by instantiating the object themselves.

Overall, I think this price is worth paying. What do you think? If we decide to make this change and you refactor the code accordingly, then many of my comments are obsolete. I think you will see which ones.
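The proposal can be sketched roughly as follows. Everything here is hypothetical: `MetricFrameLike`, `FakeMetricFrame`, and `metric_frame_to_table` are illustrative names, not skops or fairlearn code. The four method names are the ones that appear as table rows later in this thread, and a real `MetricFrame` would return pandas objects rather than plain dicts.

```python
from typing import Dict, Protocol


class MetricFrameLike(Protocol):
    """Anything that offers the four aggregations used for the table."""

    def difference(self) -> Dict[str, float]: ...
    def group_max(self) -> Dict[str, float]: ...
    def group_min(self) -> Dict[str, float]: ...
    def ratio(self) -> Dict[str, float]: ...


class FakeMetricFrame:
    """Hypothetical stand-in for a frame the user built themselves."""

    def difference(self) -> Dict[str, float]:
        return {"selection_rate": 0.4}

    def group_max(self) -> Dict[str, float]:
        return {"selection_rate": 0.8}

    def group_min(self) -> Dict[str, float]:
        return {"selection_rate": 0.4}

    def ratio(self) -> Dict[str, float]:
        return {"selection_rate": 0.5}


def metric_frame_to_table(metric_frame: MetricFrameLike) -> Dict[str, Dict[str, float]]:
    # The helper never constructs the frame; it only reads from the
    # instance it was handed, so it needs no fairlearn import.
    return {
        "difference": dict(metric_frame.difference()),
        "group_max": dict(metric_frame.group_max()),
        "group_min": dict(metric_frame.group_min()),
        "ratio": dict(metric_frame.ratio()),
    }


table = metric_frame_to_table(FakeMetricFrame())
```

Because the helper only relies on the four methods, a mock or a custom subclass works just as well as a real instance, which is the testing advantage mentioned above.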

y_pred,
sensitive_features,
table_name: str,
pivot=True,
Collaborator

Do we really need the pivot option? Also, is it not "transpose" more than "pivot"?

Contributor Author

Hmmm, I guess we don't really need the pivot if we don't want to.

So if we don't transpose it the table looks like:

| | difference | group_max | group_min | ratio |
|---|---|---|---|---|
| selection_rate | 0.4 | 0.8 | 0.4 | 0.5 |

But when transposing it looks like:

| | selection_rate |
|---|---|
| difference | 0.4 |
| group_max | 0.8 |
| group_min | 0.4 |
| ratio | 0.5 |

Personally, I see the transposed version as more useful so we could just make that one the version of the table that is generated. What do you think?

I'm fine changing the name of the parameter to transpose if we decide to keep it.

Comment on lines 1321 to 1322
"""
Add a Fairlearn MetricFrame table to the model card.
Collaborator

Suggested change
- """
- Add a Fairlearn MetricFrame table to the model card.
+ """Add a Fairlearn MetricFrame table to the model card.

Could you also add a few words of description here, plus a link to fairlearn? If you want to link to the class inside fairlearn docs, you would need to add an entry to the intersphinx mappings:

intersphinx_mapping = {


Parameters
----------
metrics: dict
Collaborator

According to the fairlearn docs, metrics can also be a callable. I think the outputs of metric_frame.difference() etc. will be scalars instead of lists, which is a bit annoying. So when the table is created, we might need to add np.atleast_1d(metric_frame.difference()) or something like this to support callables.
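The normalization idea can be sketched without depending on NumPy. The `as_list` helper below is a hypothetical plain-Python stand-in for the role `np.atleast_1d` would play in the suggestion above; whether the merged code handles scalars this way is not shown in this thread.

```python
from typing import Any, List


def as_list(value: Any) -> List[Any]:
    """Wrap a scalar in a list; copy list or tuple values into a new list."""
    if isinstance(value, (list, tuple)):
        return list(value)
    # Anything else (float, int, str, ...) is treated as a single cell.
    return [value]


# A scalar result, as a single callable metric might produce.
scalar_column = as_list(0.4)
# A sequence result, as a dict of metrics might produce.
sequence_column = as_list([0.4, 0.1])
```

Running every aggregation result through such a helper would let the table-building code treat both cases as columns of values.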

@lazarust
Contributor Author

@BenjaminBossan Thanks for the review! I haven't looked at your code comments yet since the decision of whether to pass in the parameters or a MetricFrame affects a lot of it.

> Of course, we could add more parameters to the signature of add_fairlearn_metric_frame, but it only gets bigger and bigger that way, and we have to keep it up to date when fairlearn changes.

I agree! I was thinking of that while writing the method but was unsure if that was something we wanted to do.

I do wonder if we require users to install and import fairlearn on their own if this feature just won't be known/used as much. I could be completely off base on that, but it is a thought I have. We could add an example of using this or just add it to the existing examples.

Beyond that, I don't really see too many disadvantages that would outweigh the advantages. What're your thoughts?

@BenjaminBossan
Collaborator

> I do wonder if we require users to install and import fairlearn on their own if this feature just won't be known/used as much.

We would do that anyway, fairlearn would not be installed by default when installing skops. And even if it were, users don't just discover and try out random packages that were installed :)

> We could add an example of using this or just add it to the existing examples.

This + documenting it well are definitely the way to go for users to discover and use the feature.

> Beyond that, I don't really see too many disadvantages that would outweigh the advantages. What're your thoughts?

Great, I think we agree then.

@lazarust
Contributor Author

@BenjaminBossan Thanks for explaining this! I think I've addressed/responded to most of your comments.

I still need to handle scalars from the MetricFrame and update the doc string.

@BenjaminBossan
Collaborator

Great, thanks, please ping Adrin and me once it's ready for review.

@lazarust
Contributor Author

lazarust commented Mar 1, 2023

@BenjaminBossan and @adrinjalali This is ready to review again!!

@lazarust lazarust requested a review from BenjaminBossan March 1, 2023 01:23
Member

@adrinjalali adrinjalali left a comment

This looks pretty good to me, and we should probably incorporate it in our existing examples, but that could be a separate PR.

Comment on lines 1158 to 1163
def add_fairlearn_metric_frame(
self,
metric_frame,
table_name: str = "Fairlearn MetricFrame Table",
transpose=True,
) -> Card:
Member

should we be adding a description to our add_* methods? From the user's perspective, it seems odd that it's tricky to add a description here for sections. I think it'd make sense for all of them to allow a description as well as a title. WDYT? also cc @skops-dev/maintainers

Collaborator

Yes, I agree it would be good to have, some existing methods also don't have it, e.g. add_table. It's probably easier to have a separate PR where this argument is added.

Contributor Author

@adrinjalali For clarification, should I add the description argument to this method in this PR? Or do it in another one?

Member

we can keep it for another PR as @BenjaminBossan suggested.

Collaborator

One more thing: Could you please change the type annotation to return -> Self? This is a change we recently made on all the other methods as well.

-------
self: Card
The model card with the metric frame added.
"""
Member

Suggested change
- """
+ Notes
+ --------
+ You can check `fairlearn's documentation
+ <https://fairlearn.org/v0.8/user_guide/assessment/index.html>`__ on how to
+ work with `MetricFrame`s.
+ """

Collaborator

@BenjaminBossan BenjaminBossan left a comment

Not much to add from my side. I think the transpose feature could be implemented without pandas, but it's not a big deal, most users will probably have pandas installed anyway.
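For illustration only, a pandas-free transpose could look roughly like the sketch below. The column-oriented dict layout and the `transpose_table` name are assumptions made for this example, not how skops stores or transposes tables.

```python
from typing import Dict, List, Tuple


def transpose_table(
    columns: Dict[str, List[float]], row_labels: List[str]
) -> Tuple[Dict[str, List[float]], List[str]]:
    """Swap rows and columns of a column-oriented table.

    ``columns`` maps a column name to its values, one value per row label.
    Returns the transposed columns and the new row labels.
    """
    new_row_labels = list(columns)  # old column names become row labels
    new_columns = {
        label: [columns[name][i] for name in new_row_labels]
        for i, label in enumerate(row_labels)  # old row labels become columns
    }
    return new_columns, new_row_labels


# The wide layout from earlier in the thread: one row, four columns.
wide = {
    "difference": [0.4],
    "group_max": [0.8],
    "group_min": [0.4],
    "ratio": [0.5],
}
tall, tall_labels = transpose_table(wide, ["selection_rate"])
```

Here `tall` holds a single `selection_rate` column with four values, and `tall_labels` lists the four aggregation names as row labels.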

@adrinjalali
Member

> I think the transpose feature could be implemented without pandas, but it's not a big deal, most users will probably have pandas installed anyway.

fairlearn depends on pandas anyway.

@BenjaminBossan
Collaborator

> fairlearn depends on pandas anyway.

I didn't even think about that, then forget what I said :)

@lazarust
Contributor Author

lazarust commented Mar 2, 2023

@adrinjalali and @BenjaminBossan Thanks for reviewing this! I've addressed all the comments except for one I needed clarification on.

Member

@adrinjalali adrinjalali left a comment

LGTM.

Collaborator

@BenjaminBossan BenjaminBossan left a comment

Fantastic, just this minor comment, then we can merge.

Comment on lines 1158 to 1163
def add_fairlearn_metric_frame(
self,
metric_frame,
table_name: str = "Fairlearn MetricFrame Table",
transpose=True,
) -> Card:
Collaborator

One more thing: Could you please change the type annotation to return -> Self? This is a change we recently made on all the other methods as well.
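As a rough sketch of why `Self` helps, consider the toy classes below. `Card` and `CustomCard` here are stand-ins written for this example, not the skops classes, and `add_section` is a hypothetical method. The annotation is written as a string and `Self` is imported only for type checkers, so the snippet does not require `typing_extensions` at runtime.

```python
from typing import TYPE_CHECKING, List

if TYPE_CHECKING:
    # typing.Self exists from Python 3.11; typing_extensions provides
    # it for older versions. Only type checkers evaluate this import.
    from typing_extensions import Self


class Card:
    """Toy stand-in for a model card with chainable methods."""

    def __init__(self) -> None:
        self.sections: List[str] = []

    def add_section(self, name: str) -> "Self":
        self.sections.append(name)
        return self  # returning self is what makes chaining possible


class CustomCard(Card):
    """A user subclass; chained calls should keep this type."""


card = CustomCard().add_section("Metrics").add_section("Limitations")
```

With `-> Card`, a type checker would treat the result of the chained calls on `CustomCard` as a plain `Card`; with `-> Self`, it is inferred as `CustomCard`.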

@lazarust lazarust requested a review from BenjaminBossan March 7, 2023 01:04
@lazarust
Contributor Author

lazarust commented Mar 7, 2023

@BenjaminBossan I've addressed your comment! Sorry, it took me a couple of days to get to this.

Collaborator

@BenjaminBossan BenjaminBossan left a comment

Great. Thanks a lot for your contributions, really appreciated.

@BenjaminBossan BenjaminBossan merged commit 5b3a7a4 into skops-dev:main Mar 7, 2023
BenjaminBossan added a commit to BenjaminBossan/skops that referenced this pull request Mar 7, 2023
We merged skops-dev#298 and skops-dev#310 shortly after each other but they contained an
incompatibility that broke the fairlearn tests (the code itself was
fine). This PR fixes this incompatibility.

On top, I added the description argument to add_fairlearn_metric_frame,
to be consistent with all the other methods, and also added a test for it.

Finally, 2 small fixes:

- Added type annotation to transpose argument
- Changed order of arguments in docstring to match order in signature
adrinjalali pushed a commit that referenced this pull request Mar 7, 2023
We merged #298 and #310 shortly after each other but they contained an incompatibility that broke the fairlearn tests (the code itself was fine). This PR fixes this incompatibility.

To be clear, the only change needed to fix the tests is the following:

```python
- actual_table = card.select("Metric Frame Table").content.format()
+ actual_table = card.select("Metric Frame Table").format()
```

On top, I added the `description` argument to `add_fairlearn_metric_frame`, to be consistent with all the other methods (also changed in #310), and I also added a test for it. Since we now have 2 tests, I moved the `metric_frame` variable to a fixture.

Finally, 2 small fixes:

- Added type annotation to transpose argument
- Changed order of arguments in docstring to match order in signature

Successfully merging this pull request may close these issues.

Helper to add fairlearn's MetricFrame to the model card