ENH permutation importance #142

merveenoyan · 2022-09-16T17:40:18Z

I implemented permutation importance and will write tests if you like where this is going.

I have two problems:

Are we okay with matplotlib being a dependency here? I'll look for something better. Same applies for pandas.
I'm looking for a more standard way of getting feature names. For some base estimators or pipelines it varies, from feature_names_in_ to get_feature_names_out().
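
One possible way to paper over the difference is a small helper that tries the known accessors in order. This is only a sketch: the name `resolve_feature_names` and the `x0, x1, ...` fallback labels are assumptions, not skops or scikit-learn API. It relies on the two attributes mentioned above and uses stand-in classes so it runs without scikit-learn. Note that `feature_names_in_` describes inputs while `get_feature_names_out()` describes a transformer's outputs, so for a pipeline the caller still has to decide which step to ask.

```python
from typing import Any, List


def resolve_feature_names(estimator: Any, n_features: int) -> List[str]:
    """Best-effort lookup of feature names (hypothetical helper)."""
    # Fitted estimators that saw named columns expose feature_names_in_.
    names = getattr(estimator, "feature_names_in_", None)
    if names is not None:
        return [str(name) for name in names]

    # Transformers may expose get_feature_names_out() instead.
    getter = getattr(estimator, "get_feature_names_out", None)
    if callable(getter):
        try:
            return [str(name) for name in getter()]
        except Exception:
            # Some objects raise here (e.g. when not fitted); fall through.
            pass

    # Fallback: generic positional labels.
    return [f"x{i}" for i in range(n_features)]


# Stand-ins for illustration only; real estimators come from scikit-learn.
class WithNamesIn:
    feature_names_in_ = ["age", "income"]


class WithNamesOut:
    def get_feature_names_out(self):
        return ["age_scaled", "income_scaled"]


class WithNothing:
    pass


print(resolve_feature_names(WithNamesIn(), 2))
print(resolve_feature_names(WithNamesOut(), 2))
print(resolve_feature_names(WithNothing(), 2))
```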

The example script looks like below (I'll change layout of plot to fit inside the plot itself):

BenjaminBossan

I have some specific comments for this PR and then a more general discussion.

For this PR, I would like to see some tests added. Also, I have a few comments for some lines of code, please check.

For the more general discussion, I want to ask the following question: Do we want to design skops model cards such that we add a bunch of narrow methods such as add_feature_importances or do we just want to have general methods like add_plot and require the user to do the work inside the method themselves?

E.g. for adding cross-validation results, we have put the burden on the user to process the data and then use the generic add_table method:

skops/examples/plot_model_card.py

Lines 151 to 166 in cfea7cb

    
    cv_results = model.cv_results_
    clf_report = classification_report(
        y_test, y_pred, output_dict=True, target_names=["malignant", "benign"]
    )
    # The classification report has to be transformed into a DataFrame first to have
    # the correct format. This requires removing the "accuracy", which was added
    # above anyway.
    del clf_report["accuracy"]
    clf_report = pd.DataFrame(clf_report).T.reset_index()
    model_card.add_table(
        folded=True,
        **{
            "Hyperparameter search results": cv_results,
            "Classification report": clf_report,
        },
    )

For consistency, either this should be its own method, add_cv_results, or add_feature_importances should not be its own method.

The arguments for not providing specific methods are:

Less work for us (though we should still provide examples)
More flexibility for the user. E.g. here, the user cannot change the parameters of the plot, such as the title or the name of the plot.
Don't need to worry about pandas dependency etc.

The argument for providing the specific methods:

Easier for the user if they don't want to modify the defaults.
Easier to discover than just providing examples.
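
To make the tradeoff concrete, here is a rough sketch of how a narrow method can be a thin convenience layer over a generic one. `ToyCard` and `add_importances_plot` are made up for illustration and are not the skops `Card` implementation; the point is only that the specific method buys a default section name and discoverability, while the generic method leaves every choice to the caller.

```python
from typing import Dict


class ToyCard:
    """Minimal stand-in for a model card; not the skops Card class."""

    def __init__(self) -> None:
        self.sections: Dict[str, str] = {}

    def add_plot(self, **kwargs: str) -> "ToyCard":
        # Generic method: the caller has already produced the image files
        # and chooses the section names.
        for section_name, path in kwargs.items():
            self.sections[section_name] = f"![{section_name}]({path})"
        return self

    def add_importances_plot(
        self,
        path: str,
        section_name: str = "Feature importances",
    ) -> "ToyCard":
        # Specific method: a convenience default on top of the generic one.
        return self.add_plot(**{section_name: path})


card = ToyCard()
card.add_plot(**{"Confusion matrix": "confusion.png"})
card.add_importances_plot("importances.png")
print(card.sections)
```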

skops/card/_model_card.py

BenjaminBossan · 2022-09-19T10:32:01Z

skops/card/_model_card.py

@@ -10,6 +10,8 @@
 from reprlib import Repr
 from typing import Any, Optional, Union

+import matplotlib.pyplot as plt
+import pandas as pd


Here it is assumed that pandas is installed. However, pandas is not a strict dependency for skops. I think we specifically didn't want to add pandas because it is a "fat" dependency.

Yes I thought of it as well, I will try to plot without it.

merveenoyan · 2022-09-19T15:04:58Z

@BenjaminBossan I had a quick huddle last week with @adrinjalali where we discussed this and he told me to create a method to add feature importance graphs. I can remove this and add it to model card example instead.

BenjaminBossan · 2022-09-19T15:20:59Z

he told me to create a method to add feature importance graphs. I can remove this and add it to model card example instead.

If both of you agree that this is a good thing to have, I'm also fine with it. I just wanted to make sure that this decision is made deliberately, after taking into account the tradeoffs. In particular, it will probably result in quite a few very similar methods being added in the future (like the example of adding CV results).

merveenoyan · 2022-09-19T15:26:58Z

@BenjaminBossan I sort of agree and think the dependencies are a bit much if we want to move forward (I'll handle stuff with numpy instead). Would you like to wait for @adrinjalali to come back from his vacay to make a decision?

BenjaminBossan · 2022-09-19T15:32:55Z

Would you like to wait for @adrinjalali to come back from his vacay to make a decision?

If you can manage to remove the pandas dependency and if you already decided with Adrin that this would be a good feature to have, I think we can move forward.

adrinjalali

Thanks @merveenoyan

examples/plot_model_card.py

skops/card/_model_card.py

adrinjalali · 2022-09-26T10:39:43Z

skops/card/_model_card.py

@@ -366,6 +368,33 @@ def add_metrics(self, **kwargs: str) -> "Card":
            self._eval_results[metric] = value
        return self

+    def add_feature_importances(self, feature_importances) -> "Card":
+        """Visualize permutation importance.


I think in terms of API, it would make sense to accept importances in a certain way, like accepting both a dict and a pandas dataframe, and then add it to the modelcard.

It doesn't have to be permutation importance.
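
A minimal sketch of what "accept importances in a certain way" could look like, assuming the values arrive either as a dict or as an iterable of `(name, value)` pairs. `normalize_importances` is a hypothetical name, not part of skops. Anything exposing `.items()` that yields such pairs would also pass through; a multi-column DataFrame would need an extra decision about which column to use, which this sketch does not make.

```python
from typing import Any, Dict


def normalize_importances(importances: Any) -> Dict[str, float]:
    """Return importances as a plain dict, most important feature first."""
    # Dict-like inputs provide .items(); otherwise expect (name, value) pairs.
    pairs = importances.items() if hasattr(importances, "items") else importances
    cleaned = {str(name): float(value) for name, value in pairs}
    # Sort by importance, descending, so a bar plot reads top to bottom.
    return dict(sorted(cleaned.items(), key=lambda item: item[1], reverse=True))


from_dict = normalize_importances({"age": 0.1, "income": 0.5})
from_pairs = normalize_importances([("age", 0.1), ("income", 0.5)])
print(from_dict)
print(from_pairs)
```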

Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>

examples/plot_model_card.py

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

BenjaminBossan

I have a couple of comments, please take a look.

Also, I have a more general comment:

What do we do if a user wants to add more than 1 permutation importance graph? E.g. one for accuracy and one for recall? Right now, this would not work because the file name feature_importances.png would be re-used, so we would get the same graph twice. Also, the section name is hard-coded, so the two sections would have the exact same name. I think we should have a solution for this.
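
One way this could be approached, as a sketch only: derive the image file name from a user-supplied plot name, so that two plots with different names never share a file. `plot_path_for` and the `card_assets` directory are hypothetical and not what the PR ended up implementing.

```python
import re
from pathlib import Path


def plot_path_for(plot_name: str, directory: Path) -> Path:
    """Derive a file path from the plot name so different plots don't collide."""
    # Lowercase, then replace every run of non-alphanumeric characters with "_".
    slug = re.sub(r"[^a-z0-9]+", "_", plot_name.lower()).strip("_")
    if not slug:
        raise ValueError("plot_name must contain at least one letter or digit")
    return directory / f"{slug}.png"


out_dir = Path("card_assets")  # hypothetical output directory
accuracy_path = plot_path_for("Permutation importance (accuracy)", out_dir)
recall_path = plot_path_for("Permutation importance (recall)", out_dir)
print(accuracy_path.name)
print(recall_path.name)
```

Using the same name for the section heading and the file would address both the duplicated graph and the duplicated section title.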

skops/card/_model_card.py

merveenoyan · 2022-11-07T16:46:46Z

Fixes #104

BenjaminBossan

Good work, by adding the new arguments, my concerns could be addressed. There are still a few improvements, please see my comments. Also, I think it would be a good idea to update the plot_model_card.py and perhaps also add a sentence to docs/model_card.rst.

skops/card/_model_card.py

skops/card/tests/test_card.py

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

merveenoyan · 2023-01-23T11:20:29Z

@BenjaminBossan codecov fails 😅🥲

BenjaminBossan · 2023-01-23T13:50:39Z

@merveenoyan Re-running solved the codecov issue. Not sure why codecov hates you specifically :D

Could you please merge the current main branch, as we have now added CI for Python 3.11? Then I'll review again.

merveenoyan · 2023-01-24T18:20:25Z

@BenjaminBossan merged

BenjaminBossan

Sorry, again I have a few small comments. But I swear we're getting really close :)

BenjaminBossan · 2023-01-25T10:23:44Z

skops/conftest.py

@@ -16,3 +18,30 @@ def mock_import(name, *args, **kwargs):

    with patch("builtins.__import__", side_effect=mock_import):
        yield
+
+    import matplotlib  # noqa


We don't need this line. First, this fixture is about pandas, not matplotlib. Second, we don't delete pandas from the import cache, so re-importing is not necessary.
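
For context, a self-contained sketch of the mocking pattern in the diff, written as a plain context manager rather than the actual skops fixture. `block_import` is a hypothetical name, and `json` stands in for pandas so the example runs with the standard library alone. Because only `builtins.__import__` is patched and `sys.modules` is never touched, the import works again as soon as the block exits.

```python
import builtins
from contextlib import contextmanager
from unittest.mock import patch


@contextmanager
def block_import(blocked: str):
    """Make `import <blocked>` fail inside the with-block (test helper sketch)."""
    real_import = builtins.__import__

    def mock_import(name, *args, **kwargs):
        # Fail for the blocked package and its submodules, delegate otherwise.
        if name == blocked or name.startswith(blocked + "."):
            raise ImportError(f"{blocked} is pretending not to be installed")
        return real_import(name, *args, **kwargs)

    with patch("builtins.__import__", side_effect=mock_import):
        yield


import json  # cached in sys.modules from here on

with block_import("json"):
    try:
        import json  # noqa: F811
        import_was_blocked = False
    except ImportError:
        import_was_blocked = True

import json  # noqa: F811,E402  (works again; the cache was never modified)

print(import_was_blocked)
print(json.loads("[1, 2]"))
```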

BenjaminBossan · 2023-01-25T10:24:13Z

skops/io/_audit.py

@@ -2,7 +2,7 @@

 import io
 from contextlib import contextmanager
-from typing import Any, Generator, Literal, Sequence, Type, Union
+from typing import Any, Generator, Literal, Sequence, Type, Union  # type: ignore


Is the # type: ignore here really necessary?

BenjaminBossan · 2023-01-25T10:24:40Z

skops/utils/importutils.py

+
+
+def import_or_raise(module, feature_name):
+    """Raise error


Let's add a better description here.

BenjaminBossan · 2023-01-25T10:25:46Z

skops/utils/importutils.py

+        module = import_module(module)
+    except ImportError as e:
+        package = module.split(".")[0]
+        raise ModuleNotFoundError(


For my understanding: Why not a simple ImportError?
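
For reference, a sketch of how such a helper might look in full, reconstructed from the two diff fragments shown in this review. The docstring wording and the error message are assumptions, not the merged skops code. Since `ModuleNotFoundError` is a subclass of `ImportError`, callers catching `ImportError` handle either choice.

```python
from importlib import import_module


def import_or_raise(module: str, feature_name: str):
    """Import ``module`` and return it, or raise an error naming the feature.

    ``feature_name`` is only used to build a more helpful message.
    """
    try:
        return import_module(module)
    except ImportError as e:
        # Report the top-level package, e.g. "matplotlib" for "matplotlib.pyplot".
        package = module.split(".")[0]
        raise ModuleNotFoundError(
            f"{feature_name} requires {package} to be installed."  # assumed wording
        ) from e


json_module = import_or_raise("json", "JSON export")

try:
    import_or_raise("not_a_real_package_xyz.sub", "This feature")
except ImportError as exc:  # also catches ModuleNotFoundError
    message = str(exc)
    print(type(exc).__name__, message)
```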

BenjaminBossan · 2023-01-26T10:55:21Z

@merveenoyan Some tests are failing. This is mostly fixed after merging with current main branch. Could you please do that?

merveenoyan · 2023-01-29T22:20:19Z

@BenjaminBossan should work now (magically ✨) but I realized pytest doesn't collect tests of parser for some reason on mac (assuming it's same on windows) (that's why nothing failed on my local, I guess)

I checked github workflow and looked for needs in case the workflow doesn't run it unless ubuntu tests pass (since I know ubuntu tests are usually fast so it's good to add such prerequisite, but here it's not the case). Then I decided to run the tests in isolation on my local both for whole file and specifically on the test that is failing, yet it refuses to collect on my local too! 🤯

(py39) ➜  card git:(feature_importance) python3 -m pytest -sv tests/test_parser.py
================================================= test session starts =================================================
platform darwin -- Python 3.9.16, pytest-7.2.1, pluggy-1.0.0 -- /opt/anaconda3/envs/py39/bin/python3
cachedir: .pytest_cache
rootdir: /Users/mervenoyan/Desktop/skops/skops, configfile: pyproject.toml
plugins: flaky-3.7.0, cov-4.0.0, anyio-3.6.2
collected 0 items / 2 skipped

do you know what's going on?

BenjaminBossan · 2023-01-30T10:24:21Z

I realized pytest doesn't collect tests of parser for some reason on mac (assuming it's same on windows) (that's why nothing failed on my local, I guess)

Probably it's just because you haven't installed pandoc locally? This line makes it so that tests are skipped when pandoc is not found:

skops/skops/card/tests/test_parser.py

Lines 14 to 18 in d9b7c36

    
    try:
        check_pandoc_installed()
    except FileNotFoundError:
        # not installed, skip
        pytest.skip(reason="These tests require a recent pandoc", allow_module_level=True)

The reason why it runs on Ubuntu CI is:

skops/.github/workflows/build-test.yml

Lines 65 to 67 in d9b7c36

    
    if [ ${{ matrix.os }} == "ubuntu-latest" ];
      then sudo apt install pandoc && pandoc --version;
    fi

I haven't added pandoc to the other builds because it's less trivial to install it on Mac and Windows and, as you mentioned, Ubuntu is the fastest.

BenjaminBossan

There are still a few # type: ignores in here which I think can be removed. Apart from that, it's ready to be merged from my perspective.

merveenoyan · 2023-01-30T12:31:10Z

@BenjaminBossan I re-ran for codecov but no chance, it's failing 🥲🥲🥲🥲

BenjaminBossan

I think now we're good 🎉

permutation importance

18ef31b

merveenoyan requested review from adrinjalali and BenjaminBossan September 16, 2022 17:40

BenjaminBossan requested changes Sep 19, 2022

adrinjalali reviewed Sep 26, 2022

Update examples/plot_model_card.py

2ff7a5c

Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>

BenjaminBossan reviewed Oct 4, 2022

examples/plot_model_card.py

merveenoyan and others added 3 commits November 7, 2022 12:03

Update examples/plot_model_card.py

5b8d4c9

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

Merge branch 'main' into feature_importance

ae99b89

added test and got rid of pandas

e80359d

merveenoyan requested review from BenjaminBossan and adrinjalali November 7, 2022 13:56

change import

4ef3549

BenjaminBossan reviewed Nov 7, 2022

skops/card/_model_card.py

skops/card/_model_card.py

skops/card/_model_card.py

skops/card/_model_card.py

merveenoyan added 4 commits November 7, 2022 16:15

fixes

133cc2e

fixes

1c448bc

updated docs & more

95ad03b

docs

2bc714f

merveenoyan requested a review from BenjaminBossan November 7, 2022 15:46

BenjaminBossan requested changes Nov 8, 2022

skops/card/_model_card.py

skops/card/_model_card.py

skops/card/_model_card.py

skops/card/tests/test_card.py

merveenoyan added 5 commits November 8, 2022 12:14

added another test, updated docs, will add to model card rst

b457fb9

removed unnecessary files

7228e83

added importance to model card guide

9471c76

moved filepaths to tempfile

48c656d

moved filepaths to tempfile

a514060

merveenoyan and others added 3 commits January 23, 2023 12:14

removed test, nits and more

6e3fd2b

Update skops/card/_model_card.py

9d98003

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

Update skops/card/_model_card.py

17e0253

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

merveenoyan requested a review from BenjaminBossan January 23, 2023 11:20

Merge branch 'main' into feature_importance

2b4df99

BenjaminBossan requested changes Jan 25, 2023

iterated

cf18588

merveenoyan requested a review from BenjaminBossan January 25, 2023 17:19

merveenoyan added 6 commits January 29, 2023 22:08

added print to debug on ubuntu

7a02cd4

more debugging

602b8d2

more debugging

c23f63f

Merge branch 'skops-dev:main' into feature_importance

5407351

removed debug

e046f88

removed debugging line from github workflow

a16d83f

BenjaminBossan reviewed Jan 30, 2023

merveenoyan added 5 commits January 30, 2023 13:02

removed mypy ignores

fce9b14

Merge branch 'main' into feature_importance

1281274

removed mypy ignores

01a5d78

removed mypy ignores

77f9f8c

merge local

7c70656

merveenoyan requested a review from BenjaminBossan January 30, 2023 12:40

BenjaminBossan approved these changes Jan 30, 2023

BenjaminBossan merged commit 81558aa into skops-dev:main Jan 30, 2023
