Alternative Card implementation #203

BenjaminBossan · 2022-10-25T15:09:17Z

This is an RFC for a suggested solution for the issues discussed on #72

Description

The proposed model card implementation would allow to dynamically add sections or overwrite them.

This is not a complete implementation but already covers most of the features we already have and then some.

On top of these features, it would be possible to add more features like creating a default Card with placeholders, just like the existing template, or the possibility to delete existing sections or to retrieve the result of a certain section.

Implementation

The underlying data structure consists of a dict and a Section dataclass.

All data is stored in a _data attribute with the type dict[str, Section]. The dataclass hold the section contents, i.e. the section title, the section content, and subsections, which again have the same type. It's thus recursive data structure (basically a tree). Section title and dict key are identical, which is mostly for convenience.

With this refactor, there are no separate data containers anymore for eval results, template sections, extra sections, etc. They are all treated the same.

IMHO, this greatly simplifies the code overall. The only complex function that's left is the one needed to traverse the tree holding the data, and even that is just 14 LOC.

Demo

updated

To see how the new class can be used, take a look at the main function. The resulting Card can be seen here:

https://huggingface.co/skops-ci/hf_hub_example-e02eee42-5c99-4924-879e-95a20d795d49

The README file looks like this:

https://huggingface.co/skops-ci/hf_hub_example-e02eee42-5c99-4924-879e-95a20d795d49/raw/main/README.md

This is a suggestion to address the issues discussed on skops-dev#72 Description The proposed model card implementation would allow to dynamically add sections or overwrite them. This is not a complete implementation but already covers most of the features we already have and then some. On top of these features, it would be possible to add more features like creating a default Card with placeholders, just like the exisint template, or the possibility to delete existing sections or to retrieve the result of a certain section. Implementation The underlying data structure consists of a dict and a Section dataclass. All data is stored in a _data attribute with the type dict[str, Section]. The dataclass hold the section contents, i.e. the section title, the section content, and subsections, which again have the same type. It's thus recursive data structure. Section title and dict key are identical, which is mostly for convenience. With this refactor, there are no separate data containers anymore for eval results, template sections, extra sections, etc. They are all treated the same. IMHO, this greatly simplifies the code overall. The only complex function that's left is the one needed to traverse the tree holding the data, and even that is just 14 LOC. Demo To see how the new class can be used, take a look at the main function. The resulting Card can be seen here: https://huggingface.co/skops-ci/hf_hub_example-fcc0d6fe-d072-4f94-8fdb-6bf3bb917bca

BenjaminBossan · 2022-10-25T15:12:17Z

RFC @skops-dev/maintainers @E-Aho

Don't hesitate to tell me if you think this totally sucks ;)

E-Aho

Overall I think this is great! In my opinion it looks like it would be a lot more flexible than using .md templates.

I had a couple things I thought might be nice additions, but overall I really like this implementation :)

E-Aho · 2022-10-25T18:32:21Z

skops/card/_card_alternative.py

+        # add arbitrary sections, overwrite them, etc.
+        card.add(hi="howdy")
+        card.add(**{"parent section/child section": "child content"})
+        card.add(**{"foo": "bar", "spam": "eggs"})
+        # change content of "hi" section
+        card.add(**{"hi/german": "guten tag", "hi/french": "salut"})
+        card.add(**{"very/deeply/nested/section": "but why?"})


Love this usage pattern ❤️ feels much easier to add to programatically

E-Aho · 2022-10-25T18:45:48Z

skops/card/_card_alternative.py

+        self.model_diagram = model_diagram
+        self.metadata = metadata or CardData()
+
+        self._data: dict[str, Section] = {}


One thought: Might be worth using an OrderedDict for this (and for the subsections field in the Section class.)

That way it would allow us to have more control over the ordering of the output. I know that as of 3.7, the order of the keys in a dict is just the order of insertion, but it might be nice to offer users the ability to insert between existing sections.

Do we need to support that? I see it as "the card is rendered in the order you're adding things" kinda API, especially since we don't expose this _data to the users as a public API.

Yes, one weakness of the chosen data structure is that it's hard to change the order of sections once they are created. However, I'm not sure how much OrderedDict would help. I haven't used it in quite a while, but from the docs, the only additional methods it seems to have are move_to_end and popitem(last=...), both of which don't help that much? Ideally, we would have something like:

card = Card(...) card.add(section_a="content a", section_c="content c") card.insert(section_b="content b", after="section_a")

or something. Not sure if that's easier to implement with OrderedDict than with normal dicts.

Alternatively, we could implement the underlying tree as a nested list, which gives us more flexibility to modify the order:

>>> card._data [ ("section_a", "content a"), ("section_b", [ "content b", ("section_b0", "content_b0"), ("section_b1", "content_b1"), ], ..., ]

The disadvantage is that selecting items will be more difficult an inefficient.

Do we need to support that? I see it as "the card is rendered in the order you're adding things" kinda API, especially since we don't expose this _data to the users as a public API.

I think it would be fine to not support changing ordering for an MVP, but it would be at least a nice-to-have down the road imo. Especially if the default Card class users handle has a fixed order for the defaults. Users might want to insert things between other sections.

The disadvantage is that selecting items will be more difficult an inefficient.

I may have been remembering the abilities of OrderedDicts with rose tinted glasses...
Another alternative that might be a little simpler would be to hold a subsection_order attribute in each Section that is just a list with the names of children, and renders them in that order.

So if we rendered, as above:

("section_b", [ "content b", ("section_b0", "content_b0"), ("section_b1", "content_b1"), ], )

Section b would just have an attribute like [__content__, section_b0, section_b1]

We could implement ease-of-use things like support for .add(..., after="foo") as above as well

On efficiency, while it's always good to think about, I don't think we need to get too deep into optimizations with something like this. I'd be very surprised if anyone had even 30 sections total, and at that kind of scale it's not really a limitation imo.

The disadvantage is that selecting items will be more difficult an inefficient.

We can always implement __getitem__.

Another alternative that might be a little simpler would be to hold a subsection_order attribute in each Section that is just a list with the names of children, and renders them in that order.

This would be a kind of index. I think if possible, we should avoid it so that we don't leave room for possible bugs where the content and the "index" are out of sync.

On efficiency, while it's always good to think about, I don't think we need to get too deep into optimizations with something like this.

Probably you're right, the main concern should be usability and simple code. But you never know how users can surprise you. In skorch, we ran into an issue because a user was training a neural net for tens of thousands of epochs, which we didn't expect...

We can always implement __getitem__.

Yes, I thought about that, it just feels "wrong" to have that method when the access is O(n) and not O(1).

I feel like nested list is a bit overkill (and selection should be less efficient as you said). I'm in favor of implementation as is with a regular dict.

E-Aho · 2022-10-25T18:53:19Z

skops/card/_card_alternative.py

+        self._add_metrics(self._metrics)
+        return self
+
+    def _add_metrics(self, metrics: dict[str, float | int]) -> None:


One thought I had (feel free to challenge me or tell me it's a bad idea):

If we have a lot of these helper _add_... sections, it might be nice to split this class into two classes, one with the core functionality methods (e.g _strip_blank or __repr__) and a subclass with the default formatting/helper functions, which a user might want to tweak themselves.

That way, it would be a lot easier for a user to make their own MyCard class that has different formatting/ extra methods for their own models, similar to how they might create their own template in the old .md world.

I think instead of that they can pass something Formatable to add, and that's much easier to implement for them, isn't it?

That way, it would be a lot easier for a user to make their own MyCard class that has different formatting/ extra methods for their own models, similar to how they might create their own template in the old .md world.

Let me try to understand this better. Right now, if a user would like to make small modifications to the existing templates, I think this can be easily accommodated with the added dynamism of the new class. If they want to have a new default template that they can share with others, it's not easily possible, they would have to share the code (a script) that generates this new template.

What you suggest is that it should be possible to create a subclass of Card that starts with the new default. I like the idea. If that requires to split the class, I'm not sure, since pre-filling the card is not implemented yet (this is what I hoped to achieve with the make_default class method). If splitting makes it easier for the user, then I'm all for it, I would have to see it in action.

Another issue is if we want to enable completely different output formats. Just as an example, what if a user wants to create an rst or org file instead of md, or another md flavor? Right now, the md format is hard-coded. If we could split the class so that the "formatting engine" is decoupled, it would be a nice gain. But maybe we'll just never support other formats and users have to use pandoc or similar tools to achieve this.

adrinjalali

I quite like the idea. I wonder how much work it'd be to load an .md file back to this object.

adrinjalali · 2022-10-26T09:28:51Z

skops/card/_card_alternative.py

+        self.model_diagram = model_diagram
+        self.metadata = metadata or CardData()
+
+        self._data: dict[str, Section] = {}


Do we need to support that? I see it as "the card is rendered in the order you're adding things" kinda API, especially since we don't expose this _data to the users as a public API.

skops/card/_card_alternative.py

adrinjalali · 2022-10-26T09:30:24Z

skops/card/_card_alternative.py

+        self._metrics: dict[str, float | int] = {}
+        self._reset()
+
+    def _reset(self) -> None:


from the code I don't see a reset happening here. As in, nothing seems to be cleared.

Ah yes, it's not used yet. The idea is that if I override, say, the model, we can internally call _reset and more sure that everything that relates to the change is updated.

is this going to simply erase README.md as it's still what we use to version models and create a new one? just to make sure we're on same page.

No, this will not erase the README.md. The intent was just that if a user re-assigns the model, in the setter, we can call self._reset() and every part of the model card that needs updating will be updated. I will do the proper implementation once #205 is merged, I hope it becomes more obvious then.

I feel like it makes sense to update the parts that weren't previously written by the user (hyperparameters, plot and so on) it was a pain point with modelcards as well.

That's the idea (though I wouldn't touch any written README.md). So when we have prefilled=True, those sections are auto-created and auto-reset.

Something that doesn't work, though, is if the user would choose a different initial template, because then we don't know which sections need to be rewritten or not. I have to see if I find a solution there.

adrinjalali · 2022-10-26T09:33:22Z

skops/card/_card_alternative.py

+        self._add_metrics(self._metrics)
+        return self
+
+    def _add_metrics(self, metrics: dict[str, float | int]) -> None:


I think instead of that they can pass something Formatable to add, and that's much easier to implement for them, isn't it?

Added a test that shows that the new card produces the same output as the old card (except for a few non-deterministic parts). This includes most of the idiosyncrasies of the old card we might want to change in the future (e.g. inconsistent capitalization, use of empty lines). Some of the more problematic behaviors of the old card class were, however, fixed (e.g. creating an empty metrics table when there are no metrics). The other tests have been reworked to use the new card features to make them more precise. Often, that means that instead of having a very weak test like "assert 'foo' in card.render()", it is now possible to select the exact section and check that it equals the expected output. This work is still unfinished, specifically it still lacks tests for the card repr and for the newly added features.

Some refactoring to clean up things, rework repr, make repr tests pass.

Also some better type annotations.

merveenoyan

For very basic details I'm in favor of keeping this PR as is and not changing any data structures as it looks quite simple. It was quite overwhelming to catch up with this and review though, I will never take time off from work or go to far east again 😂😂

merveenoyan · 2022-11-02T17:12:18Z

skops/card/_card_alternative.py

+        self._metrics: dict[str, float | int] = {}
+        self._reset()
+
+    def _reset(self) -> None:


is this going to simply erase README.md as it's still what we use to version models and create a new one? just to make sure we're on same page.

merveenoyan · 2022-11-02T17:26:08Z

skops/card/_card_alternative.py

+        self.model_diagram = model_diagram
+        self.metadata = metadata or CardData()
+
+        self._data: dict[str, Section] = {}


I feel like nested list is a bit overkill (and selection should be less efficient as you said). I'm in favor of implementation as is with a regular dict.

adrinjalali

I didn't dig deep into everything here, but overall it looks great. Genrally:

I think quite a few of the private methods could use a docstring to make it more clear what they do.
The tests are changed quite a bit. I can't tell which ones are improvements to existing tests and which ones are due to breaking changes. I don't mind breaking changes when it comes to the rendered doc as a result of this PR, but it'd be nice to have a rather small-ish change in the tests compared to what we have now.
It's not clear to me what the user can do when they select a section, and why they'd do that.

docs/model_card.rst

examples/plot_model_card.py

adrinjalali · 2022-11-21T13:38:27Z

skops/card/_model_card.py

+if sys.version_info >= (3, 8):
+    from typing import Protocol
+else:
+    from typing_extensions import Protocol


can this go in fixes.py?

Funny that you should ask. That's where I put it initially and imported it here. This does not work. If my understanding is correct, there is some code in mypy that literally looks for this snippet in the module and then makes sure not to make a fuss. Putting it elsewhere and importing does not work -__-

so it works if you put type: ignore on it? I rather do that than having it here. To me, "doesn't work" would mean "python can't run it" rather than "mypy doesn't understand it" 😁

The problem when putting this into fixes.py and importing it is that for some reason, mypy doesn't understand the protocol anymore. E.g. it'll claim:

skops/card/_model_card.py:694: error: Argument 2 to "_add_single" of "Card" has incompatible type "PlotSection"; expected "Union[Formattable, str]"

even though PlotSection conforms to the Formattable protocol. Now we could put a # type: ignore on this method, but then we might as well not use a protocol at all, if it is never checked.

Yes, it is annoying that this doesn't work. However, in this case I'd be in favor of keeping it as is because it's a temporary fix anyway. As soon as we can stop supporting Python 3.7, all these gymnastics can go away (official EOL for Python 3.7 is 27 June 2023).

ok, then we can add a comment here and in fixes.py to remember it.

skops/card/_model_card.py

BenjaminBossan

I think quite a few of the private methods could use a docstring to make it more clear what they do.

Yeah, I focused on public methods, will add docstrings to the private ones too.

The tests are changed quite a bit. I can't tell which ones are improvements to existing tests and which ones are due to breaking changes. I don't mind breaking changes when it comes to the rendered doc as a result of this PR, but it'd be nice to have a rather small-ish change in the tests compared to what we have now.

None of the changes are breaking, except that we don't automatically put some stuff in specific sections anymore, instead giving users the control over the sections. In a previous version of this PR, I even had a test that showed that old and new implementation provide identical results:

https://github.com/BenjaminBossan/skops/blob/e7a2ae3e0e56738423962df53dd4535a11da33ea/skops/card/tests/test_card_alternative.py#L828-L958

However, as we decided to replace the old implementation, the test also had to go.

Regarding how the tests are changed, they test for the most part exactly what has been tested before, but they are more strict. Let's illustrate with an example. Below is the old test for adding a plot:

def test_add_plot(destination_path, model_card):
    plt.plot([4, 5, 6, 7])
    plt.savefig(Path(destination_path) / "fig1.png")
    model_card = model_card.add_plot(fig1="fig1.png").render()
    assert "![fig1](fig1.png)" in model_card

Here is the new test:

def test_add_plot(destination_path, model_card):
    plt.plot([4, 5, 6, 7])
    plt.savefig(Path(destination_path) / "fig1.png")
    model_card = model_card.add_plot(fig1="fig1.png")
    plot_content = model_card.select("fig1").content.format()
    assert plot_content == "![fig1](fig1.png)"

As you can see, they are fairly similar. However, the old test would succeed if anywhere in the model card, there is a string "![fig1](fig1.png)". The new test specifically tests that this string is in the section we indicated. A similar pattern can be observed with the other tests.

I know that the diff view does not really help in identifying these changes, maybe it's better to put old and new tests side-by-side and going from top to bottom.

It's not clear to me what the user can do when they select a section, and why they'd do that.

The main reason I added select and delete was to be complete, so if a user wants to change or delete a section, they get the possibility (allowing all CRUD operations basically).

E.g. select allows you to do stuff like:

card = Card(...)
card.add_plot(**{"Plots/My awesome plot": "myfig.png"})
pl = card.select("Plots/My awesome plot")
pl.title = "My new title"

and it will be reflected in the model card. Without select, you'd need to recreate the whole section. In effect, this means that building the model card as a whole can be a lot more interactive, leaving room for mistakes and fixing them.

Another benefit of select is that we can use it for testing more precisely (see comment above).

I will address some of your minor comments and then provide a new update once the permutation importances and model loading from str are merged, to integrate those features as well.

docs/model_card.rst

examples/plot_model_card.py

BenjaminBossan · 2022-11-21T15:46:01Z

skops/card/_model_card.py

+if sys.version_info >= (3, 8):
+    from typing import Protocol
+else:
+    from typing_extensions import Protocol


Funny that you should ask. That's where I put it initially and imported it here. This does not work. If my understanding is correct, there is some code in mypy that literally looks for this snippet in the module and then makes sure not to make a fuss. Putting it elsewhere and importing does not work -__-

skops/card/_model_card.py

- Remove noise from docstring example - Add the comma after model repr - Add docstrings to private methods

Users can now choose to use no template, skops template, hub template, or their own template. Using their own template disables a lot of prefilling (say, putting the model plot in the card) because we wouldn't know where to put it. Users will need to call card.add for the otherwise prefilled sections.

adrinjalali · 2022-11-28T10:31:29Z

skops/card/_templates.py

+    ),
+}
+
+HUB_TEMPLATE = {


spending more time on this, I don't see an easy way to make it suitable for usual tabular usecases. For instance, the environmental impact is pretty much zero for most models.

However, one can do NLP/image usecases and have massive models as a part of a scikit-learn pipeline. We can start with a minimal model card template and improve it then.

So what is your suggestion here? To trim the HUB_TEMPLATE down to only contain those sections that make most sense for sklearn models? Or create a new template?

my suggestion is to go back to support only one template, the one we already have, and change it/improve it/add more templates in separate PRs.

Okay, so keep the existing code that would allow to use different templates but only keep the "skops" template as a valid template for the time being. Would you still keep support for custom templates (passed as a dict) and no template?

Also, I'd probably keep the HUB_TEMPLATE in _template.py but make it private, since it was a lot of work to implement it. Otherwise, if we squash and merge but later want to go back, we can't :)

That all sounds good to me. Supporting custom templates via dict also makes sense.

This can be useful, because otherwise it takes a bit of effort to retrieve the latest section.

It's ugly, but there is no technical reason from prohibiting the addition of tables without rows. (Note, columns are still required). This allows us to use TableSection for formatting the metrics, instead of calling tabulate there directly. This is better, since we don't have 2 separate ways of creating metrics.

As discussed here: skops-dev#72 (comment) Description This feature adds a new function, skops.card.parse_modelcard. When passing it the path to a dumped model card, it parses it using pandoc and returns a Card object, which can be further modified by the user. In the end, this turned out easier than I initially thought it would. The main difficulty are the data structures returned by the pandoc parser, for which I couldn't find any documentation. I guess Haskell code is just self-documenting. For this reason, there are probably quite a few edge cases that I haven't covered yet. Just as an example, when parsing tables, pandoc tells us how the columns are aligned. This information is currently completely discarded (we let tabulate choose the alignment). If we want to preserve the table alignment, we would need to make some changes Implementation This feature requires the alternative card implementation from skops-dev#203 pandoc is used for the following reasons: - widely used and thus battle tested - can read many other formats, not just markdown, so in theory, we should be able to read, e.g., rst model cards without modifying any code The disadvantage is that pandoc is not a Python package, so users need to install it separately. But it is available on all common platforms. For calling pandoc, I chose to shell out using subprocess. I think this should be fine but LMK if there is a better way. There is a Python package that binds pandoc (https://github.com/boisgera/pandoc) but I don't think it's worth it for us to add it, just to avoid shelling out. The package seems to have low adoption and contains a bunch of stuff we don't need. I chose to implement this such that the parser that generates the Card object should not have to know anything about Markdown. Everything related to Markdown is moved to a separate class in _markup.py. In an ideal world, we would not have to know anything about markdown either. Instead the Card object shoud have methods (similar to what we already have for add_plot etc.) that handles all of that. But in practice, this is far from being true. E.g. if a user wants to add bold text, there is no special method for it, so they would need to add raw Markdown. The Card class is thus a leaky abstraction. TODOs This PR is not finished. Remaining TODOs that come to mind: 1. We need to merge the alternative card implementation 2. Documentation has to be updated in several places 3. Tests need to be more complex, right now only one Card is tested 4. CI needs to install pandoc so that the tests are actually run 5. There are some specifics here that won't work with all Python versions, like the use of TypedDict.

Most notably, dynamic model loading in Cards was broken.

1. Don't use Hub template for now 2. Document how to add new default templates in code 3. Nicer error message when using invalid template

BenjaminBossan · 2022-12-06T15:52:56Z

@skops-dev/maintainers So I think this PR is finally ready. Please review (again).

There will be some conflict with #142 but it shouldn't be too hard to merge (I can do the work for that).

adrinjalali

I didn't check the tests in detail, but they seem pretty good.

To me this seems pretty close to being mergeable.

adrinjalali · 2022-12-08T15:47:11Z

docs/model_card.rst

+Templates
+---------
+TODO


we can remove this for the merge

adrinjalali · 2022-12-09T11:17:05Z

skops/card/_model_card.py

+if sys.version_info >= (3, 8):
+    from typing import Literal, Protocol
+else:  # TODO: remove when Python 3.7 is dropped
+    from typing_extensions import Literal, Protocol
+


I will remove this, assuming that #245 will be merged first.

skops/card/_model_card.py

adrinjalali · 2022-12-09T11:22:59Z

skops/card/_model_card.py

+    if is_skops_format:
+        lines += [
+            "from skops.io import load",
+            f'model = load("{file_name}")',
+        ]
+    else:  # pickle
+        lines += [f'model = joblib.load("{file_name}")']


we should probably either do from ... import load or import xxx; xxx.load(...) in both cases. I prefer the latter pattern.

I think this is a good opportunity to determine what the canonical way of importing the persistence functions should be. I opened a discord discussion.

the consensus from discord seems to be import skops.io as sio

Oh well, democracy has spoken. I changed it here, but would prefer to change the other occurrences (docs, examples) for another PR, since this one is already big as is.

skops/card/_model_card.py

adrinjalali · 2022-12-09T11:52:57Z

skops/card/_model_card.py

+    def _add_get_started_code(self, file_name: str, indent: str = "    ") -> None:
+        """Add getting started code to the corresponding section"""
+        if self.template not in VALID_TEMPLATES:
+            # unknown template, cannot prefill


probably same here, users should have an easy way to add this to their templates

skops/card/_model_card.py

skops/card/tests/test_card.py

- Remove Python 3.7 compatibility code (requires skops-dev#246) - Removed empty section in docs - Don't add empty metrics table by default - Remove option to select with a list of str (e.g. card.select(['Foo', 'Bar'])) is no longer possible) - Add option to chain selections using select method (e.g. card.select('Foo').select('Bar'))

BenjaminBossan · 2022-12-09T16:44:19Z

@adrinjalali I think I replied to all your comments, addressed most open points or created an issue for future us.

Here are the things in particular that I changed:

Remove Python 3.7 compatibility code (requires MNT add python=3.11 support #246)
Removed empty section in docs
Don't add empty metrics table by default just to keep parity
Remove option to select with a list of str (e.g. card.select(['Foo', 'Bar', 'Baz']) no longer works)
Add option to chain selections using select method (e.g. card.select('Foo').select('Bar').select('Baz') works)

Regarding the last point, it's not quite what you suggested, but I found that having the same method name, select, is more intuitive than having to use select the first time and get (or __getitem__) all subsequent times. I could also live with naming all of them get instead of select, but this way or that, I think the name should be the same.

adrinjalali · 2022-12-12T10:26:19Z

skops/card/_model_card.py

+    if is_skops_format:
+        lines += [
+            "from skops.io import load",
+            f'model = load("{file_name}")',
+        ]
+    else:  # pickle
+        lines += [f'model = joblib.load("{file_name}")']


the consensus from discord seems to be import skops.io as sio

adrinjalali · 2022-12-12T10:33:32Z

skops/card/_model_card.py

+    def _generate_content(
+        self, data: dict[str, Section], depth: int = 1
+    ) -> Iterator[str]:
+        """Yield title and (formatted) contents"""


Suggested change

"""Yield title and (formatted) contents"""

"""Yield title and (formatted) contents.

It recursively goes through the data and yields the title

with the appropriate number of `#`s and the associated

content.

"""

I added something similar, changed the wording a bit.

BenjaminBossan

I addresses your comments.

Tests for Python 3.7 still failing until we drop it.

BenjaminBossan · 2022-12-12T11:36:43Z

skops/card/_model_card.py

+    def _generate_content(
+        self, data: dict[str, Section], depth: int = 1
+    ) -> Iterator[str]:
+        """Yield title and (formatted) contents"""


I added something similar, changed the wording a bit.

BenjaminBossan · 2022-12-12T11:38:00Z

skops/card/_model_card.py

+    if is_skops_format:
+        lines += [
+            "from skops.io import load",
+            f'model = load("{file_name}")',
+        ]
+    else:  # pickle
+        lines += [f'model = joblib.load("{file_name}")']


Oh well, democracy has spoken. I changed it here, but would prefer to change the other occurrences (docs, examples) for another PR, since this one is already big as is.

BenjaminBossan · 2022-12-14T15:31:33Z

@adrinjalali CI is green.

adrinjalali

I might be missing it, but I don't see this point resolved.

adrinjalali · 2022-12-14T16:23:43Z

skops/card/_model_card.py

+
+        return section[leaf_node_name]
+
+    def _add_model_section(self, model) -> None:


we talked about exposing these somehow to users with custom templates, how is that done now?

I thought we had agreed that for now, users would have to use card.add(...)? Or do you mean that this PR should create helper functions like get_model_repr and expose them to users?

The thing with these methods is tat they do more than a simple add. So what we could do in this PR, is to make them public and change the signature to:

def add_model_section(self, model, section=None) -> None:

Then we'd only accept section=None if template is the default template.

I don't remember that we discussed it like that, but I can take a look :)

Presumably, we don't need the model argument and use self.model instead? Should users be allowed to add the model reprs multiple times into different sections?

if we can use self.model, then this argument shouldn't be here in the first place.

I don't care for now if users end up adding it multiple times. If they do, their model card looks silly.

So far, if users had a custom template (including no template at all), they would lose the possibility to add some default sections: - metrics - model plot - hyperparamters - getting started code Now we expose methods to the users to add those sections with a simple call (no need to manually format the content and use card.add). For this to work, users have to indicate the desired section, since we would otherwise not know where to put the content. During the work on this, I also cleaned up the Card tests, which became messier and messier over time. Related tests are now all contained in a class, which makes it a little bit easier to see if for a certain method, all test cases have been covered.

BenjaminBossan · 2022-12-15T14:55:37Z

@adrinjalali I added the feature you requested. Copied the
commit message for better visibility:

So far, if users had a custom template (including no template at all),
they would lose the possibility to add some default sections:

metrics
model plot
hyperparamters
getting started code

Now we expose methods to the users to add those sections with a simple
call (no need to manually format the content and use card.add). For this
to work, users have to indicate the desired section, since we would
otherwise not know where to put the content.

During the work on this, I also cleaned up the Card tests, which became
messier and messier over time. Related tests are now all contained in a
class, which makes it a little bit easier to see if for a certain
method, all test cases have been covered.

Edit: @E-Aho maybe also interesting for you to take another look.

adrinjalali

Very happy with this now. I'll let @E-Aho have a look.

E-Aho

Overall, I love this new implementation, major kudos to @BenjaminBossan for this design.

It looks really extensible and user friendly, great documentation, and really solid usage pattern. 'll try to play with it properly over the weekend to get a better feel for it, but just looking at the code, it already looks awesome.

I had one minor comment that could potnentially help in an edge case for bugs, but in any case it LGTM 🚀.

E-Aho · 2022-12-16T12:13:16Z

skops/card/_model_card.py

+        The individual (sub)sections.
+
+    """
+    placeholder = "$%!?"  # arbitrary sting that never appears naturally


I know this is minor, but I don't love that this will introduce bugs if a user for some reason actually adds this into their string.

If there's not an easier way to do this, we could maybe use a rarer character, like one of the Unicode Control codes. Maybe the separator character \037 would fit better?

It's a valid concern. I had a regex version that should achieve the same:

def split_subsection_names(key: str) -> list[str]: # Use a negative lookbehind assert to make sure that the "/" character is # not preceded by a backslash parts = re.split(r"/(?<!\\/)", key) res = [part.strip().replace("\\", "") for part in parts] return res

Discussed with @adrinjalali and we found it harder to understand and performance-wise, it was the same.

I haven't thought through the implication of using control codes instead. Maybe we can change it later? This shouldn't really affect users.

Happy to change later, I'll merge this in and maybe raise a small PR to do the above over the weekend

E-Aho · 2022-12-16T12:23:12Z

In any case, happy for @adrinjalali to merge this in, the comment I made can easily be tweaked after merging.

adrinjalali · 2022-12-16T12:44:46Z

Woohoo, this is massive! 🥳

E-Aho reviewed Oct 25, 2022

View reviewed changes

adrinjalali reviewed Oct 26, 2022

View reviewed changes

BenjaminBossan added 9 commits October 31, 2022 17:57

Make tests pass

166e33d

Some refactoring to clean up things, rework repr, make repr tests pass.

Add tests for new functionalities and docstrings

a90e3a5

Adjust tests to work with older sklearn versions

fcbd0df

Adjust examples to new card, more docs

4f10f80

Also some better type annotations.

Continue fixing tests

0f5df2e

Try fixing Windows error by specifying encoding

6212d67

Adjust doctest: confusion matrix not stored in cwd

80d218d

Increase test coverage

c6c8042

merveenoyan reviewed Nov 2, 2022

View reviewed changes

BenjaminBossan added 3 commits November 3, 2022 12:08

Try fixing test failure on Windows

e7a2ae3

Replace old by new Card implementation

2a6aab0

Merge branch 'main' into alternative-model-card-implementation

ce16019

adrinjalali reviewed Nov 21, 2022

View reviewed changes

BenjaminBossan commented Nov 21, 2022

View reviewed changes

BenjaminBossan added 3 commits November 21, 2022 17:57

Address reviewer comments

00b7665

- Remove noise from docstring example - Add the comma after model repr - Add docstrings to private methods

Add TODO notes for when Python 3.7 is dropped

8558f26

adrinjalali reviewed Nov 28, 2022

View reviewed changes

adrinjalali added this to the v0.4 milestone Nov 30, 2022

BenjaminBossan added 2 commits November 30, 2022 12:27

Make _add_single return the Section

772a659

This can be useful, because otherwise it takes a bit of effort to retrieve the latest section.

BenjaminBossan mentioned this pull request Nov 30, 2022

Parser that allows to load dumped model cards BenjaminBossan/skops#2

Closed

BenjaminBossan added 2 commits December 1, 2022 14:49

Error when calling add_metric w/ invalid template

26892d6

Merge branch 'main' into alternative-model-card-implementation

f0c496d

BenjaminBossan added 3 commits December 6, 2022 12:49

Some amendments required after merging

b4d821b

Most notably, dynamic model loading in Cards was broken.

A couple more changes:

0b66e14

1. Don't use Hub template for now 2. Document how to add new default templates in code 3. Nicer error message when using invalid template

Add entry to changes.rst

d859c00

BenjaminBossan marked this pull request as ready for review December 6, 2022 15:51

Fix small bug in generated code for loading

4af4b12

adrinjalali reviewed Dec 9, 2022

View reviewed changes

BenjaminBossan mentioned this pull request Dec 9, 2022

Find a way to deal with possible discrepancy between a card's model and model file in metadata #247

Open

BenjaminBossan mentioned this pull request Dec 9, 2022

Add example of using custom templates for model cards #248

Closed

adrinjalali reviewed Dec 12, 2022

View reviewed changes

BenjaminBossan and others added 3 commits December 12, 2022 12:29

Adjust getting started code in model card

00ec8a2

Merge branch 'main' into alternative-model-card-implementation

9ee11fe

Adjust docstrings

91a5b7a

BenjaminBossan commented Dec 12, 2022

View reviewed changes

BenjaminBossan and others added 2 commits December 14, 2022 14:12

Merge branch 'main' into alternative-model-card-implementation

4564dc3

Merge branch 'main' into alternative-model-card-implementation

fa00e03

adrinjalali reviewed Dec 14, 2022

View reviewed changes

adrinjalali approved these changes Dec 15, 2022

View reviewed changes

E-Aho approved these changes Dec 16, 2022

View reviewed changes

E-Aho merged commit e91e238 into skops-dev:main Dec 16, 2022

BenjaminBossan mentioned this pull request Dec 16, 2022

FEAT Parse model cards with pandoc #257

Merged

BenjaminBossan mentioned this pull request Jan 18, 2023

Programmatically add sections to model card #72

Closed

-        """Yield title and (formatted) contents"""
+        """Yield title and (formatted) contents.
+        It recursively goes through the data and yields the title
+        with the appropriate number of `#`s and the associated
+        content.
+        """


		return section[leaf_node_name]

		def _add_model_section(self, model) -> None:

Alternative Card implementation #203

Alternative Card implementation #203

Conversation

BenjaminBossan commented Oct 25, 2022 • edited Loading

Description

Implementation

Demo

BenjaminBossan commented Oct 25, 2022

E-Aho left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

E-Aho Oct 25, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

adrinjalali left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

BenjaminBossan Oct 26, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

merveenoyan left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

adrinjalali left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

BenjaminBossan left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

BenjaminBossan commented Dec 6, 2022

adrinjalali left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

BenjaminBossan commented Dec 9, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

BenjaminBossan left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

BenjaminBossan commented Dec 14, 2022

adrinjalali left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

BenjaminBossan Dec 14, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

BenjaminBossan commented Oct 25, 2022 •

edited

Loading

E-Aho Oct 25, 2022 •

edited

Loading

BenjaminBossan Oct 26, 2022 •

edited

Loading

BenjaminBossan Dec 14, 2022 •

edited

Loading

BenjaminBossan commented Dec 15, 2022 •

edited

Loading

E-Aho Dec 16, 2022 •

edited

Loading