MNT Persist clean up and consistency checks #143

BenjaminBossan · 2022-09-17T11:10:45Z

During the work on adding strict typing (which will most likely not
work), a couple of opportunities for simplifications and clean ups came
up. In order for them not to get lost in the typing PR, they are now
added in this separate PR. Changes are:

- A few simplifications to the "state" to avoid unnecessary nesting
- Consistently raise UnsupportedTypeException
- Simplify some code without functional change
- Consistently use _get_state and _get_instance
- Enforce consistency on the argument names in get_state/instance functions

Regarding the use of _get_state vs get_state and _get_instance vs get_instance:
Since we basically only ever want to use the _get variant and not get (since the
latter are only placeholders for registration), I would suggest that we swap the
names. This will also be better for future contributions to 3rd party libraries,
since right now they would be required to use "private" functions.
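
For readers unfamiliar with the pattern, here is a minimal sketch of the split between a registration target and the function callers actually use. It shows the naming after the proposed swap, so `get_state` is the function to call and `_get_state` is only used for registration. The use of `functools.singledispatch`, the layout of the state dict, and the local `UnsupportedTypeException` class are assumptions made for illustration, not the actual skops implementation.

```python
from functools import singledispatch


class UnsupportedTypeException(TypeError):
    """Hypothetical stand-in for the exception named in this PR."""


@singledispatch
def _get_state(obj):
    # Fallback: reached when no handler is registered for type(obj).
    raise UnsupportedTypeException(
        f"Cannot persist objects of type {type(obj).__name__}"
    )


@_get_state.register
def _(obj: int):
    return {"content": obj}


@_get_state.register
def _(obj: list):
    # Recurse through the public function so nested items get metadata too.
    return {"content": [get_state(item) for item in obj]}


def get_state(obj):
    """Public entry point: dispatch on type, then add common metadata."""
    state = _get_state(obj)
    state["__class__"] = type(obj).__name__
    return state
```

With this layout, a third-party library would register its handlers on `_get_state` once and call only `get_state` everywhere else.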

Ready for review @skops-dev/maintainers

(PS: Sorry for spurious commits, I'll clean them up)


adrinjalali

Overall pretty nice!

Regarding the use of _get_state vs get_state and _get_instance vs get_instance:
Since we basically only ever want to use the _get variant and not get (since the
latter are only placeholders for registration), I would suggest that we swap the
names. This will also be better for future contributions to 3rd party libraries,
since right now they would be required to use "private" functions.

+1

(PS: Sorry for spurious commits, I'll clean them up)

We squash/merge, don't worry about the commits ;)

adrinjalali · 2022-09-26T11:10:10Z

skops/io/_numpy.py

-    try:
+    # First, try to save object with np.save and allow_pickle=False, which
+    # should generally work as long as the dtype is not object.
+    with suppress(ValueError):


the suppress/return pattern makes this harder to understand. Any reason you don't want to stay with try/except? try/except would seem much more readable to me.

I prefer early return to lengthy try...except blocks but I will change it back if you prefer the other way round.

I agree with early returns usually, but this suppress/return is somewhat convoluted!
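
To make the two styles under discussion concrete, here is a small sketch of both. The function names, the in-memory buffer, and the shape of the returned dict are hypothetical and chosen only for illustration. The one behaviour relied on is that `np.save(..., allow_pickle=False)` raises `ValueError` for arrays of dtype object.

```python
import io
from contextlib import suppress

import numpy as np


def save_with_suppress(arr):
    # Early return inside the context manager: if np.save succeeds we
    # return here, otherwise the ValueError is swallowed and we fall through.
    with suppress(ValueError):
        buffer = io.BytesIO()
        np.save(buffer, arr, allow_pickle=False)
        return {"type": "numpy", "data": buffer.getvalue()}
    # Only reached when np.save raised ValueError (e.g. dtype is object).
    return {"type": "list", "data": arr.tolist()}


def save_with_try_except(arr):
    # Same behaviour, with the fallback spelled out in the except branch.
    buffer = io.BytesIO()
    try:
        np.save(buffer, arr, allow_pickle=False)
    except ValueError:
        return {"type": "list", "data": arr.tolist()}
    return {"type": "numpy", "data": buffer.getvalue()}
```

Both functions return the same result for the same input. The difference is only in how visible the control flow is: in the first version the reader has to know that `suppress` turns the exception into a silent fall-through.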

adrinjalali · 2022-09-26T11:12:17Z

skops/io/_numpy.py

+    # convert them to a list and recursively call get_state on it. For this, we
+    # expect the dtype to be object.
+    if obj.dtype != object:
+        raise UnsupportedTypeException(


would be nice to test these. But I'm happy to leave tests for individual methods to a separate PR.

I tried to write a test for it but it wasn't that simple. This is the easiest I could come up with:

    class _MockDtype:
        metadata = None
        names = None
        subdtype = None
        str = "<f8"
        hasobject = False


    _mock_dtype = _MockDtype()


    class _CustomDtypeArray(np.ndarray):
        @property
        def dtype(self):
            return _mock_dtype

        @property
        def itemsize(self):
            # np.save checks this attribute, so (ab)use this to trigger a ValueError
            raise ValueError("Trigger ValueError")


    class NumpyUnknownDtypeEstimator(BaseEstimator):
        def fit(self, X, y=None, **fit_params):
            self.x_ = _CustomDtypeArray([1, 2, 3])
            return self


    def test_unknown_numpy_dtype_raises(tmp_path):
        est = NumpyUnknownDtypeEstimator().fit(None)
        f_name = tmp_path / "file.skops"
        with pytest.raises(UnsupportedTypeException):
            save_load_round(est, f_name)

I found it interesting that without the itemsize property, saving would actually just work! Loading would fail though.

But I think what this shows us is that it's almost impossible to trigger this code path as a user, even with custom dtypes (if that's even a thing), unless the user goes to extraordinary lengths to provoke the error. So I don't think this situation will really come up in the wild.

This makes me think that we might have to check the dtype differently inside this function. Not sure if we want to support custom dtypes and hope for the best or if we should check against all existing dtypes and raise if we don't know this one.

sklearn has custom dtypes, in trees, but if that's not triggering this, then not many things probably would, and that means we can remove it?

Thanks for the pointer. I checked and np.save and np.load seem to work fine on that custom dtype (I suppose you mean this dtype=[('left_child', '<i8'), ...] thing).
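
A quick way to reproduce that check, as a sketch: the `left_child` field is taken from the comment above, while the other field names and all values are made up for illustration and are not the full dtype used by sklearn's trees.

```python
import io

import numpy as np

# Structured dtype loosely modelled on a tree node record.
node_dtype = np.dtype(
    [("left_child", "<i8"), ("right_child", "<i8"), ("threshold", "<f8")]
)
nodes = np.array([(1, 2, 0.5), (-1, -1, -2.0)], dtype=node_dtype)

# Round trip through np.save / np.load without pickle.
buffer = io.BytesIO()
np.save(buffer, nodes, allow_pickle=False)
buffer.seek(0)
loaded = np.load(buffer, allow_pickle=False)
```

Since none of the fields has dtype object, the round trip needs no pickling, so the fallback path in the dispatch function is never entered for arrays like this.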

It means we could remove the if obj.dtype != object: check, given how we never trigger it. I just wonder if it isn't more cautious to leave it there to have an explicit error? We don't really know what would happen if we have a custom dtype that cannot be saved with np.save, maybe it's better to raise and get a user report? We could encourage reporting by extending the error message with "Please open an issue on...".

Yeah, extending the error message makes sense. I'm okay with this line not being tested.
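
As a rough sketch of what the extended check could look like: the helper name, the exception class defined here, and the exact wording of the message are assumptions for illustration, not the text that was merged.

```python
import numpy as np


class UnsupportedTypeException(TypeError):
    """Hypothetical stand-in for the exception raised in skops/io."""


def object_array_to_list(obj):
    # At this point np.save without pickle has already failed, which is
    # only expected for object arrays. Anything else is unknown territory,
    # so raise explicitly and ask the user to report it.
    if obj.dtype != object:
        raise UnsupportedTypeException(
            f"numpy arrays of dtype {obj.dtype} are not supported yet. "
            "Please open an issue so that support can be added."
        )
    return obj.tolist()
```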

skops/io/tests/test_persist.py

- Use try...except instead of suppress
- Add comments to tests

adrinjalali

I think renaming _get_state, _get_instance is also left.

adrinjalali · 2022-09-29T10:35:12Z

skops/io/tests/test_persist.py

@@ -61,6 +64,66 @@
 ATOL = 1e-6 if sys.platform == "darwin" else 1e-7


+@pytest.fixture(autouse=True)
+def debug_dispatch_functions():


I feel like you're hacking type checks here lol. Nice!

adrinjalali · 2022-09-29T10:35:52Z

skops/io/_numpy.py

+    # convert them to a list and recursively call get_state on it. For this, we
+    # expect the dtype to be object.
+    if obj.dtype != object:
+        raise UnsupportedTypeException(


Yeah, extending the error message makes sense. I'm okay with this line not being tested.


BenjaminBossan · 2022-09-29T13:22:25Z

I think renaming _get_state, _get_instance is also left.

Let me do this in another PR.

Yeah, extending the error message makes sense.

Done.

I feel like you're hacking type checks here lol. Nice!

Maybe a little bit ;) but my main concern was the inconsistent naming, e.g. obj when it's a state.
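
One way such a naming check could be written, as a sketch only: the expected parameter names `(obj, dst)` and `(state, src)` are assumptions, as are the helper and the example functions. The actual fixture in the test suite may work differently.

```python
import inspect

# Assumed naming convention for the two families of functions.
EXPECTED_GET_STATE_ARGS = ["obj", "dst"]
EXPECTED_GET_INSTANCE_ARGS = ["state", "src"]


def check_arg_names(func, expected):
    """Raise AssertionError if func's parameter names differ from expected."""
    actual = list(inspect.signature(func).parameters)
    if actual != expected:
        raise AssertionError(
            f"{func.__name__} has parameters {actual}, expected {expected}"
        )


def dict_get_state(obj, dst):
    return {"content": dict(obj)}


def dict_get_instance(state, src):
    return dict(state["content"])


def badly_named_get_instance(obj, src):
    # Takes a state but calls it "obj": the inconsistency mentioned above.
    return dict(obj["content"])
```

Calling `check_arg_names` on every registered function, for example from an autouse fixture, would flag `badly_named_get_instance` while letting the first two pass.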

Ready for re-review @adrinjalali

The functions that should be actually used everywhere are _get_state and _get_instance, not get_state and get_instance. This is inconvenient and confusing. Therefore, swap out the names so that the functions being used everywhere now have the "public" name and the other ones the "private" name. This was discussed here: skops-dev#143 (comment)

BenjaminBossan and others added 7 commits September 6, 2022 17:52

Reduce CI timeout from 60 to 15 min

6ed1cba

After checking a couple of runs, none that took longer than 12 min ever finished. Therefore, let's cap the timeout at 15 min for now. Later, if/when we factor out the inference tests, we may decrease the timeout even further.

Merge branch 'skops-dev:main' into main

10912ce

Merge branch 'skops-dev:main' into main

d03a520

Merge branch 'skops-dev:main' into main

04e6ce1

Merge branch 'skops-dev:main' into main

56764a9

Add a few more argument checks

cabf752

adrinjalali reviewed Sep 26, 2022


Reviewer comments

399a81c

- Use try...except instead of suppress
- Add comments to tests

adrinjalali reviewed Sep 29, 2022


BenjaminBossan added 2 commits September 29, 2022 15:09

Reviewer comment: ask users to report issue

a41c163

... if an unknown dtype causes an error.

Merge branch 'main' into persist-clean-up-and-consistency-checks

3f6ca2f

adrinjalali changed the title ~~Persist clean up and consistency checks~~ MNT Persist clean up and consistency checks Sep 29, 2022

adrinjalali merged commit 95a718a into skops-dev:main Sep 29, 2022

BenjaminBossan deleted the persist-clean-up-and-consistency-checks branch September 29, 2022 14:28

BenjaminBossan mentioned this pull request Sep 30, 2022

MNT Swap the names of _get_state <=> get_state & _get_instance <=> get_instance #161

Merged
