-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH
trusting np.ufuncs and np.dtypes
#336
ENH
trusting np.ufuncs and np.dtypes
#336
Conversation
ENH
trusting np.ufuncs and np.dtypes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @omar-araboghli
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Take true_divide as an example that is an alias for divide and unfortunately cannot be found with the dir(...) and isinstance(...) approach. Is there a way to find numpy aliases
Not that I know of, but it's better to trust too little than too much, so it's fine for now.
test_can_persist_fitted didn't run successfully until I added NUMPY_DTYPE_TYPE_NAMES to the default trusted list of NdArrayNode
This might be a bug, let me investigate before we merge this.
Okay, I believe this is not a bug. The reason is that we use Another source of confusion (at least for me) is what is meant by At the end of the day, I think everything is working here as expected, but the naming can cause confusion. What's your opinion @adrinjalali? |
Changing the names here wouldn't be backward compatible, and will probably take some time, and I also don't have very good ideas for their names. So I think we can leave that for a future iteration. |
True, we should probably add a comment though. @omar-araboghli I think the docs also need to be updated here: https://skops.readthedocs.io/en/stable/persistence.html#usage |
@adrinjalali and @BenjaminBossan thanks for the review and clarification! Please let me know in case something still has to be changed. I defined the function |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise it's looking great I think.
skops/io/_utils.py
Outdated
@@ -200,3 +201,36 @@ def get_type_paths(types: Any) -> list[str]: | |||
types = [types] | |||
|
|||
return [get_type_name(t) if not isinstance(t, str) else t for t in types] | |||
|
|||
|
|||
def get_public_type_names(module: ModuleType, _type: Type) -> list[str]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe rename _type
to oftype
?
skops/io/_utils.py
Outdated
{ | ||
type_name | ||
for attr in dir(module) | ||
if (isinstance(obj := getattr(module, attr), _type)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isinstance
or issubclass
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changing it to
if (isinstance(obj := getattr(module, attr), _type)) | |
if (issubclass((obj := getattr(module, attr)).__class__, _type)) |
gets the same results for both numpy and scipy. Since the ufuncs are classes and not objects (instances), issubclass
makes more sense to use here. Thanks for the suggestion!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks quite good, thx. Only two minor issues for me, otherwise the PR can be merged.
skops/io/_numpy.py
Outdated
@@ -52,6 +53,8 @@ def ndarray_get_state(obj: Any, save_context: SaveContext) -> dict[str, Any]: | |||
|
|||
|
|||
class NdArrayNode(Node): | |||
# TODO: NdArrayNode and DtypeNode names lead to confusion, see PR-336 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of sending users on a chase, let's just clear the confusion right away: That this is not only responsible for np arrays, but also for np generics.
skops/io/_utils.py
Outdated
if (issubclass((obj := getattr(module, attr)).__class__, oftype)) | ||
and ((type_name := get_type_name(obj)).startswith(module_name)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For both lines, are the outer parens necessary? It seems to me that they could be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wasn't easy to use walrus operator in such "complex" statement 😄 Thanks for the catch - removed them!
@omar-araboghli is this ready for review? |
@BenjaminBossan Indeed! Sorry for not pingging you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for implementing the suggestions, this looks good now. You'd need to merge the current main and fix the potential merge conflict in changes.rst
, then this is good to go.
It would have been nice if NUMPY_DTYPE_TYPE_NAMES
could also use get_public_type_names
but it wouldn't work as is. If you happen to come up with a clever solution to that works for both, it would be good to make that change, but it's not a blocker.
I just noticed that |
@BenjaminBossan just resolved the conflict and cleaned |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work cleaning up the remaining instances, this looks very good now. There is a small typo in one of the docstrings, besides that this can be merged.
I unfortunately couldn't come up with a generic solution
Then it's fine, it would have been nice but it's not a requirement.
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
@BenjaminBossan Thanks for the review and great catch! The typo is now resolved, thanks to @adrinjalali :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work, thx
Reference Issues/PRs
closes #224
What does this implement/fix? Explain your changes.
Trusting
numpy.ufuncs
andnumpy.dtypes
by default.Any other comments?
true_divide
as an example that is an alias fordivide
and unfortunately cannot be found with thedir(...)
andisinstance(...)
approach. Is there a way to find numpy aliases ?test_can_persist_fitted
didn't run successfully until I addedNUMPY_DTYPE_TYPE_NAMES
to the default trusted list ofNdArrayNode
. I am wondering why isn'tDTypeNode
the correct place ? Is there any other Nodes I am missing ?