MNT: A little hardening of the auditing of Nodes #340
Two measures to harden the auditing (a little bit):

- Type annotate the Node's children to prevent setting invalid types.
- Change all the tests that use loads to only load trusted types instead of using trusted=True.

The latter is important because when setting trusted=True, the whole machinery of checking types is not executed, so any bugs it may contain will not be revealed. In particular, this shows that for persisting methods, we had a child with a str type and that would raise an error, i.e. loading method types was not possible for users who passed trusted!=True.

Additional changes

As a consequence of the last point, the auditing code has been changed to accept str as type. Alternatively, we can make the change explained here: skops-dev#338 (comment), i.e. not storing the method name in children.

Another "victim" of this change is that the so far dead code of checking for primitive types inside of get_unsafe_set has been removed. This code was supposed to check if the type is a primitive type, but it was defective: get_module(child) would raise an error if an instance of the type were passed. We could theoretically fix that code, but it would still be dead code because primitive types are stored as json.

Another small change is to exclude the code in skops/io/old from mypy checks. Otherwise, we would have to update its type signatures whenever signatures in the persistence code change.
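To illustrate why tests that pass trusted=True can hide auditing bugs, here is a minimal toy sketch. ToyNode, toy_load and accept_str_children are hypothetical names made up for this illustration; this is not the skops implementation, only a model of the behaviour described above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Hypothetical stand-in for a persisted node (not the skops Node class)."""

    type_name: str
    children: dict = field(default_factory=dict)
    # Assumption for the demo: False mimics an audit that chokes on str children.
    accept_str_children: bool = True

    def get_unsafe_set(self, trusted: list[str]) -> set[str]:
        unsafe = set() if self.type_name in trusted else {self.type_name}
        for child in self.children.values():
            if child is None:
                continue
            if isinstance(child, str):
                if not self.accept_str_children:
                    raise TypeError("audit cannot handle a str child")
                continue
            unsafe |= child.get_unsafe_set(trusted)
        return unsafe


def toy_load(tree: ToyNode, trusted: bool | list[str] = False) -> str:
    if trusted is True:
        # The audit is skipped entirely, so a bug inside it stays invisible.
        return "loaded without audit"
    unsafe = tree.get_unsafe_set(trusted or [])
    if unsafe:
        raise TypeError(f"Untrusted types found: {sorted(unsafe)}")
    return "loaded after audit"


buggy = ToyNode("method", {"func": "predict"}, accept_str_children=False)
toy_load(buggy, trusted=True)  # passes, the defective audit never runs

fixed = ToyNode("method", {"func": "predict"})
toy_load(fixed, trusted=["method"])  # passes, and the audit actually ran
```

Calling toy_load(buggy, trusted=["method"]) raises a TypeError, which is the kind of failure a test only sees when it passes an explicit list of trusted types.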
Ready for review @skops-dev/maintainers
Defining type annotations like VALID_NODE_CHILD_TYPES makes the code much less readable to me, but I guess we can let it be.
elif check_type(
    get_module(child), child.__class__.__name__, PRIMITIVE_TYPE_NAMES
):
Are we removing them because primitives are now trusted by default?
This is what I was referring to here:
Another "victim" of this PR is that the so far dead code of checking for primitive types inside of get_unsafe_set has been removed. This code was supposed to check if the type is a primitive type but it was defective. get_module(child) would raise an error if an instance of the type would be passed. We could theoretically fix that code, but it would still be dead code because primitive types are stored as json.
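As a rough sketch of the defect mentioned above: assuming a helper that resolves the module through the object's __name__ (get_module_sketch below is a hypothetical stand-in, not the actual skops helper), passing an instance fails because instances of primitive types have no __name__ attribute.

```python
def get_module_sketch(obj):
    """Hypothetical helper: works for classes and functions, not for instances."""
    name = obj.__name__  # instances such as 1 or "a" do not have __name__
    module = getattr(obj, "__module__", None)
    if module is None:
        raise ValueError(f"cannot determine module of {name}")
    return module


get_module_sketch(int)  # fine: int is a class and has __name__

try:
    get_module_sketch(1)  # an instance, as in get_module(child)
except AttributeError as exc:
    print(f"instance lookup failed: {exc}")

# One conceivable fix is to pass the type rather than the instance.
get_module_sketch(type(1))
```

Even with such a fix the branch would stay unreachable, for the reason given above: primitive types are stored as json.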
@@ -168,7 +168,7 @@ def pretty_print_tree(

 def walk_tree(
-    node: Node | dict[str, Node] | list[Node],
+    node: VALID_NODE_CHILD_TYPES | dict[str, VALID_NODE_CHILD_TYPES],
Isn't dict[str, Node] included in VALID_NODE_CHILD_TYPES?
Yes, but this is dict[str, VALID_NODE_CHILD_TYPES], so it could be something where the value is not a Node, like {"foo": None}
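A small sketch of that distinction, under stated assumptions: ChildTypes below is a hypothetical alias whose members are guessed for illustration (the real VALID_NODE_CHILD_TYPES may differ), and Node and count_nodes are toy stand-ins rather than the skops code.

```python
from __future__ import annotations

from typing import Optional, Union


class Node:
    """Toy placeholder, not the skops Node class."""

    def __init__(self, children: dict[str, ChildTypes] | None = None) -> None:
        self.children = children or {}


# Hypothetical alias; the members are an assumption made for this example.
ChildTypes = Optional[Union[Node, list[Node], dict[str, Node], type, str]]


def count_nodes(node: ChildTypes | dict[str, ChildTypes]) -> int:
    """Walk a tree and count Node objects, skipping non-Node values."""
    if isinstance(node, Node):
        return 1 + count_nodes(node.children)
    if isinstance(node, list):
        return sum(count_nodes(item) for item in node)
    if isinstance(node, dict):
        # Values may be None or str here, which dict[str, Node] would not allow.
        return sum(count_nodes(value) for value in node.values())
    return 0


count_nodes({"foo": None})  # valid input for the wider dict annotation
count_nodes(Node({"x": [Node(), Node()], "name": "predict", "missing": None}))
```

With dict[str, Node] alone, a type checker would reject {"foo": None} as an argument; the wider dict[str, ChildTypes] accepts it.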
Defining type annotations like VALID_NODE_CHILD_TYPES makes the code much less readable to me, but I guess we can let it be.
Yes, it is a tradeoff with readability. I originally had the type definition on the line where it's used, which really adds a lot of noise when reading the Node code. I think this is an okay compromise, as the eye can quickly skip it. We could use a shorter alias if that helps.
Cool!