Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MNT: A little hardening of the auditing of Nodes #340

Merged
merged 2 commits into from
Apr 6, 2023

Conversation

BenjaminBossan
Copy link
Collaborator

Description

Two measures to harden the auditing (a little bit):

  • Type annotate the Node's children to prevent setting invalid types.
  • Change all the tests that use loads to only load trusted types instead of using trusted=True

The latter is important because when setting trusted=True, the whole machinery of checking types is not executed, so any bugs that may be contained there will not be revealed. In particular, this shows that for persisting methods, we had a child with a str type and that would raise an error, i.e. so far, loading method types was not possible for users who passed trusted!=True.

Additional changes

As a consequence of the last point, the auditing code has been changed to accept str as type. Alternatively, we can make the change explained here:

#338 (comment)

i.e. not storing the method name in children.

Another "victim" of this PR is that the so far dead code of checking for primitive types inside of get_unsafe_set has been removed. This code was supposed to check if the type is a primitive type but it was defective. get_module(child) would raise an error if an instance of the type would be passed. We could theoretically fix that code, but it would still be dead code because primitive types are stored as json.

Another small change is to exclude the code in skops/io/old from mypy checks. Otherwise, we would have to update its type signatures if signatures in the persistence code change (as they did here).

Two measures to harden the auditing (a little bit):

- Type annotate the Node's children to prevent setting invalid types.
- Change all the tests that use loads to only load trusted types instead
  of using trusted=True

The latter is importent because when setting trusted=True, the whole
machinery of checking types is not executed, so any bugs that may be
contained there will not be revealed. In particular, this shows that for
persisting methods, we had a child with a str type and that would raise
an error, i.e. loading method types was not possible for users who
passed trusted!=True.

Additional changes

As a consequence of the last point, the auditing code has been changed
to accept str as type. Alternatively, we can make the change explained
here:

skops-dev#338 (comment)

i.e. not storing the method name in children.

Another "victim" of this change is that the so far dead code of checking
for primitive types inside of get_unsafe_set has been removed. This code
was supposed to check if the type is a primitive type but it was
defective. get_module(child) would raise an error if an instance of the
type would be passed. We could theoretically fix that code, but it would
still be dead code because primitive types are stored as json.

Another small change is to exclude the code in skops/io/old from mypy
checks. Otherwise, we would have to update its type signatures if
signatures in the persistence code change.
@BenjaminBossan BenjaminBossan added the persistence Secure persistence feature label Apr 5, 2023
@BenjaminBossan BenjaminBossan changed the title A little hardening of the auditing of Nodes MNT: A little hardening of the auditing of Nodes Apr 5, 2023
@BenjaminBossan
Copy link
Collaborator Author

Ready for review @skops-dev/maintainers

Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defining type annotations like VALID_NODE_CHILD_TYPES makes the code much less readable to me, but I guess we can let it be.

Comment on lines -276 to -278
elif check_type(
get_module(child), child.__class__.__name__, PRIMITIVE_TYPE_NAMES
):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we removing them cause now primitives are trusted by default?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what I was referring to here:

Another "victim" of this PR is that the so far dead code of checking for primitive types inside of get_unsafe_set has been removed. This code was supposed to check if the type is a primitive type but it was defective. get_module(child) would raise an error if an instance of the type would be passed. We could theoretically fix that code, but it would still be dead code because primitive types are stored as json.

@@ -168,7 +168,7 @@ def pretty_print_tree(


def walk_tree(
node: Node | dict[str, Node] | list[Node],
node: VALID_NODE_CHILD_TYPES | dict[str, VALID_NODE_CHILD_TYPES],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't dict(str, Node) included in VALID_NODE_CHILD_TYPES?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but this is dict[str, VALID_NODE_CHILD_TYPES], so it could be something where the key is not a Node, like {"foo": None}

Copy link
Collaborator Author

@BenjaminBossan BenjaminBossan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defining type annotations like VALID_NODE_CHILD_TYPES makes the code much less readable to me, but I guess we can let it be.

Yes, it is a tradeoff with readability. I originally had the type definition on the line that it's used, which really adds a lot of noise when reading the Node code, I think this is an okay compromise, as it the eye can quickly skip it. We could use a shorter alias if that helps.

Comment on lines -276 to -278
elif check_type(
get_module(child), child.__class__.__name__, PRIMITIVE_TYPE_NAMES
):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what I was referring to here:

Another "victim" of this PR is that the so far dead code of checking for primitive types inside of get_unsafe_set has been removed. This code was supposed to check if the type is a primitive type but it was defective. get_module(child) would raise an error if an instance of the type would be passed. We could theoretically fix that code, but it would still be dead code because primitive types are stored as json.

@@ -168,7 +168,7 @@ def pretty_print_tree(


def walk_tree(
node: Node | dict[str, Node] | list[Node],
node: VALID_NODE_CHILD_TYPES | dict[str, VALID_NODE_CHILD_TYPES],
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but this is dict[str, VALID_NODE_CHILD_TYPES], so it could be something where the key is not a Node, like {"foo": None}

Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool!

@adrinjalali adrinjalali merged commit 996577c into skops-dev:main Apr 6, 2023
@BenjaminBossan BenjaminBossan deleted the hardening-audit branch April 6, 2023 12:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
persistence Secure persistence feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants