Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Secure persistence: add extensibility #141

Open
BenjaminBossan opened this issue Sep 16, 2022 · 6 comments
Open

Secure persistence: add extensibility #141

BenjaminBossan opened this issue Sep 16, 2022 · 6 comments
Labels
persistence Secure persistence feature

Comments

@BenjaminBossan
Copy link
Collaborator

A mechanism should be implemented that allows library authors to add secure persistence to their library, and for users to opt those libraries in.

@BenjaminBossan BenjaminBossan added the persistence Secure persistence feature label Sep 16, 2022
@adrinjalali
Copy link
Member

Thinking more about this, what do you think of supporting only __getstate__/__setstate__, and allowing certain types as we do now, like lists, dicts, function, etc.

This would mean we don't need to do much on our side, and they only need to implement a sensible __getstate__/__setstate__ pair.

If we manage to move all sklearn models to the same path, then we can completely remove __reduce__.

@BenjaminBossan
Copy link
Collaborator Author

BenjaminBossan commented Sep 27, 2022

That sounds good, we can start with __getstate__ & __setstate__ and hopefully that's good enough. If it isn't, we can later think about providing more. If we can eventually stop supporting __reduce__, it should really lower our burden as well.

Still, the user should have to opt in explicitly. Do you already have an API in mind?

@adrinjalali
Copy link
Member

I'm not sure what you mean by user in this context. I'm thinking the save would automatically work if the object of third party libraries provide these methods, and during load, we'd fail if we don't know the library, and user can say they trust the source kinda thing.

I'm still not sure if we should load simple objects if the user has the corresponding library installed or not, if they have it installed, they trust it?

@BenjaminBossan
Copy link
Collaborator Author

and during load, we'd fail if we don't know the library, and user can say they trust the source kinda thing.

That's what I meant. How would the API for this look like?

I'm still not sure if we should load simple objects if the user has the corresponding library installed or not, if they have it installed, they trust it?

Previously, I would have said that if you installed a malicious library, you're probably already compromised, so there is nothing to be gained from not trusting it at this point. But IIUC, Python is moving in a direction of not allowing arbitrary code to be run during installation (avoiding setup.py), so just installing a malicious package wouldn't mean you're compromised. But for the time being, there is of course still a lot of code that relies on setup.py that needs to be supported and I'm not sure if that's planned to be deprecated. (This is all from memory, maybe I get some details wrong.)

@adrinjalali
Copy link
Member

Packages can run arbitrary code through .pth files. We don't need setup.py to run arbitrary code :D

discussions about removing .pth files: https://bugs.python.org/issue33944
rejected PEP about an alternative mechanism: https://peps.python.org/pep-0648/

While saving models, we save everything, it's only while loading that security issues come in play. Therefore I think simply relying on __getstate__/__setstate__ is enough, and there doesn't need to be an extra user facing API for it.

@BenjaminBossan
Copy link
Collaborator Author

Packages can run arbitrary code through .pth files. We don't need setup.py to run arbitrary code :D

Oh man, it never stops, does it? Okay, so we can basically assume that if a malicious package is installed, the user is already compromised.

While saving models, we save everything, it's only while loading that security issues come in play. Therefore I think simply relying on __getstate__/__setstate__ is enough, and there doesn't need to be an extra user facing API for it.

Okay, let's do it this way and hopefully it'll be sufficient.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
persistence Secure persistence feature
Projects
None yet
Development

No branches or pull requests

2 participants