Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Insecure data serialization method by default with pickle on Cache #191

Open
matheusbrat opened this issue May 14, 2020 · 14 comments
Open

Comments

@matheusbrat
Copy link

matheusbrat commented May 14, 2020

Hello,

Beaker uses Pickle on Session and Cache. On Session at least it has support to secret/HMAC but on caching it doesn't which can lead to arbitrary code execution.

#35

Maybe it would be wise to expose data_serializer in Cache too and maybe even allow a secret definition for caching and perhaps default to json serialization.

What do you think?

@amol-
Copy link
Collaborator

amol- commented May 20, 2020

Sessions should provide a data_serializer argument that allows you to switch between JSON and PICKLE.

For session the primary driving force was that under some conditions it was easy to edit the session content (even just by editing a cookie in your browser if you didn't sign the cookie) and thus it could be an easy attack vector.

For caches it has been a generally widespread need to cache Python objects (for example people frequently cache the entire result of a SQLAlchemy query) so pickle has been the facilitator that allowed that.

On the cache the risk seems much more more contained, as you usually control what you cache and an attacker should have already breached your internal network, your database or your codebase to be able to inject malicious pickle data.

@matheusbrat
Copy link
Author

matheusbrat commented May 20, 2020

It is much more contained but the risk exists and I believe it could be reduced even more if we had the secret option as we do for Session. As you said a bad actor already gained access to something it shouldn't. In my opinion, not improving because the bad actor already gained some sort of access it the same as leaving root without protection - If the bad actor already got access to user...

This also reminds me: Shouldn't we replace the sha1 to sha256?

@amol-
Copy link
Collaborator

amol- commented May 20, 2020

On the pickle in caches the only issue is that there is no easy way to provide the same level of features that people expect from the cache. By replacing pickle with something else it would still need to be able to decode to arbitrary Python objects and thus it would probably quickly grow the same security concerns that Pickle has.

In the context of Pickle I would probably go in the direction of signing every cached value as an optional feature, so that you know they were not tampered with when loading them back. Seems a good reduction to the risk of most cases without having to sacrifice any existing feature of the cache.

PS: Which SHA1 usages you refer to? SHA1 in beaker it's mostly just used a way to generate random ids (sometimes stable). I would probably replace it with uuid in most cases. But people being able to reverse those should only result in reading a bunch of pretty random strings.

@matheusbrat
Copy link
Author

matheusbrat commented May 20, 2020

Sorry that I wasn't clear but I was referring to the "secret" we have on SignedCookie approach which is exactly you said of signing the messages.

I agree that would reduce the attack surface a lot.

How do you think it would be good to implement this?
Maybe something like:

  • Extending NameSpaceManager to have a loads/dumps method which simply return the value
  • Creating a new PickledNameSpaceManager which implements the loads/dumps with pickle.loads/dumps and if secret is present also does the validation.
  • Update Redis/Database/Etc to extend from PickledNameSpaceManager and instead of relying on pickle loads/dumps directly they would call self.loads for the value they fetched on their own way.

For example for Redis:

    def __getitem__(self, key):
        entry = self.client.get(self._format_key(key))
        if entry is None:
            raise KeyError(key)
        return self.loads(entry)

I'm not super familiar with beaker code so feel free to suggest a better approach.

@tcullum-rh
Copy link

@x-warrior @amol- , I've not very familiar with beaker, so wanted to clarify: What is the risk & source of untrusted data here? From what I see (and I could be wrong), the container.py file has code which deserializes data from the dbm via pickle.loads(self.dbm[key]) and this is the issue. Am I right? I'm trying to understand how untrusted data could make its way into the cache in beaker use-cases.

@matheusbrat
Copy link
Author

matheusbrat commented Jun 25, 2020

@tcullum-rh, If you assume your system is secure and only beaker has access to the storage... You can think it is "secure". But if a bad actor is able to get access to your database (where you store your cache), he can now easily run any arbitrary code in the machine running beaker as well.

A lot of people would say: "Ah but than your system is already compromised..." but I believe this is not an excuse, you are making the life of the bad actors to gain access to everything even easier.

If beaker signed the message before saving it on the storage, in the case a bad actor gets access to the storage the attack surface would be smaller.

Does that make sense?

@tcullum-rh
Copy link

@x-warrior yes, thank you. Note that this is now CVE-2013-7489: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2013-7489 .

@matheusbrat
Copy link
Author

Thanks @tcullum-rh. I will wait for replies from beaker team and than maybe I can work to improve this.

@amol-
Copy link
Collaborator

amol- commented Jun 30, 2020

If you are willing to submit a PR to sign the cached messages, I'm more than willing to review & merge it.
Just make sure it's possible to enable/disable signature and it's backward compatible as I can foresee most people will get angry if their cache suddenly can't load or if loading cached values suddenly takes longer than computing them :D

@EtienneBruines
Copy link

Pyupio gave this id pyup.io-38464

https://github.com/pyupio/safety-db/blob/master/data/insecure_full.json#L975

@neoclust
Copy link

do you have any news on this CVE ?

@tang475
Copy link

tang475 commented Aug 27, 2021

Do you have any news on this CVE ? We are using this software and we are looking for a way to fix this issue. Thank you

@vivekhub
Copy link

vivekhub commented Jul 9, 2022

If beaker signed the message before saving it on the storage, in the case a bad actor gets access to the storage the attack surface would be smaller.
@matheusbrat how would this solution work? If a bad actor gets access to the storage, they can change the contents and the signature (assuming the signature is stored with data). Please help me understand your solution here.

@matheusbrat
Copy link
Author

If a bad actor gets access to the storage, they can change the contents and the signature (assuming the signature is stored with data). Please help me understand your solution here.

If a bad actor get access to the storage only, he would not have the "signing key" thus he would not be able to generate a valid signature for the new content. Meaning whenever the "backend" load that content, it would see the key doesn't match it's own key and discard the data or throw an exception.

Does this make sense?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants