-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Spike: Temporary storage for a secret store password. #43
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code look quite clean, a few comments from my side:
- Doing most of the crc calculation ontop of
strings
isn't the most efficient, but if it's fast enough, so be it. - Looks like regular SHM, I don't see where there would be any
password
specifics ("extra security")- What is the problem scenario you are trying to improve/solve?
The problem I am trying to solve is about the notebook usage. Most notebooks need access to the secret store where we keep all sorts of configurations. Some of them are sensitive, like for example the database connection details. Therefore the store is protected by a password. Currently, the user has to enter the password in every notebook, which is a bit tedious. The idea is to let the user enter the password and put it into an in-memory store. Ideally, we would also limit the lifespan of this storage. |
Co-authored-by: Nicola Coretti <nico.coretti@gmail.com>
return ''.join('0' if a == b else '1' for a, b in zip(sequence, crc_divisor)) | ||
|
||
|
||
def compute_crc(sequence: str, crc_divisor: str) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would compute this by myself. Python itself doesn't provide crc, but a hash algorithm should also be fine here. Performance isn't that critical. Or, alternative we need to import https://pypi.org/project/crc/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternative, there is a crc32 in the standard lib https://docs.python.org/3/library/binascii.html?highlight=crc32#binascii.crc32
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I will leave it as it is for now. The chances are it will be gone. For example, if we start using the file lock we definitely can scrap the CRC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
def encode(content: str, crc_divisor: str, | ||
creation_time: datetime) -> Tuple[int, bytearray]: | ||
""" | ||
Creates a bytearray with encoded content and its creation datetime. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the reasoning for adding creation time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may want to limit the lifetime of a password in the shared memory. If we have a process that periodically performs a health check or something, it can clear the shared memory content if it was created let's say more than 4 hours ago.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Cyclic Redundancy Check (CRC) code. The CRC is computed over both the timestamp | ||
and the content. | ||
|
||
The need for a CRC is debatable. Its use is motivated by the problem of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The concurrent access is not hypothetical. It is possible with our setup. I would recommend to protect the share memory with inter process lock. In the ITDE we used a file lock for that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I know, a file lock is usually used as a synchronization mechanism for shared memory. I don't think it's worth doing in this case unless this code finds another usage. Why would we use it in the setup? Currently, the concept is quite simple: 1. The shared memory may or may not contain the data we are after. 2. The data may be corrupted. The user has to handle both cases, which works well for passwords. We try to minimise the chances of (2) happening by using the CRC. But since we cannot completely eliminate this possibility I wonder if it's worth doing it at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the worst case, the stuff is currently written, and the user quickly changes to another notebook. You can get simply issues, if you don't do it properly.
creation_time: datetime) -> Tuple[int, bytearray]: | ||
""" | ||
Creates a bytearray with encoded content and its creation datetime. | ||
Currently, the content is not being encrypted. It gets appended by the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About the encryption part. We can't encrypt the password,because we would need another shared another password for that. However, we also shouldn't use the users plain text password. My suggestion is to run a key derivation on the entered password and use the derived key as the password for sqlcipher. This way we only need to share the derived key.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This protects the clear text password from reused in other attacks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can do that, although I don't think it makes a big difference. Why can't the attacker just use the same function we provide, with all its encoding?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A key derivation is not reversible, so they can't compute the original password and if we clean the original password as fast as possible out of the memory
return SharedMemory(name=storage_name, create=True, size=max_size) | ||
|
||
|
||
def write_to_sm(content: str, creation_time: Optional[datetime] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We currently use strings to pass the password around. This way we have no save way to cleans it the process it self.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here some solutions and packages are discussed
https://stackoverflow.com/questions/728164/securely-erasing-password-in-memory-python
https://www.sjoerdlangkemper.nl/2016/06/09/clearing-memory-in-python/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might also need to disable core dumps, but I need to first check how and where
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You would need to protect not just the secret store password but everything that the password itself is designed to protect - all sensitive data that goes in and out of the secret store - database credentials, AWS credentials, etc. Are we really up for that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is only about removing the password as fast as possible from the memory, sqlchipher probably did a key derivation on the password we gave it, so it won't save the original one in memory.
return False, datetime.min, '' | ||
|
||
|
||
def _open_shared_memory(storage_name: str, max_size: int, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I now also remembered the issue with shared memory and docker. All is nice and good, if you assume the default behavior of docker regarding ipc namespace. However, there exist the option --ipc=host, in that case the shared memory of the host is used and not cleared during docker stop. We need at least a warning in the docs and the entry point process needs try to remove it when stopping. With a process this would be given and we only would need to care about clearing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And, I had an idea how we don't need to care about this. For reading we can create a session copy of the DB with a session password.
This is a spike. The Ticket #42 will be closed, this PR will stay open for later reference. |
closes #42
It's kinda a proposal for the implementation of temporary storage for a secret store password. New ideas, suggestions, etc, are very welcome.