Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Databased-backed component args/kwargs #119

Closed
Archmonger opened this issue Jan 10, 2023 · 8 comments · Fixed by #120
Closed

Databased-backed component args/kwargs #119

Archmonger opened this issue Jan 10, 2023 · 8 comments · Fixed by #120

Comments

@Archmonger
Copy link
Contributor

Archmonger commented Jan 10, 2023

Current Situation

Currently, args/kwargs are serialized and stored in client-sided HTML. However, this is susceptible to spoofing, and makes it impossible to pass in non-serializable values such as Python objects.

Proposed Actions

During a HTTP request...

  1. Check if the component's signature contains any args/kwargs
    • Exit here if no args/kwargs exist
  2. Store the args/kwargs into a container (dataclass)
  3. Serialize this container into a byte string using dill.dumps.
  4. Store this byte string in a uniquely identifiable way.
    • This will be locked down to a specific component uuid.

During a WS component load...

  1. Check if the component's signature contains any args/kwargs
    • Render the component without args/kwargs if none exist
  2. Retrieve the bytes string from the database using uuid and/or session identifier.
  3. Deserialize the data back into an object using dill.loads
  4. Render the component using the deserialized args/kwargs

Now this brings up a question about data retention. Ensuring automatic deletion of data is going to be fairly important, and this is a common issue with things such as Django's database-backed sessions model.

Given that, we will need to store a expiration_date value within the ComponentParams database model. In that scenario, we have three options we can pick-and-match from.

Automatic deletion of expired sessions on...

  1. Django Start-up
    • I recommend we implement this regardless
  2. Periodic Schedule (background task)
    • If technologically feasible without external dependencies, we should do this.
    • Django Channels supports background worker processes, but I'll need to dig into whether we can programmatically start it.
    • My guess is this won't be technologically feasible without mandating users to run a separate python manage.py runworker command alongside their webserver, which doesn't feel clean/simplistic.
    • Maybe we can implement this as an "alternative option" for users willing to go through the hassle of manually running a background worker?
  3. WS connection
    • Doing a DB operation on every WS connection would be cumbersome and overkill
    • However, it's possible for us to rate-limit this by storing last-run timestamps within cache, and only re-running if we hit some configured time threshold.
@rmorshea
Copy link
Contributor

Before going with the database-backed approach, I have one more idea. What if users configured a secret in their settings for Django IDOM that we used to symmetrically encrypt the pickled parameters before including them in the rendered templates. This both protects sensitive information and prevents injection attacks without requiring anything to be stored in the database.

@Archmonger
Copy link
Contributor Author

We should develop that as an optional feature in an follow-up PR. To do that properly would require additional dependencies.

@rmorshea
Copy link
Contributor

My impression is that it might take more time to figure out how to properly expire these database entries vs adding a setting for a secret and using it to encrypting the parameters using the cryptography library.

@Archmonger
Copy link
Contributor Author

Archmonger commented Jan 10, 2023

There is added complexity to cryptography though, mostly related to key rotation (handling/upgrading old keys to new keys without breaking). Additionally, a common issue with django DB encryption libraries is determining a way to do it without breaking AlterField migration operations.

@rmorshea
Copy link
Contributor

rmorshea commented Jan 10, 2023

It looks like there's an option for key rotation: https://cryptography.io/en/latest/fernet/#cryptography.fernet.MultiFernet

With that said, I'm realizing that implementing an encryption-based solution in a way that's convenient for the user will involve saving these keys in the database and providing utilities to rotate them. This would allow us to make proper key-management and generation our responsibility rather than the user's and doing that would require some care.

Given this, I'm open to going with whatever approach you think will be easiest to complete the fastest.

@Archmonger
Copy link
Contributor Author

Archmonger commented Jan 10, 2023

There's a lot to discuss about approach/compatibility/algorithms/etc. And it will require a non-trivial amount of code to accomplish within Django.

IMO encryption support deserves it's own issue/PR.

@rmorshea
Copy link
Contributor

rmorshea commented Jan 10, 2023

It seems like to complete this then, you'll go with approaches 1 & 3 for expiration?

@Archmonger
Copy link
Contributor Author

Yeah. Although option 2 is technologically ideal, it does not seem convenient for end-users.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants