Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add DB encryption for email + OpenAI and Anthropic API keys #274

Merged
merged 24 commits into from
Apr 30, 2024

Conversation

stephan-buckmaster
Copy link
Contributor

It seems to me the OpenAI and Anthropic API keys should not be stored in plaintext in the database.

This change got a bit bigger than expected. It uses ActiveRecord's application level encryption, and includes a migration that will encrypt existing keys.

Still to address:

  • Incorporate into Dockerfile
  • The input form for the API keys should also use HTML input-password

@krschacht
Copy link
Contributor

Thanks so much! I didn’t get a chance to look through this closely yet, but this sounds like a good change to make. Thanks for jumping on it. I also have a lot of Issues tee’d up, organized by version number, and I’ve made a few passes to flag “good first issue”. Just wanted to highlight that. But I appreciate you discovering this and taking it on too! I’ll dig in tomorrow.

@krschacht
Copy link
Contributor

@stephan-buckmaster I’m ramping up on this PR and trying to understand the new setup instructions. There are a couple things I don’t follow yet. And I’ll add the caveat that I’ve never used activerecord database encryption before but I follow the basics of it.

  1. Why do we need a database.yml.sample. Why not just make any changes that need to be made directly into the app’s database.yml and keep it in the repo? I’m a little hesitant to add it to gitignore.

  2. I think we shouldn’t add db username & password into this and instead we just focus on encrypted columns. Do you think this gets us much else? I think that anyone running the app locally already has to handle any unauthorized access to their machine, which is a bigger issue then us encouraging them to change the auth of their database. And for people running it on Render (or Heroku or most other hosts) the DB will get a password in that environment.

  3. I just read up on credentials:edit and it makes sense, but it sounds like we need to instruct people to run db:encryption:init first in order to have it generate custom keys and then the user runs credentials:edit and copies those values in? I haven’t tried this yet so that may not be quite right, but I just skimmed the rails guide. My instinct is that we don’t add a dummy credentials.yml file to the repo and instead just explain those two steps in the README but I could be missing something.

The one thing I haven’t fully made sense of is that Render is getting a RAILS_MASTER_KEY set in the environment, so I can’t fully remember how that gets set up. We need to be sure that this new setup won’t break that.

@krschacht
Copy link
Contributor

This merge conflict should be really easy to resolve, just a single like in the schema

Copy link
Contributor

@olimart olimart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. Thanks. @krschacht Need to test Docker deployment for existing instances.

config/application.rb Show resolved Hide resolved
user = users(:keith)
old_openai_key = user.openai_key
old_cipher_text = user.ciphertext_for(:openai_key)
user.update(openai_key: "new one")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personal preference, just in case test starts failing we can more easily spot if related to the test itself or its setup.

Suggested change
user.update(openai_key: "new one")
user.update!(openai_key: "new one")

user = users(:keith)
old_anthropic_key = user.anthropic_key
old_cipher_text = user.ciphertext_for(:anthropic_key)
user.update(anthropic_key: "new one")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
user.update(anthropic_key: "new one")
user.update!(anthropic_key: "new one")

…crypted_data setting and use update! instead of update in tests
@stephan-buckmaster
Copy link
Contributor Author

stephan-buckmaster commented Apr 18, 2024

@krschacht About database.yml -- It looks like a topic that is not quite settled, which I wasn't aware of. I feel all settings could be put in the ENV, essentially making this file redundant. What if I run my postgres server on a different port, or use a socket? If the file is in git, then I have to fiddle around before pulling updates? Let me know, I can revert to the original way ...

Edit: Or do you mean how the samples (database,yml.sample+credentials.yml.sample) are set up? They are samples, so can create the files as preferred

@krschacht
Copy link
Contributor

@stephan-buckmaster My question was about adding the database.yml.sample to the repo. It’s easier if you propose changes to the existing database.yml file and then we can have a discussion about them. With a new file it’s harder to compare changes. I don’t actually think we should add a *.sample file to the repo, we should just make decisions and update the existing *.yml files.

On the question of whether to have a database.yml file or not, let’s keep it since it’s the rails default. But notably the url, port, username, and password can all be defined within the DATABASE_URL. But there are still configuration options which aren’t supported by DATABASE_URL so it’s worth having database.yml for that. But because of DATABASE_URL you shouldn’t need to make changes to the database.yml file in your dev environment.

When you get a chance, let me know about the credentials stuff (my number 3 above).

@stephan-buckmaster
Copy link
Contributor Author

Made changes as requested. I hope the README changes addresses your Q3 (you are right).

How to inform existing users of the required step to execute rails credentials:edit when they pull in these changes? Start a CHANGELOG?

@krschacht
Copy link
Contributor

I’ve pulled this branch locally and been trying it out but I’m confused as to how this will work on Render.

it looks like our render.yml file has a generateKey true which generates the secret_key_base automatically. I checked my render deploy and I see this in my ENV but it sounds like this is only used to encrypt session keys.

But now that we are adding a master key I’m unsure how this works in production. We don’t want to add the master.key to the repo so does Render support auto generating this just like secret key base? But if so, it would clearly be different than what we generate locally when we create our credentials file.

So I’m confused as to how we are going to make this work in production/Render. Do you know? It not can you do some research and propose a plan? I ideally want to keep the one-click Render deploy. If we are not able to make this work with the render.yml file then I’d want to make encryption optional so people can still deploy in one click and then optionally follow a guide to enable encryption in production.

thanks for your help. I don’t want to be a blocker on you making progress on this so I’m trying to share what I understand so far. But I also want to make sure we merge in something that is elegant and doesn’t reduce the ease of deploying this app.

@stephan-buckmaster
Copy link
Contributor Author

I have never used Render. I will have to try it out ...

@stephan-buckmaster
Copy link
Contributor Author

@krschacht No luck with Render so far. When I push the button placed in the README I get to fill out a payment form. I didn't want to go that way. So I followed the "Create Static Site" route that is mentioned in the Render doc (https://docs.render.com/deploy-rails). The render.yaml file is recognized. I enter public under Publish Direcotry, and bin/render-build.sh under Build Command The build fails as it cannot connect to a postgres database.

Is adding a credit card for payment really needed?

@olimart
Copy link
Contributor

olimart commented Apr 23, 2024

I also had to enter a credit card because no free plan for Redis.
(or at least the free plan isn't selected with the deployment button)
Render only provides trial for PG but not sure it impacts signing up with a cc.

@stephan-buckmaster
Copy link
Contributor Author

I don't really have a use for the Render service, so I'm reluctant to make payments in order to contribute to the development of this project. The Render documentation does mention support for secret files, at https://docs.render.com/configure-environment-variables#secret-files So it can probably be done. It does seem difficult to communicate with existing users

@krschacht
Copy link
Contributor

@stephan-buckmaster You shouldn't have to pay any money. The reason I picked Render as the first host to support is because you can host on there for free. I just corrected the mistake in the render.yml file that was not specifying the free redis plan. I corrected that now in main and I'm just about to test it, but you should be able to update your fork, delete whatever you did in render, and try again. It'll deploy with all free resources.

It may be the case that render still asks you to enter a credit card but it should not bill you because every service in the render.yml file is now pointing to a free version of that service.

But all that said, if you don't want to tackle this aspect of encrypted DB then that's okay. We can close out this PR and wait until someone is ready to work through all the implications of encrypting those columns. There are lots of other ways that I'd still love help improving HostedGPT: allyourbot/hostedgpt/milestone/6

In my mind, the app is getting really good but we're still not even at a solid v1.0 yet. It's coming along!

@stephan-buckmaster
Copy link
Contributor Author

Ok I got your hotfix in my repo. When I "push the button," no payment form appears, so that worked.

An issue is highlighted:

A render.yaml file was found, but there was an issue.
services[0].type changing service type not supported

(I imagine this is the same in this repo, not tested)

@krschacht
Copy link
Contributor

krschacht commented Apr 24, 2024 via email

@stephan-buckmaster
Copy link
Contributor Author

Yes, that worked, thanks @krschacht . The app deployed on Render from my repo. I managed to sign up. When I entered an open-ai key, it ran into an error "ActiveRecord::Encryption::Errors::Configuration (Missing Active Record encryption credential: active_record_encryption.primary_key" So I will try to figure that out.

@stephan-buckmaster
Copy link
Contributor Author

Ok, some progress! The commits of today make it quite transparent. Render is configured to generate encryption keys and those apps will use those, while regular users can use config/credentials.yml.enc, as preferred. I trust the Render service will "generateKeys" only once, and use the original in all future deployments.

@krschacht
Copy link
Contributor

krschacht commented Apr 24, 2024

@stephan-buckmaster Looking good. Few more thoughts:

(1)
Yes, I trust that Render will generateKeys only once. If that assumption was false then we'd be having all sorts of trouble with the one place we're currently using it (secret_key_base). The bigger thing which I wasn't sure of: does generateKeys generate the right kind of random string? I checked my Render account and my SECRET_KEY_BASE is 44 characters long. When I look at the rails guide documentation the example keys are 32 characters:
https://guides.rubyonrails.org/active_record_encryption.html#setup

But I suspect it'll work, maybe it just looks at the first 32 characters of the random string. We can find out once we do an actual deploy.

(2)
It's a little annoying that running db:encryption:init does not immediately copy those values to credentials. It's not a huge deal but having to manually copy those will be a source of setup errors that people make. What happens if someone fails to do this locally? They'll launch the rails app with various attributes being declared as encrypts but without encryption being properly configured. Will the app still run and simply not encrypt those values, or will it fail with some exception? I'm guessing it will still run because we've set support_unencrypted_data to true. If it does still run, let's assume that some people will miss this step in the README and start up the app. Can you add something to the book-up sequence of this rails app where it outputs "WARNING: Attributes not encrypted please do ...." or something like that? What I mean is: someone will type "bin/dev" to startup the server, and if it just works then let's make sure they at least see a repeat of the instructions every time they start the up the rails app.

(3)
Can you go ahead and encrypt user.email in this PR as well? I notice there are some special instructions for downcasing email columns, which we should do for email since we want that to be case insensitive. I feel like this is also a slightly "sensitive" piece of information so we might as well get it done at the same time we're handling the others.

(Check out the lint error in the failed "check" and also pull in the latest main and the failed test error should go away.)

@stephan-buckmaster
Copy link
Contributor Author

Good point about the secret keys. The value of generateValue is 256-bits base64-encoded, https://docs.render.com/blueprint-spec . So I should convert it.

About your point 2 -- I could abort in the initializer that I added, with an informative message, if there is no configured keys. How does that sound.

Encrypting email sounds good, and straightforward. It may not even need an update to the migration.

@stephan-buckmaster
Copy link
Contributor Author

I realized the email is used when logging in. But querying for the record to validate the password against by the plaintext of the email address wouldn't work, when the encrytped value is stored.

One would store a hash together with the email, and query by that hash.

Can you confirm to go this extra mile @krschacht?

@krschacht
Copy link
Contributor

@stephan-buckmaster i don’t quite understand the problem. Are you saying that once a record is encrypted, you can no longer do a find_by() on that record?

@stephan-buckmaster
Copy link
Contributor Author

Yes, sorry, Active Record's encryption support does provide for find_by to work even when the values are encrypted, and the query parameter is not. I should have that ready tomorrow.

@stephan-buckmaster
Copy link
Contributor Author

New commits today. So yes, it becomes transparent. When logging in, the database will execute a search against the encrypted email to find the persons record.

In terms of supporting both, an app-wide configuration option for unencrypted vs. encrypted storage: In a way, someone who wants encryption can merge in these changes. They will likely be good for a while, since there isn't that much to it. So maybe just skip this pull-request, or postpone.

About the encryption keys on Render, the values taken from ACTIVE_RECORD_ENCRYPTION_PRIMARY_KEY etc... still have to check the question of length (or Base64 encoding vs. 32 alphanumeric bytes).

@krschacht
Copy link
Contributor

@stephan-buckmaster Thanks for taking on the email encryption. I'm eager to get this merged into main as it's always a pain to have long lived branched. I did a full round of thorough testing to help feel confident it's ready to merge. In the process, I streamlined things a bit more. You'll see I pushed a few commits. Things are not working really well in dev. Nice job getting this all setup.

The notable change I made was to make encryption setup happen automatically. Anyone currently using the app can simply pull it down, kill their app, and re-run bin/dev or docker compose up --build and encryption will initialize (if needed) and start up again. I hooked into db:prepare and scripted the credentials stuff. I tried to test it pretty thoroughly both as a brand new setup and an existing user pulling the latest and I believe it's handling things well in all cases.

I also made a change to CI to ensure the DB encryption gets setup properly for the test environment.

I believe all that remains now is confirming this works well on Render.

@krschacht
Copy link
Contributor

Oh, shoot, actually it looks like my small change to rubyonrails.yml did not fix the CI test script. I don't know if you can click details to see, but it's reporting an exception:

ActiveSupport::EncryptedFile::InvalidKeyLengthError: Encryption key must be exactly 32 characters.

I don't have time to investigate this right now, but we have two open issues, and maybe they're even related? :) I don't know why the encryption key would be the wrong length in CI.

@stephan-buckmaster
Copy link
Contributor Author

stephan-buckmaster commented Apr 30, 2024

Things are fine on Render where I have an instance running. (https://hostedgpt-bra7.onrender.com/)
@krschacht About the test failures, invoking this task fails for me the same way: rake db:prepare RAILS_ENV=test It wants to write the config file encrypted, Rails.application.credentials.write(config.to_yaml), but there is no RAILS_MASTER_KEY in the environment, nor a file config/master.keyApparently the app hasn't needed to do such things before. So the encryption key is actually nil here.

@krschacht
Copy link
Contributor

Oh interesting, i'll take a look at that tomorrow. That sounds like something I broke.

Glad to hear the auto generated values for Render are working! It looks like we're at the home stretch with this one.

@stephan-buckmaster
Copy link
Contributor Author

It may be a possibility to configure env variables related to encryption, for testing in the github system. I don't usually integrate with github, so I'm not sure how this is usually done. For my own invocation of "rake test" there are no issues/challenges.

@stephan-buckmaster
Copy link
Contributor Author

@krschacht : I just added a comment for that commit about your change to the test:prepare tasks

@krschacht
Copy link
Contributor

Okay I fixed up the encryption key generation stuff and incorporated your feedback. Thanks for that. I also fixed a random docker test issue that I ran into.

I think this thing is good to go, although it's hard to be 100% sure this won't break someone's existing setup.

I'm going to merge into main now and update my existing render app & do a fresh deploy of a new render app. Hopefully both of those work smoothly. 🤞

@krschacht krschacht changed the title Encrypt the API keys for OpenAI and Anthropic Add DB encryption for email + OpenAI and Anthropic API keys Apr 30, 2024
@krschacht krschacht merged commit 6fadd43 into AllYourBot:main Apr 30, 2024
4 checks passed
@krschacht
Copy link
Contributor

@stephan-buckmaster I had a couple small issues. First, a new app failed to deploy to Render because it was attempting to write the master.key file. I added an extra check to never do that in production. And then my existing Render deploy would not upgrade. It turns out existing instances no longer update the render.yml file once you've made a single change to your app which diverges you from it (and I upgraded my render DB). This means some people (like me) won't get the new ENV keys. I did a couple hotfixes to main which detect this situation and raise an exception that explains how you can fix things manually. It's not quite as clean as I like, but we don't have a ton of app users yet and this should be an edge case.

Anyway, thanks for sticking with this PR and helping me to get it landed. It's a good addition, and I am glad I've finally learned how activerecord encryption works!

If you're up for taking on another task, check out https://github.com/allyourbot/hostedgpt/milestone/6

  • Easy tasks are labeled with "good first issue"
  • My highest priority tasks are labeled with "enhancement". These are requests I've from multiple users

@stephan-buckmaster
Copy link
Contributor Author

Fantastic! You're right, the sooner the better. I'll be around if any users run into trouble. Btw, it would be good to cover this feature in documenting backup requirements. I'll be looking at your list of outstanding tasks.

@krschacht
Copy link
Contributor

Say more about documenting backup requirements. I have not considered that. Are there some special things to keep in mind? Oh, maybe you just mean that the keys themselves need to be backed up too, not just the database. Is it that?

@stephan-buckmaster
Copy link
Contributor Author

Yes, so database, plus encryption secrets, plus potentially the master.key file.

How does one backup data stored on Render? How to restore it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants