Add DB encryption for email + OpenAI and Anthropic API keys #274

stephan-buckmaster · 2024-04-16T13:52:01Z

It seems to me the OpenAI and Anthropic API keys should not be stored in plaintext in the database.

This change got a bit bigger than expected. It uses ActiveRecord's application level encryption, and includes a migration that will encrypt existing keys.

Still to address:

Incorporate into Dockerfile
The input form for the API keys should also use HTML input-password

krschacht · 2024-04-17T02:01:58Z

Thanks so much! I didn’t get a chance to look through this closely yet, but this sounds like a good change to make. Thanks for jumping on it. I also have a lot of Issues tee’d up, organized by version number, and I’ve made a few passes to flag “good first issue”. Just wanted to highlight that. But I appreciate you discovering this and taking it on too! I’ll dig in tomorrow.

krschacht · 2024-04-18T02:47:14Z

@stephan-buckmaster I’m ramping up on this PR and trying to understand the new setup instructions. There are a couple things I don’t follow yet. And I’ll add the caveat that I’ve never used activerecord database encryption before but I follow the basics of it.

Why do we need a database.yml.sample. Why not just make any changes that need to be made directly into the app’s database.yml and keep it in the repo? I’m a little hesitant to add it to gitignore.
I think we shouldn’t add db username & password into this and instead we just focus on encrypted columns. Do you think this gets us much else? I think that anyone running the app locally already has to handle any unauthorized access to their machine, which is a bigger issue then us encouraging them to change the auth of their database. And for people running it on Render (or Heroku or most other hosts) the DB will get a password in that environment.
I just read up on credentials:edit and it makes sense, but it sounds like we need to instruct people to run db:encryption:init first in order to have it generate custom keys and then the user runs credentials:edit and copies those values in? I haven’t tried this yet so that may not be quite right, but I just skimmed the rails guide. My instinct is that we don’t add a dummy credentials.yml file to the repo and instead just explain those two steps in the README but I could be missing something.

The one thing I haven’t fully made sense of is that Render is getting a RAILS_MASTER_KEY set in the environment, so I can’t fully remember how that gets set up. We need to be sure that this new setup won’t break that.

krschacht · 2024-04-18T02:47:46Z

This merge conflict should be really easy to resolve, just a single like in the schema

olimart

Looking good. Thanks. @krschacht Need to test Docker deployment for existing instances.

config/application.rb

olimart · 2024-04-18T12:09:01Z

test/models/user_test.rb

+    user = users(:keith)
+    old_openai_key = user.openai_key
+    old_cipher_text = user.ciphertext_for(:openai_key)
+    user.update(openai_key: "new one")


Personal preference, just in case test starts failing we can more easily spot if related to the test itself or its setup.

Suggested change

user.update(openai_key: "new one")

user.update!(openai_key: "new one")

olimart · 2024-04-18T12:09:11Z

test/models/user_test.rb

+    user = users(:keith)
+    old_anthropic_key = user.anthropic_key
+    old_cipher_text = user.ciphertext_for(:anthropic_key)
+    user.update(anthropic_key: "new one")


Suggested change

user.update(anthropic_key: "new one")

user.update!(anthropic_key: "new one")

…crypted_data setting and use update! instead of update in tests

stephan-buckmaster · 2024-04-18T14:01:36Z

@krschacht About database.yml -- It looks like a topic that is not quite settled, which I wasn't aware of. I feel all settings could be put in the ENV, essentially making this file redundant. What if I run my postgres server on a different port, or use a socket? If the file is in git, then I have to fiddle around before pulling updates? Let me know, I can revert to the original way ...

Edit: Or do you mean how the samples (database,yml.sample+credentials.yml.sample) are set up? They are samples, so can create the files as preferred

krschacht · 2024-04-19T03:14:29Z

@stephan-buckmaster My question was about adding the database.yml.sample to the repo. It’s easier if you propose changes to the existing database.yml file and then we can have a discussion about them. With a new file it’s harder to compare changes. I don’t actually think we should add a *.sample file to the repo, we should just make decisions and update the existing *.yml files.

On the question of whether to have a database.yml file or not, let’s keep it since it’s the rails default. But notably the url, port, username, and password can all be defined within the DATABASE_URL. But there are still configuration options which aren’t supported by DATABASE_URL so it’s worth having database.yml for that. But because of DATABASE_URL you shouldn’t need to make changes to the database.yml file in your dev environment.

When you get a chance, let me know about the credentials stuff (my number 3 above).

…nfig/database.yml.sample

…file

stephan-buckmaster · 2024-04-19T13:49:12Z

Made changes as requested. I hope the README changes addresses your Q3 (you are right).

How to inform existing users of the required step to execute rails credentials:edit when they pull in these changes? Start a CHANGELOG?

krschacht · 2024-04-21T21:48:38Z

I’ve pulled this branch locally and been trying it out but I’m confused as to how this will work on Render.

it looks like our render.yml file has a generateKey true which generates the secret_key_base automatically. I checked my render deploy and I see this in my ENV but it sounds like this is only used to encrypt session keys.

But now that we are adding a master key I’m unsure how this works in production. We don’t want to add the master.key to the repo so does Render support auto generating this just like secret key base? But if so, it would clearly be different than what we generate locally when we create our credentials file.

So I’m confused as to how we are going to make this work in production/Render. Do you know? It not can you do some research and propose a plan? I ideally want to keep the one-click Render deploy. If we are not able to make this work with the render.yml file then I’d want to make encryption optional so people can still deploy in one click and then optionally follow a guide to enable encryption in production.

thanks for your help. I don’t want to be a blocker on you making progress on this so I’m trying to share what I understand so far. But I also want to make sure we merge in something that is elegant and doesn’t reduce the ease of deploying this app.

stephan-buckmaster · 2024-04-22T18:16:22Z

I have never used Render. I will have to try it out ...

stephan-buckmaster · 2024-04-23T13:54:49Z

@krschacht No luck with Render so far. When I push the button placed in the README I get to fill out a payment form. I didn't want to go that way. So I followed the "Create Static Site" route that is mentioned in the Render doc (https://docs.render.com/deploy-rails). The render.yaml file is recognized. I enter public under Publish Direcotry, and bin/render-build.sh under Build Command The build fails as it cannot connect to a postgres database.

Is adding a credit card for payment really needed?

olimart · 2024-04-23T14:00:41Z

I also had to enter a credit card because no free plan for Redis.
(or at least the free plan isn't selected with the deployment button)
Render only provides trial for PG but not sure it impacts signing up with a cc.

stephan-buckmaster · 2024-04-24T13:50:52Z

I don't really have a use for the Render service, so I'm reluctant to make payments in order to contribute to the development of this project. The Render documentation does mention support for secret files, at https://docs.render.com/configure-environment-variables#secret-files So it can probably be done. It does seem difficult to communicate with existing users

krschacht · 2024-04-24T14:07:32Z

@stephan-buckmaster You shouldn't have to pay any money. The reason I picked Render as the first host to support is because you can host on there for free. I just corrected the mistake in the render.yml file that was not specifying the free redis plan. I corrected that now in main and I'm just about to test it, but you should be able to update your fork, delete whatever you did in render, and try again. It'll deploy with all free resources.

It may be the case that render still asks you to enter a credit card but it should not bill you because every service in the render.yml file is now pointing to a free version of that service.

But all that said, if you don't want to tackle this aspect of encrypted DB then that's okay. We can close out this PR and wait until someone is ready to work through all the implications of encrypting those columns. There are lots of other ways that I'd still love help improving HostedGPT: allyourbot/hostedgpt/milestone/6

In my mind, the app is getting really good but we're still not even at a solid v1.0 yet. It's coming along!

stephan-buckmaster · 2024-04-24T15:29:43Z

Ok I got your hotfix in my repo. When I "push the button," no payment form appears, so that worked.

An issue is highlighted:

A render.yaml file was found, but there was an issue.
services[0].type changing service type not supported

(I imagine this is the same in this repo, not tested)

krschacht · 2024-04-24T17:02:09Z

I think that's because you didn't delete all the existing render services. Try that. Delete all 3 (or is it 4) services that show up on your render Dashboard and also on blueprints and Env, don't miss those. Then click to create a new app from scratch. Render does not allow *changes* to existing services through the render.yml file.

…

On Wed, Apr 24, 2024 at 10:30 AM Stephan Wehner ***@***.***> wrote: Ok I got your hotfix in my repo. When I "push the button," no payment form appears, so that worked. An issue is highlighted: A render.yaml file was found, but there was an issue. services[0].type changing service type not supported (I imagine this is the same in this repo, not tested) — Reply to this email directly, view it on GitHub <#274 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAAIR5M6IMPMJSBQLXZVWB3Y67FX3AVCNFSM6AAAAABGJNAHLOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDANZVGIZDCNZZGI> . You are receiving this because you were mentioned.Message ID: ***@***.***>

stephan-buckmaster · 2024-04-24T17:48:09Z

Yes, that worked, thanks @krschacht . The app deployed on Render from my repo. I managed to sign up. When I entered an open-ai key, it ran into an error "ActiveRecord::Encryption::Errors::Configuration (Missing Active Record encryption credential: active_record_encryption.primary_key" So I will try to figure that out.

… render.yaml

through ENV variables

stephan-buckmaster · 2024-04-24T18:57:58Z

Ok, some progress! The commits of today make it quite transparent. Render is configured to generate encryption keys and those apps will use those, while regular users can use config/credentials.yml.enc, as preferred. I trust the Render service will "generateKeys" only once, and use the original in all future deployments.

krschacht · 2024-04-24T20:07:07Z

@stephan-buckmaster Looking good. Few more thoughts:

(1)
Yes, I trust that Render will generateKeys only once. If that assumption was false then we'd be having all sorts of trouble with the one place we're currently using it (secret_key_base). The bigger thing which I wasn't sure of: does generateKeys generate the right kind of random string? I checked my Render account and my SECRET_KEY_BASE is 44 characters long. When I look at the rails guide documentation the example keys are 32 characters:
https://guides.rubyonrails.org/active_record_encryption.html#setup

But I suspect it'll work, maybe it just looks at the first 32 characters of the random string. We can find out once we do an actual deploy.

(2)
It's a little annoying that running db:encryption:init does not immediately copy those values to credentials. It's not a huge deal but having to manually copy those will be a source of setup errors that people make. What happens if someone fails to do this locally? They'll launch the rails app with various attributes being declared as encrypts but without encryption being properly configured. Will the app still run and simply not encrypt those values, or will it fail with some exception? I'm guessing it will still run because we've set support_unencrypted_data to true. If it does still run, let's assume that some people will miss this step in the README and start up the app. Can you add something to the book-up sequence of this rails app where it outputs "WARNING: Attributes not encrypted please do ...." or something like that? What I mean is: someone will type "bin/dev" to startup the server, and if it just works then let's make sure they at least see a repeat of the instructions every time they start the up the rails app.

(3)
Can you go ahead and encrypt user.email in this PR as well? I notice there are some special instructions for downcasing email columns, which we should do for email since we want that to be case insensitive. I feel like this is also a slightly "sensitive" piece of information so we might as well get it done at the same time we're handling the others.

(Check out the lint error in the failed "check" and also pull in the latest main and the failed test error should go away.)

stephan-buckmaster · 2024-04-25T01:32:02Z

Good point about the secret keys. The value of generateValue is 256-bits base64-encoded, https://docs.render.com/blueprint-spec . So I should convert it.

About your point 2 -- I could abort in the initializer that I added, with an informative message, if there is no configured keys. How does that sound.

Encrypting email sounds good, and straightforward. It may not even need an update to the migration.

stephan-buckmaster · 2024-04-25T18:21:59Z

I realized the email is used when logging in. But querying for the record to validate the password against by the plaintext of the email address wouldn't work, when the encrytped value is stored.

One would store a hash together with the email, and query by that hash.

Can you confirm to go this extra mile @krschacht?

krschacht · 2024-04-25T18:25:00Z

@stephan-buckmaster i don’t quite understand the problem. Are you saying that once a record is encrypted, you can no longer do a find_by() on that record?

stephan-buckmaster · 2024-04-26T01:27:26Z

Yes, sorry, Active Record's encryption support does provide for find_by to work even when the values are encrypted, and the query parameter is not. I should have that ready tomorrow.

stephan-buckmaster · 2024-04-26T17:38:51Z

New commits today. So yes, it becomes transparent. When logging in, the database will execute a search against the encrypted email to find the persons record.

In terms of supporting both, an app-wide configuration option for unencrypted vs. encrypted storage: In a way, someone who wants encryption can merge in these changes. They will likely be good for a while, since there isn't that much to it. So maybe just skip this pull-request, or postpone.

About the encryption keys on Render, the values taken from ACTIVE_RECORD_ENCRYPTION_PRIMARY_KEY etc... still have to check the question of length (or Base64 encoding vs. 32 alphanumeric bytes).

krschacht · 2024-04-29T21:59:43Z

@stephan-buckmaster Thanks for taking on the email encryption. I'm eager to get this merged into main as it's always a pain to have long lived branched. I did a full round of thorough testing to help feel confident it's ready to merge. In the process, I streamlined things a bit more. You'll see I pushed a few commits. Things are not working really well in dev. Nice job getting this all setup.

The notable change I made was to make encryption setup happen automatically. Anyone currently using the app can simply pull it down, kill their app, and re-run bin/dev or docker compose up --build and encryption will initialize (if needed) and start up again. I hooked into db:prepare and scripted the credentials stuff. I tried to test it pretty thoroughly both as a brand new setup and an existing user pulling the latest and I believe it's handling things well in all cases.

I also made a change to CI to ensure the DB encryption gets setup properly for the test environment.

I believe all that remains now is confirming this works well on Render.

krschacht · 2024-04-29T22:00:43Z

Oh, shoot, actually it looks like my small change to rubyonrails.yml did not fix the CI test script. I don't know if you can click details to see, but it's reporting an exception:

ActiveSupport::EncryptedFile::InvalidKeyLengthError: Encryption key must be exactly 32 characters.

I don't have time to investigate this right now, but we have two open issues, and maybe they're even related? :) I don't know why the encryption key would be the wrong length in CI.

stephan-buckmaster · 2024-04-30T01:28:23Z

Things are fine on Render where I have an instance running. (https://hostedgpt-bra7.onrender.com/)
@krschacht About the test failures, invoking this task fails for me the same way: rake db:prepare RAILS_ENV=test It wants to write the config file encrypted, Rails.application.credentials.write(config.to_yaml), but there is no RAILS_MASTER_KEY in the environment, nor a file config/master.keyApparently the app hasn't needed to do such things before. So the encryption key is actually nil here.

krschacht · 2024-04-30T02:49:42Z

Oh interesting, i'll take a look at that tomorrow. That sounds like something I broke.

Glad to hear the auto generated values for Render are working! It looks like we're at the home stretch with this one.

stephan-buckmaster · 2024-04-30T03:33:36Z

It may be a possibility to configure env variables related to encryption, for testing in the github system. I don't usually integrate with github, so I'm not sure how this is usually done. For my own invocation of "rake test" there are no issues/challenges.

lib/tasks/prepare.rake

stephan-buckmaster · 2024-04-30T14:28:46Z

@krschacht : I just added a comment for that commit about your change to the test:prepare tasks

krschacht · 2024-04-30T19:18:08Z

Okay I fixed up the encryption key generation stuff and incorporated your feedback. Thanks for that. I also fixed a random docker test issue that I ran into.

I think this thing is good to go, although it's hard to be 100% sure this won't break someone's existing setup.

I'm going to merge into main now and update my existing render app & do a fresh deploy of a new render app. Hopefully both of those work smoothly. 🤞

krschacht · 2024-04-30T20:20:40Z

@stephan-buckmaster I had a couple small issues. First, a new app failed to deploy to Render because it was attempting to write the master.key file. I added an extra check to never do that in production. And then my existing Render deploy would not upgrade. It turns out existing instances no longer update the render.yml file once you've made a single change to your app which diverges you from it (and I upgraded my render DB). This means some people (like me) won't get the new ENV keys. I did a couple hotfixes to main which detect this situation and raise an exception that explains how you can fix things manually. It's not quite as clean as I like, but we don't have a ton of app users yet and this should be an edge case.

Anyway, thanks for sticking with this PR and helping me to get it landed. It's a good addition, and I am glad I've finally learned how activerecord encryption works!

If you're up for taking on another task, check out https://github.com/allyourbot/hostedgpt/milestone/6

Easy tasks are labeled with "good first issue"
My highest priority tasks are labeled with "enhancement". These are requests I've from multiple users

stephan-buckmaster · 2024-05-01T03:26:35Z

Fantastic! You're right, the sooner the better. I'll be around if any users run into trouble. Btw, it would be good to cover this feature in documenting backup requirements. I'll be looking at your list of outstanding tasks.

krschacht · 2024-05-01T03:29:07Z

Say more about documenting backup requirements. I have not considered that. Are there some special things to keep in mind? Oh, maybe you just mean that the keys themselves need to be backed up too, not just the database. Is it that?

stephan-buckmaster · 2024-05-01T03:31:49Z

Yes, so database, plus encryption secrets, plus potentially the master.key file.

How does one backup data stored on Render? How to restore it?

Encrypt the API keys for OpenAI and Anthropic

92729a3

olimart reviewed Apr 18, 2024

View reviewed changes

Review comments: explain config.active_record.encryption.support_unen…

bfd663e

…crypted_data setting and use update! instead of update in tests

Merge latest from upstream, resolve merge conflict in db/schema.rb

c5cc86e

stephan-buckmaster added 3 commits April 19, 2024 06:18

Merge from upstream

df65e2f

Restore original config/database.yml, remove .gitignore entry, and co…

f91da68

…nfig/database.yml.sample

Update README with respect to credentials.yml.enc, remove the sample …

92024cf

…file

Merge branch 'main' into main

01c25cc

Merge branch 'upstream-main' -- (update for Render plan)

7f269e6

stephan-buckmaster added 3 commits April 24, 2024 11:20

Have keys related to active record encryption generated on Render via…

89f80f7

… render.yaml

Have keys related to active record encryption generated on Render via…

51960d0

… render.yaml

Use an config/initializer for configuring active record encryption

746d89f

through ENV variables

stephan-buckmaster added 2 commits April 26, 2024 08:45

Encrypt the email field of Person records also

e26a5b1

Merge from upstream

0349a83

stephan-buckmaster and others added 7 commits April 29, 2024 10:01

Merge upstream

d1bf816

cleanup migration

e5a2af1

ensure encryption init happens with db:prepare

02cbed2

make docker entrypoint more resilient

8490064

Update readme and startup script

5c571b5

fix whitespace

a52a289

update test runner

d99e387

stephan-buckmaster commented Apr 30, 2024

View reviewed changes

lib/tasks/prepare.rake Show resolved Hide resolved

krschacht added 3 commits April 30, 2024 14:00

Fix running tests in docker within dev

776ec26

Add generation of master.key

438bf8c

simplify generation of ar encryption keys

6f2d938

krschacht changed the title ~~Encrypt the API keys for OpenAI and Anthropic~~ Add DB encryption for email + OpenAI and Anthropic API keys Apr 30, 2024

krschacht merged commit 6fadd43 into AllYourBot:main Apr 30, 2024
4 checks passed

	user.update(openai_key: "new one")
	user.update!(openai_key: "new one")

	user.update(anthropic_key: "new one")
	user.update!(anthropic_key: "new one")

Add DB encryption for email + OpenAI and Anthropic API keys #274

Add DB encryption for email + OpenAI and Anthropic API keys #274

Conversation

stephan-buckmaster commented Apr 16, 2024

krschacht commented Apr 17, 2024

krschacht commented Apr 18, 2024

krschacht commented Apr 18, 2024

olimart left a comment

Choose a reason for hiding this comment

olimart Apr 18, 2024

Choose a reason for hiding this comment

olimart Apr 18, 2024

Choose a reason for hiding this comment

stephan-buckmaster commented Apr 18, 2024 • edited Loading

krschacht commented Apr 19, 2024

stephan-buckmaster commented Apr 19, 2024

krschacht commented Apr 21, 2024

stephan-buckmaster commented Apr 22, 2024

stephan-buckmaster commented Apr 23, 2024

olimart commented Apr 23, 2024

stephan-buckmaster commented Apr 24, 2024

krschacht commented Apr 24, 2024

stephan-buckmaster commented Apr 24, 2024

krschacht commented Apr 24, 2024 via email

stephan-buckmaster commented Apr 24, 2024

stephan-buckmaster commented Apr 24, 2024

krschacht commented Apr 24, 2024 • edited Loading

stephan-buckmaster commented Apr 25, 2024

stephan-buckmaster commented Apr 25, 2024

krschacht commented Apr 25, 2024

stephan-buckmaster commented Apr 26, 2024

stephan-buckmaster commented Apr 26, 2024

krschacht commented Apr 29, 2024

krschacht commented Apr 29, 2024

stephan-buckmaster commented Apr 30, 2024 • edited Loading

krschacht commented Apr 30, 2024

stephan-buckmaster commented Apr 30, 2024

stephan-buckmaster commented Apr 30, 2024

krschacht commented Apr 30, 2024

krschacht commented Apr 30, 2024

stephan-buckmaster commented May 1, 2024

krschacht commented May 1, 2024

stephan-buckmaster commented May 1, 2024

stephan-buckmaster commented Apr 18, 2024 •

edited

Loading

krschacht commented Apr 24, 2024 •

edited

Loading

stephan-buckmaster commented Apr 30, 2024 •

edited

Loading