
Reload service account keyfile periodically #205

Open
kevincvlam opened this issue Sep 6, 2018 · 20 comments
Assignees
Labels
priority: p2 Moderately-important priority. Fix may not be included in next release. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@kevincvlam

kevincvlam commented Sep 6, 2018

Hi,

We run the CloudSQL proxy in our kubernetes cluster as a deployment and sometimes we rotate the secret that is used to provide the credentials file for IAM authentication.

As a result, the credentials loaded at proxy start-up become invalid and the proxy begins printing invalid-credentials errors, but it does not exit. What's the recommended way to handle this situation? Is there a way to have the proxy reload the credentials?

My understanding is that mounted secrets are updated automatically, so it's up to the application to respond accordingly:

Mounted Secrets are updated automatically. When a secret already being consumed in a volume is updated, projected keys are eventually updated as well. The update time depends on the kubelet syncing period.
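For reference, kubelet only propagates secret updates for whole-volume mounts; a secret mounted via subPath is not updated. A minimal sketch of the sidecar setup described above, with hypothetical secret name, instance string, and paths:

```yaml
# Sketch only: secret name, instance string, and paths are illustrative.
spec:
  containers:
    - name: cloudsql-proxy
      image: gcr.io/cloudsql-docker/gce-proxy
      command:
        - /cloud_sql_proxy
        - -instances=my-project:my-region:my-instance=tcp:5432
        - -credential_file=/secrets/cloudsql/credentials.json
      volumeMounts:
        - name: cloudsql-creds
          mountPath: /secrets/cloudsql   # whole-volume mount, so kubelet syncs updates
          readOnly: true
  volumes:
    - name: cloudsql-creds
      secret:
        secretName: cloudsql-credentials
```

With this layout the file under /secrets/cloudsql is eventually refreshed after the secret is rotated; the open question in this issue is that the proxy does not re-read it.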
@kurtisvg kurtisvg added priority: p2 Moderately-important priority. Fix may not be included in next release. Status: Proposal type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. labels Sep 6, 2018
@kurtisvg
Contributor

kurtisvg commented Sep 6, 2018

Hey @kevincvlam, thanks for bringing this issue to our attention.

We discussed this issue this morning, and decided that currently the only way to reload the credentials would be to restart the container with the proxy inside. This is obviously not an ideal solution, so we are investigating ways we could handle this. Currently we are looking into the following:

  1. Reload credentials hourly during SSL cert refresh
  2. Attempt credentials reload upon receiving an invalid credentials error
  3. Potentially exit with error code if failing to retrieve valid credentials after X minutes

We'll use this issue to post updates on our progress.

@kevincvlam
Author

Hey @kurtisvg, thanks for the quick reply, and looking forward to your solution!

Do you have any idea regarding when you expect the issue to be resolved?

@kurtisvg
Contributor

Unfortunately, I don't have any promises to make at the moment, just that it's in the queue and the team will get to it when we can. If you have any expertise in this area, we are open to contributions.

@markvincze

Hey folks,

This is affecting us as well. Our setup stores the service account key in a Kubernetes secret, which is mounted into the Cloud SQL Proxy sidecar. We rotate the service account key every day and replace the content of the secret.

As far as I understand, when we change the content of the secret, the change is automatically propagated to the mounted file seen by the running proxy container (probably the same setup @kevincvlam described?).

So if the mounted key file changes, it's not picked up by the running proxy, right?
Do you have any update on when this improvement can be expected?

Thanks!

@JorritSalverda

In our Go applications we handle reloading of the key by re-initialising the service that uses the service account key, with code like this:

```go
// Build the DNS service from the current key file.
dnsService := NewGoogleCloudDNSService(*googleCloudDNSProject, *googleCloudDNSZone)

// Re-create it whenever the key file on disk changes.
foundation.WatchForFileChanges(os.Getenv("GOOGLE_APPLICATION_CREDENTIALS"), func(event fsnotify.Event) {
	log.Info().Msg("Key file changed, reinitializing dns service...")
	dnsService = NewGoogleCloudDNSService(*googleCloudDNSProject, *googleCloudDNSZone)
})
```

See https://github.com/estafette/estafette-google-cloud-dns/blob/09eaf7f4123b6c4a012837f2415893219456d137/main.go#L81-L84 and https://github.com/estafette/estafette-foundation/blob/master/foundation.go#L104-L161 for implementation details.

Works like a charm and relies on the github.com/fsnotify/fsnotify library, which doesn't bring in too many dependencies.

@dhduvall

dhduvall commented Nov 8, 2019

I made some changes that address the failures I've been seeing. It's not comprehensive, and it's pretty hacktastic, but it has survived a day of Vault rotating the service account keys underneath it, with the new keys mounted into the k8s container. It can recover at three specific points: when the credential file is missing or corrupt at startup; at first connection; and on failure to rotate the ephemeral cert. I'm sure there are many other places it could fail, but those are the ones I've been running into.

I'm not going to submit a PR in this state, but I figured if anyone else had a need for this, they could take what I have. If it's within shouting distance of being acceptable, though, I can try to polish it up a bit.

@kurtisvg kurtisvg added type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. and removed type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. labels Apr 8, 2020
@kurtisvg kurtisvg changed the title from "CloudSQL Proxy Doesn't Update Credentials Upon Rotation Of Secrets" to "Reload service account keyfile periodically" May 14, 2020
@gw0

gw0 commented Jun 24, 2021

The missing ability to reload the service account keyfile is still an open issue. The only workaround is described in #770, which is basically:

  1. update the keyfile
  2. stop with kill -s SIGTERM "$PPID";
  3. start again with /cloud_sql_proxy ...

@enocom
Member

enocom commented Nov 17, 2022

Related to #1045.

@enocom enocom added priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. and removed priority: p2 Moderately-important priority. Fix may not be included in next release. labels Nov 17, 2022
@enocom
Member

enocom commented Nov 17, 2022

Bumping the priority given the interest here.

@gdafl

gdafl commented Nov 23, 2022

Hi all,

This is also affecting us.

I am running cloud_sql_proxy in a sidecar container in a number of our pods.

As soon as I update our secret, cloud_sql_proxy starts failing because it is still using the old secret that it has in memory.

We cannot resort to SIGHUPping the process, as the image is prebuilt and controlled by my organisation (I cannot modify it); in any case, this is a workaround rather than a solution.

At the moment I have resorted to deleting all active pods after a key renewal (luckily, we only have to do it once a month), but this is obviously a worse workaround than SIGHUP.

Could a more appropriate solution be provided please?

Many thanks!

@UnsignedLong

Hi,

If you are running your workload within GKE, you should evaluate Workload Identity, as this is the recommended way; with it you don't have to mess around with JSON keys at all.
Nevertheless, this issue is still relevant for workloads running outside the Google ecosystem!

@enocom
Member

enocom commented Nov 23, 2022

Workload identity does sidestep these problems and is the best solution if you're running in GKE.

Otherwise, we're probably looking at some kind of watcher implementation based on fsnotify. Perhaps this is also something people should have to opt in to with a CLI flag.

@gdafl

gdafl commented Nov 29, 2022

Hi,

If you are running your workload within GKE, you should evaluate Workload Identity, as this is the recommended way; with it you don't have to mess around with JSON keys at all. Nevertheless, this issue is still relevant for workloads running outside the Google ecosystem!

Just a quick update, I switched to Workload Identities for our GKE cloud-sql-proxy sidecars and it's working perfectly.

A solution to this issue would still be useful for non-GKE based deployments though.

Many thanks again for the suggestion!

@enocom
Member

enocom commented Feb 1, 2023

It would be helpful to know how many people want this outside of GKE.

If you're running in GKE, then we strongly recommend using workload identity. Otherwise, this might be useful, but again if the ask here is mostly from GKE workloads, then it's probably not a big priority.

@gdafl

gdafl commented Feb 1, 2023

It would be helpful to know how many people want this outside of GKE.

If you're running in GKE, then we strongly recommend using workload identity. Otherwise, this might be useful, but again if the ask here is mostly from GKE workloads, then it's probably not a big priority.

Personally, I switched to workload identities as soon as it was suggested which made this issue moot.

I do still think it's a good feature to add though, to align what cloudsql-proxy does with what GKE does when a secret is updated.

Thanks!

@enocom
Member

enocom commented Aug 15, 2023

Given the prevalence of workload identity, we're going to hold off on this feature. If there's interest in the future, please re-open with why it's useful.

@enocom enocom closed this as completed Aug 15, 2023
@UnsignedLong

UnsignedLong commented Aug 16, 2023

I have on-premises workloads accessing Cloud SQL. As Workload Identity is unavailable in my (and other) environments, I still see a huge benefit in this feature.

@enocom
Member

enocom commented Aug 16, 2023

Re-opening in that case. What are you using to refresh your credentials file?

@enocom enocom assigned ttosta-google and unassigned enocom Aug 23, 2023
@enocom enocom assigned enocom and unassigned ttosta-google Feb 12, 2024
@enocom enocom added priority: p2 Moderately-important priority. Fix may not be included in next release. and removed priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. labels Feb 12, 2024
@enocom enocom assigned jackwotherspoon and unassigned enocom May 1, 2024
@micahjsmith

This is still an issue for me. We run the proxy on local developer machines, and devs refresh ADC using gcloud auth login --update-adc periodically (say, every 16 hours). A running proxy process does not pick up the refreshed ADC and must be restarted, which impacts the ability to run long-running scripts from local machines.

@jackwotherspoon
Collaborator

@micahjsmith This is still on our to-do list but is low priority for us at the moment. With this comment we will definitely bump it up a bit in our backlog.

Let me try to understand your use case a bit better. How come you are running the refresh command every 16 hours? Is it to switch the IAM user/service account the Proxy runs as (i.e., first starting the Proxy as user@test.com and then running gcloud auth login for user2@test.com)?

If you could provide the reason for refreshing your ADC maybe I can see another option for your case.
