This repository was archived by the owner on Mar 13, 2022. It is now read-only.

refresh GCP tokens if <55 mins of life left #72

Closed
wants to merge 1 commit

Conversation

dekkagaijin

Partially mitigates #59

Signed-off-by: Jake Sanders <jsand@google.com>

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jul 3, 2018
@dekkagaijin
Author

PTAL @mikedanese

@codecov-io

codecov-io commented Jul 3, 2018

Codecov Report

Merging #72 into master will increase coverage by 0.03%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master      #72      +/-   ##
==========================================
+ Coverage   93.51%   93.55%   +0.03%     
==========================================
  Files          11       11              
  Lines         972      977       +5     
==========================================
+ Hits          909      914       +5     
  Misses         63       63
Impacted Files | Coverage Δ
config/kube_config.py | 89.96% <100%> (+0.19%) ⬆️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 78472de...7978ef8. Read the comment docs.

@@ -32,7 +32,7 @@
 from .config_exception import ConfigException
 from .dateutil import UTC, format_rfc3339, parse_rfc3339

-EXPIRY_SKEW_PREVENTION_DELAY = datetime.timedelta(minutes=5)
+MINIMUM_TOKEN_TIME_REMAINING = datetime.timedelta(minutes=55)
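A minimal sketch of the check this constant implies, assuming a hypothetical _token_needs_refresh() helper (the surrounding kube_config logic is not shown here):

import datetime

MINIMUM_TOKEN_TIME_REMAINING = datetime.timedelta(minutes=55)

def _token_needs_refresh(expiry, now=None):
    # Refresh eagerly once less than 55 minutes of life remain, rather than
    # only once the token crosses a short pre-expiry skew window.
    now = now or datetime.datetime.utcnow()
    return expiry - now < MINIMUM_TOKEN_TIME_REMAINING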

@mikedanese mikedanese Jul 3, 2018


What does the metadata server do? If the metadata server also caches tokens, but refreshes them 10 seconds before expiry, then will this call the metadata server on every request for about 55 minutes?

Also can you explain the bug since I'm a python noob?


accounts.google.com provisions tokens with a max validity duration of 1 hour. I think the GCE metadata server provisions tokens valid for 30 minutes.

Contributor

@yliaog yliaog Jul 3, 2018


I think this would cause constant token refreshing if the token's validity lifetime is <55 minutes. Also, this Python client is for generic use, not limited to GCP.

Author


What does the metadata server do? If the metadata server also caches tokens, but refreshes them 10 seconds before expiry, then will this call the metadata server on every request for about 55 minutes?

@mikedanese IIRC the metadata server lazily mints new tokens when they've expired, but I'm definitely not an expert

Also can you explain the bug since I'm a python noob?

Not really a 'bug' per se; it's just that long-running operations can very easily result in 401s/403s due to expired tokens. For example, by default, dockerd only pulls 3 image layers in parallel, meaning that especially large images being downloaded over real-world networks can fail halfway through.

I think this would cause constant token refreshing if the token's validity lifetime is <55 minutes. Also, this Python client is for generic use, not limited to GCP.

@yliaog So only apply this minimum freshness to the GCP tokens? Works for me.

Contributor


I guess you can make it a parameter instead of a const, so you can set it to the most appropriate value for your use case (for GCP?).
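A rough sketch of that parameter idea, with hypothetical names (the real loader's signature may differ):

import datetime

class TokenRefresher(object):
    def __init__(self, refresh_fn,
                 minimum_time_remaining=datetime.timedelta(minutes=5)):
        # Generic callers keep the default; GCP callers could pass
        # datetime.timedelta(minutes=55) to match the 1h token lifespan.
        self._refresh_fn = refresh_fn
        self._minimum_time_remaining = minimum_time_remaining
        self._token = None
        self._expiry = datetime.datetime.min

    def get_token(self):
        remaining = self._expiry - datetime.datetime.utcnow()
        if self._token is None or remaining < self._minimum_time_remaining:
            self._token, self._expiry = self._refresh_fn()
        return self._token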

Author


@yliaog I don't think it needs to be shared between GCP and other credentials, since the 1h lifespan on GCP tokens is unlikely to change. If the existing logic suffices for other credentials, I can limit this change to GCP credentials.

Author


Also, it seems like clients of this lib won't necessarily know a priori which token source will be used: https://github.com/kubernetes-client/python-base/blob/master/config/kube_config.py#L179

Seems like this issue might be sufficiently addressed by limiting the change to the GCP token logic. There might be a separate task of factoring out the Authenticator logic into a plugin interface.


This method won't work well for the GCE compute metadata source where there is an extra layer of caching. This will result in a call to the metadata server per request.

Author


This method won't work well for the GCE compute metadata source where there is an extra layer of caching. This will result in a call to the metadata server per request.

Doesn't seem like it's possible to fully mitigate the issue, then, since AFAIK we can't coerce a refresh. We'd just have to hope that one retry is sufficient to pick up a token with enough lifespan left to complete a request.
From the docs at https://cloud.google.com/compute/docs/access/create-enable-service-accounts-for-instances#applications:

The metadata server caches access tokens until they have 60 seconds of remaining time before they expire.

Looks like the only 'good' mitigation would be to reload the kube-config and retry requests on 401/403?
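Roughly what that retry would look like, assuming the caller goes through the generated kubernetes.client API and has some reload_credentials() hook (hypothetical) that re-loads the kube-config:

from kubernetes.client.rest import ApiException

def call_with_reauth(api_call, reload_credentials, *args, **kwargs):
    # Run the call once; on a 401/403, reload credentials and retry once.
    try:
        return api_call(*args, **kwargs)
    except ApiException as e:
        if e.status in (401, 403):
            reload_credentials()
            return api_call(*args, **kwargs)
        raise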

@mikedanese

cc @mbohlool @yliaog

@tomplus
Member

tomplus commented Jul 3, 2018

This doesn't solve the problem for long-lived applications. I suggest using a thread to refresh the token periodically in the background.
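For illustration, a minimal sketch of that background-thread approach; refresh_token() and the 15-minute interval are placeholders, not something this PR implements:

import threading

def start_token_refresher(refresh_token, interval_seconds=15 * 60):
    # Call refresh_token() periodically on a daemon thread until stopped.
    stop = threading.Event()

    def _loop():
        while not stop.wait(interval_seconds):
            refresh_token()

    thread = threading.Thread(target=_loop)
    thread.daemon = True
    thread.start()
    return stop  # call stop.set() to shut the refresher down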

@dekkagaijin
Author

@tomplus this isn't intended to completely solve the problem, just reduce the likelihood that it will be encountered.

I suggest using a thread to refresh the token periodically in the background.

What would be the periodicity?

Would automatically refreshing & retrying on 401/403 be insufficient?

Signed-off-by: Jake Sanders <jsand@google.com>
Member

@roycaihw roycaihw left a comment


Agree with @yliaog. I think this mitigation is currently specific to GCP, and it could be an optional parameter set to the most appropriate value.

@dekkagaijin Refreshing on API invocation errors (401/403) would be better, but it requires wrapping the generated API calls. Ref a similar problem in OIDC token refresh: kubernetes-client/python#492

@@ -32,7 +32,8 @@
 from .config_exception import ConfigException
 from .dateutil import UTC, format_rfc3339, parse_rfc3339

-EXPIRY_SKEW_PREVENTION_DELAY = datetime.timedelta(minutes=5)
+EXPIRY_TIME_SKEW = datetime.timedelta(minutes=5)
Member


nit: EXPIRY_SKEW_PREVENTION_DELAY sounds clearer to me in explaining the purpose of the timedelta. Maybe document these constants, and update the existing comment "# should be less than kube_config.EXPIRY_SKEW_PREVENTION_DELAY" if we want to change the name.

@mikedanese

Can we change this so that we don't have a threshold and just refresh when the token is expired (or make the skew something like 5 seconds)? That seems better than what we do now, and it doesn't require tracking the metadata server's behavior.
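That simpler policy might look roughly like this (sketch only; refresh() is a hypothetical helper returning a fresh token and its expiry):

import datetime

EXPIRY_SKEW = datetime.timedelta(seconds=5)

def get_token(cached_token, expiry, refresh):
    # Only refresh once the token is within a few seconds of expiring,
    # instead of second-guessing how long the token source caches it.
    if cached_token is None or expiry - datetime.datetime.utcnow() <= EXPIRY_SKEW:
        cached_token, expiry = refresh()
    return cached_token, expiry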

@ffledgling

Bump, got bit by this today. Any idea what needs to happen to move the needle on this?

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 18, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 17, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closed this PR.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
