
oidc auto refresh token during api invocation #492

Closed
mvle opened this issue Mar 21, 2018 · 22 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments


mvle commented Mar 21, 2018

Refresh the token not only at configuration initialization time but also during invocation of the K8s API calls. The token might expire right before a call, causing the call to fail with an authentication error. This is needed for long-running services that rely on being able to reach the K8s API server (and also for short-lived services that happen to be unlucky).

This could be done either with a check of the 'exp' field in the id-token right before a call_api, or by catching a 401 return status, performing a refresh-token operation, and updating the configuration object with the new id_token and refresh_token values from the OAuth server.
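A minimal sketch of the catch-401-and-refresh idea, assuming kubeconfig-based auth and the client's default configuration (the retry-once structure here is illustrative, not an existing API):

# Illustrative only: catch a 401, re-read the kubeconfig (which runs python-base's
# token refresh logic and updates the default configuration), then retry once.
from kubernetes import client, config
from kubernetes.client.rest import ApiException

def list_pods_with_refresh(namespace):
    try:
        return client.CoreV1Api().list_namespaced_pod(namespace)
    except ApiException as e:
        if e.status != 401:
            raise
        config.load_kube_config()
        return client.CoreV1Api().list_namespaced_pod(namespace)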

Related to #368 #195, kubernetes-client/python-base#48


mvle commented Mar 21, 2018

As mentioned by @roycaihw, swagger-codegen does not support OAuth2 refresh.
What I'm proposing is rather hacky. Is there any way to do this so the generated code will do the right thing? Wrapper calls?


flylo commented Mar 27, 2018

@yliaog this is an issue for us, as we are running kubernetes pods as batch processes and tracking the state of the pods with a static client. Once the client object is initialized, it never refreshes, so any job that takes longer than 60 minutes will fail. Looking through python-base, it seems like we could make a wrapper class that generates a new client (or refreshes the config) every n minutes, or checks the status prior to every call (as @mvle suggested). The best fix would be in swagger-codegen, but a temporary solution would probably be very useful for a lot of people.


yliaog commented Mar 27, 2018

I agree it's better to have a solution before the swagger-codegen fix is in.

I think it's likely more efficient to check the status and do the refresh only when it fails. Do you mind sending a PR?

In the meantime, do you mind pinging the swagger-codegen issue?


flylo commented Mar 28, 2018

@yliaog Even with a try/catch wrapper the token still wasn't working, but I've found the root cause of that issue. While debugging, I noticed that there is already logic to handle token expiry, but the check is incorrect:

def _is_expired(expiry):
    return ((parse_rfc3339(expiry) + EXPIRY_SKEW_PREVENTION_DELAY) <=
            datetime.datetime.utcnow().replace(tzinfo=UTC))

With the logic above, the expiration time is arbitrarily pushed back by five minutes, so there is always a five-minute window in which any call to the API uses an expired token with no refresh, and therefore gets a 401.
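
Presumably the intended check subtracts the skew, so the token is treated as expired slightly before its real expiry; something like:

def _is_expired(expiry):
    # Treat the token as expired EXPIRY_SKEW_PREVENTION_DELAY before the real
    # expiry time, so a refresh happens before the server starts rejecting it.
    return ((parse_rfc3339(expiry) - EXPIRY_SKEW_PREVENTION_DELAY) <=
            datetime.datetime.utcnow().replace(tzinfo=UTC))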

I'll submit a PR shortly with a fix for that flipped sign, and also add a retry wrapper for the API.


mvle commented Mar 30, 2018

@flylo awesome. Can't wait to see the PR and for this feature to land.

I was looking through the code but couldn't figure out where to place a wrapper that would catch all the API calls without being overwritten by swagger-codegen. The only thing I could think of would require everyone using the library to try/catch in their own code when invoking the API calls.


flylo commented Mar 30, 2018

@mvle @yliaog I submitted this today to unblock the API wrapper:
kubernetes-client/python-base#55

I merged a version of the wrapper in my own repo that has some stupid sleeping to get around the flipped-sign bug. The wrapper is dumb simple... this is what it looks like right now:

import logging
import time

from kubernetes.client import CoreV1Api
from kubernetes.client.rest import ApiException

logger = logging.getLogger(__name__)


class RetryingCoreV1Api(CoreV1Api):

    def __init__(self, client_provider,
                 failure_tolerance=None,
                 status_codes_to_catch=None,
                 seconds_before_retry=1):
        self._client_provider = client_provider
        self.num_failures = 0
        self._failure_tolerance = failure_tolerance
        self._status_codes = status_codes_to_catch
        self._seconds_pause_before_retry = seconds_before_retry
        super(RetryingCoreV1Api, self).__init__(self._client_provider())

    def __call__(self, instance_function, *args, **kwargs):
        try:
            return instance_function(*args, **kwargs)
        except ApiException as e:
            logger.info('Caught exception %s in %s', e, instance_function.__name__)
            self.num_failures += 1
            # Only retry on the configured status codes (e.g. 401), if any were given.
            if self._status_codes is not None and e.status not in self._status_codes:
                raise
            # Give up once the failure budget is exhausted.
            if self._failure_tolerance is not None and self.num_failures > self._failure_tolerance:
                raise
            time.sleep(self._seconds_pause_before_retry)
            # Re-initialize from a fresh client (and therefore a fresh token), then retry.
            super(RetryingCoreV1Api, self).__init__(self._client_provider())
            return self.__call__(instance_function, *args, **kwargs)

and you'd use it like so:

from functools import partial

from kubernetes.config import new_client_from_config

cfg_file = "<path to config file>"
client_provider = partial(new_client_from_config, cfg_file)
api = RetryingCoreV1Api(client_provider)
api(api.create_namespaced_pod, body=..., namespace=...)

I've tested it out in our internal repo and it works fine, but now I just have to adapt my tests to the python-base repo. My plan was to put it in a new module in config called config.patch, since this is just a temporary workaround for an issue in swagger-codegen.

Before I go through too much trouble refactoring my tests and making this backwards compatible, will this actually fix your issues?


flylo commented Mar 30, 2018

Also @mvle, we'd have to put it in python-base.


mvle commented Apr 1, 2018

@flylo this approach does indeed solve the problem. But if I understand your proposal correctly, it is going to require patching the python client anyway, so why not just create a patch that directly adds the retry logic to call_api in ApiClient? That way we wouldn't need to create a RetryV1Api for each of the V1Api objects, and users wouldn't have to change their applications to make use of this feature. Otherwise, we would have to document how to use this feature with the new calling convention.
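
For concreteness, a rough, hypothetical sketch of that kind of patch (monkey-patching the generated ApiClient rather than editing it; not what this thread ultimately adopted), assuming kubeconfig-based auth:

# Hypothetical sketch only: retry call_api once on a 401 after reloading the
# kubeconfig into this client's configuration (which refreshes the token).
from kubernetes import config
from kubernetes.client import ApiClient
from kubernetes.client.rest import ApiException

_original_call_api = ApiClient.call_api

def _call_api_with_refresh(self, *args, **kwargs):
    try:
        return _original_call_api(self, *args, **kwargs)
    except ApiException as e:
        if e.status != 401:
            raise
        config.load_kube_config(client_configuration=self.configuration)
        return _original_call_api(self, *args, **kwargs)

ApiClient.call_api = _call_api_with_refresh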


flylo commented Apr 2, 2018

@mvle we can't directly edit the ApiClient because that code is auto-generated by swagger-codegen, and it seems like all auth is handled in the manually maintained python-base repo. This is just a temporary approach: a helper that wraps calls to the API in a try/catch block and re-initializes the client object every time one of those calls fails. Also, at first glance it seems like we can't override call_api in the ApiClient, because the ApiClient has no knowledge of the authentication mechanism used to generate kubeconfig tokens. Maybe there is an easy solution to this in swagger clients, but I'm pretty new to this.

I suppose we could create another child class for the client that overrides call_api, but that would be more to maintain and test, which seems less than ideal for a temporary patch.


mvle commented Apr 2, 2018

@flylo I agree that what I'm proposing might be hard to maintain. We would have to add something to Configuration in python-client to store a callback or some info we can use to decide whether to do a renewal, like in auth_settings. But I'm hopeful, as I think these generated classes don't change much, and the patch could be applied on installation of the client library. TBH I don't like what I'm proposing much, since it's really hacky, but it is the most transparent to use.

As to your proposal, is it possible to make a single retry class that does not inherit from a specific V1Api class (e.g., CoreV1Api)? That way we end up with just one generic retry class.


flylo commented Apr 2, 2018

@mvle
How about this:

import logging
import time
from types import MethodType

from kubernetes.client.rest import ApiException

logger = logging.getLogger(__name__)


class RetryingApiWrapper(object):

    def __init__(self,
                 client_provider,
                 api_cls,
                 failure_tolerance=None,
                 status_codes_to_catch=None,
                 seconds_before_retry=1):
        self._client_provider = client_provider
        self.num_failures = 0
        self._failure_tolerance = failure_tolerance
        self._status_codes = status_codes_to_catch
        self._seconds_pause_before_retry = seconds_before_retry
        self._api_clazz = api_cls
        self._api = None
        self.__refresh_api()

    def __refresh_api(self):
        # Build a fresh API instance from a fresh client (and therefore a fresh token).
        self._api = self._api_clazz(api_client=self._client_provider())

    def wrap_api_call(self, unbound_method, *args, **kwargs):
        try:
            # Bind the unbound API method to the private API instance, then call it.
            bound_method = MethodType(unbound_method, self._api)
            return bound_method(*args, **kwargs)
        except ApiException as e:
            logger.info('Caught exception %s in %s', e, unbound_method.__name__)
            self.num_failures += 1
            # Only retry on the configured status codes (e.g. 401), if any were given.
            if self._status_codes is not None and e.status not in self._status_codes:
                raise
            # Give up once the failure budget is exhausted.
            if self._failure_tolerance is not None and self.num_failures > self._failure_tolerance:
                raise
            time.sleep(self._seconds_pause_before_retry)
            self.__refresh_api()
            return self.wrap_api_call(unbound_method, *args, **kwargs)

The wrapper would get used like so:

from kubernetes import config
from kubernetes.client import CoreV1Api

client_provider = config.new_client_from_config
api = RetryingApiWrapper(client_provider, CoreV1Api)
api.wrap_api_call(CoreV1Api.create_namespaced_pod, body=pod_payload, namespace=namespace)

So, you construct an instance of the wrapper by passing it a client provider and an API class. Then, to make calls, you pass an unbound method reference to wrap_api_call; the wrapper binds it to a private instance of the API and calls the bound method with the supplied arguments. If a call fails with one of the configured status codes, the wrapper rebuilds the API from a fresh client and retries.


mvle commented Apr 3, 2018

@flylo this is awesome! I think this will be easy to use, easy to maintain, and not intrusive.

@yliaog what do you think?


flylo commented Apr 3, 2018

Sweet - I'll submit a PR as soon as I get a chance.


yliaog commented Apr 3, 2018

I like it, looking forward to the PR. Thanks.


flylo commented Apr 4, 2018

@yliaog this PR is blocking the wrapper one, so if you could take a look that'd be awesome.
kubernetes-client/python-base#55

@roycaihw

@flylo How is the wrapper going? Just want to make sure you get everything you need. Is there anything I can help with?


flylo commented Sep 24, 2018

@roycaihw Hey, I actually ended up using that snippet above in an internal repo, because working through the integration tests in this one was more time-consuming than I was hoping at the time. Let me take another look at this over the next few days and make sure I can submit a fully tested PR rather than just pasting a snippet.

@hannes-brt

Are there still plans to fix this? Or has this issue been fixed out of channel?

Thanks!

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 27, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 27, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
