Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement retry mechanism when using requests #466

Closed
wants to merge 1 commit into from
Closed

Implement retry mechanism when using requests #466

wants to merge 1 commit into from

Conversation

jonathan-s
Copy link

@jonathan-s jonathan-s commented Aug 20, 2018

This PR implements a retry mechanism if any request would fail when using requests. It closes #464.

More detailed documentation of Retry can be found in the urllib3 documentation

  • Total: The amount of total retries
  • backoff_factor: it will sleep for: {backoff factor} * (2 ^ ({number of total retries} - 1)) between retries
  • status: How many times to retry on bad status codes. The ones on status_forcelist
  • status_forcelist: Bad status codes which it will retry on.

@jonathan-s
Copy link
Author

@brandur-stripe Making a stab at retry issue that I created. Let me know if there is anything else you'd like to see here.

@brandur-stripe
Copy link
Contributor

@jonathan-s Nice — thank you for the contribution! I'm going to have someone else on the team take an initial review pass first, then we'll see what we can do to get this merged in.

@jonathan-s
Copy link
Author

jonathan-s commented Aug 21, 2018 via email

@mickjermsurawong-stripe
Copy link
Contributor

Hi Jonathan! Thank you for your time and this PR :)

To protect against creating unintended duplicate requests given automatic retries, the client should create idempotency key, if not given by the caller, following our docs[0].

The implementation on retry seems great with the standard Retry. It has flexibility to handle the connection/read failures in connect=Integer, read=Integer and special error status code status_forcelist=[] similar to behavior of our implementations in other languages. However, it lacks one detail on adding randomness to the delay to avoid simultaneously backing-off across multiple clients (step-wise jamming), following our implementations in Ruby[1] or Go[2].

My concern of going ahead with using Retry specific to requests client is that it can't support our other implementations in pycurl, urlfetch, urllib.request. Because the retry is tied in request execution method, we might have to implement an equivalent for each of other client at the request level. Ideally, we can move this retry logic up at the parent class. Concretely, we can possibly have request_with_retry at HTTPClient parent class, surrounding request method implemented by each library with our custom retry logic. (We still have to implement retry condition for exceptions thrown from each kind of client)

-cc @brandur-stripe for more thoughts if we should go ahead with Retry given that request's popularity, but we incur more implementation work in other libs (if we want to support retry there).

[0] Idempotency
https://stripe.com/docs/api/ruby#idempotent_requests

[1] Ruby sleep implementation
https://github.com/stripe/stripe-ruby/blob/ec66c3f0f44274f885de8d13de5dce2657932121/lib/stripe/stripe_client.rb#L80

[2] Go sleep implementation
https://github.com/stripe/stripe-go/blob/442690c3856640e4913163f506f0070561f3368d/stripe.go#L511

@jonathan-s
Copy link
Author

jonathan-s commented Aug 22, 2018

@mickjermsurawong-stripe Thanks for the reply. I hear you regarding the idempotency key. Right now it looks like it's an optional requirement which you can supply for each api method. I wonder, is there a case when you would want to NOT send an idempotency key?

If there is no such case, perhaps it would be worth centralizing that at one place and let the client take care of generating an idempotency key?

With regards to avoiding step-wise locking. I think this could be worth to raise an issue upstream in urllib3.

@jonathan-s
Copy link
Author

Also out of curiosity. What % usage is there of the other clients other than requests. Since you have the user-agent it would be possible to find out :)

@mickjermsurawong-stripe
Copy link
Contributor

Hi @johnathan-s,

Idempotency key is not needed for GET and DELETE; by definition the http methods are idempotent. So we shouldn't add it unnecessarily, but doing so will make no difference to results. The recommended way for generation is V4 UUIDs.

If I understand your suggestion about centralization of idempotency setting correctly, yup! we can supply the idempotency key in the header request_headers in file api_requestor where other Stripe-specific headers are dealt with; each underlying http-client doesn't have to take care of it. If the caller passes the key in supplied_header it will override our automatically-generated key. At this layer, we also has access to http method as well.

Hmm I don't think the specific underlying client is exposed outside the HTTPClient class, and therefore it isn't recorded in user-agent. Maybe there are other ways that I'm not aware of. I'll follow up with the team on this, if we should go ahead with requests specific implementation. And I do agree that if we use this client, the added randomness in delay is less of a concern.

Meanwhile, it would be great to have tests on the added features, idempotency key and the retries. One needs to install https://github.com/stripe/stripe-mock to run our pytests.

@mickjermsurawong-stripe
Copy link
Contributor

Hi @jonathan-s
I checked our request logs and the underlying http-client information indeed isn't in the user-agent header in our python implementation. Though I do see http_user_agent fields, but that seems to be from user who don't use our bindings. With that, I can't seem to argue using Retry for requests based on number of users here.

I had a discussion with others, and it seems like the randomness is something we would like to have still. And making PR to urllib3 as you suggest would still run into the same problem of having the change only on people who use requests.

With that, I have a more concrete follow-up on my earlier proposal on how to implement this at the abstract class in the http_client.py. Worth noting here is the should_retry_on_error. The error that client throws up now is wrapped up into Stripe-custom error. What we have to do is to go down to each client level, and assess their error and categorize them into something if we should_retry. This is so that the abstract method above gets that signal, and know how to execute the example logic. requests has a good starting point for us to conceptualize this client-level error from how it tolerates number of connect and read errors.

    # example code, not tested..
    def request_with_retry(self, method, url, headers, post_data=None):

        num_retries = 0
        response = None

        while num_retries <= max_retries:
            try:
                num_retries += 1
                response = self.request(method, url, headers, post_data) # method implemented by each client
                _, status_code, _ = response
                if not self.should_retry_on_codes(status_code):
                    break
            except error:
                if not self.should_retry_on_error(error): # signal from each client's own assessment of the error 
                    raise error
            self.sleep() # custom exponential back-off with randomness

        if response is not None:
            return response
        else:
            msg = "# descriptive errors when retry exceeded "
            raise error.APIConnectionError(msg)

I'm happy to take on this PR as well, if what I communicate isn't very clear or you are not convinced that the effort here is worthwhile. Or please let me know what arrangement works best for you to move this forward!

@mickjermsurawong-stripe
Copy link
Contributor

Hi @jonathan-s ,
I found out that we indeed have the stats, and 99% of our users are using requests -- my apologies. Given so, we do want to roll-out our own retry logic for consistency with our libs in other languages. One simplification is that we can treat error from non-request libs as non-retryable; we maintain the behavior of non-request client. But we still want to put retry logic outside the underlying http-client, and trigger retry logic for requests error. Please let me know if we want to pursue this PR, otherwise I believe it will also be addressed as a part of the refactoring work on this issue #432

@mickjermsurawong-stripe
Copy link
Contributor

[Referring to above to retroactively register response to issue]

@jonathan-s
Copy link
Author

jonathan-s commented Aug 28, 2018 via email

@mickjermsurawong-stripe
Copy link
Contributor

Thanks for the note! Will find some time to get to this :)

@mickjermsurawong-stripe
Copy link
Contributor

@jonathan-s, I'll close this now due to the #470
Thank you for initiating this improvement feature :)

@jonathan-s
Copy link
Author

jonathan-s commented Sep 8, 2018 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Retry mechanism for failing requests
3 participants