Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add retries to all resources and datasources #286

Merged
merged 11 commits into from
Oct 23, 2024
Merged

Conversation

mitchnielsen
Copy link
Contributor

@mitchnielsen mitchnielsen commented Oct 22, 2024

Uses the hashicorp/retryablehttp package to get an HTTP client with
retries and exponential backoff out of the box. This is implemented
within our client package, meaning calls to the CRUD methods from both
resources and datasources will inherit that functionality.

At a high level, the changes here are:

  • Replacing the http client with a retryablehttp client
  • Reverting implementations of avast/retry in resources
  • Removing avast/retry-related helpers

Related to https://linear.app/prefect/issue/PLA-183/add-retries-to-resources

Adds retries for Create and Read on the Work Pool resource.
@mitchnielsen
Copy link
Contributor Author

It probably makes sense just to implement retries on all resources/datasources at once, rather than incrementally adding them where we get reports of failures - I'm going to see if I can get Cursor to do most of the heavy lifting there, and if we can add a facility to make it easier to add/tweak in the future 🤔

@parkedwards
Copy link
Contributor

@mitchnielsen trying to think, are there calls that we WOULDNT want to retry on?

@mitchnielsen
Copy link
Contributor Author

trying to think, are there calls that we WOULDNT want to retry on?

Asking in Slack 👌🏼

Uses the hashicorp/retryablehttp package to get an HTTP client with
retries and exponential backoff out of the box. This is implemented
within our client package, meaning calls to the CRUD methods from both
resources and datasources will inherit that functionality.
@mitchnielsen mitchnielsen changed the title Add more retries Add retries to all resources and datasources Oct 23, 2024
Blocks, and other resources, need more time to resolve objects so let's
bump up the defaults to accommodate that.
Still saw failures in acceptance tests in CI
Was still seeing errors in the acceptance tests
- Adds a function to define how to determine whether or not to retry,
  extending the default function so we can accept 409s and retry 404s.
  We can adapt this function over time as well.
- Removes the time and retry overrides, as those were not the reason
  that block tests were failing - the requests themselves were passing,
  and didn't need retries - what they needed was to retry 404s and not
  retry 409s.
Uses the stdlib http.Client interface so we don't need to specify the
retryablehttp variant in every client method.
client := &Client{
hc: http.DefaultClient,
hc: httpClient,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the main change here - we're replacing the default HTTP client with our retry-able HTTP client.

Comment on lines -39 to -58
// MustNew returns a new client or panics if an error occurred.
func MustNew(opts ...Option) *Client {
client, err := New(opts...)
if err != nil {
panic(fmt.Sprintf("error occurred during construction: %s", err))
}

return client
}

// WithClient configures the underlying http.Client used to send
// requests.
func WithClient(httpClient *http.Client) Option {
return func(client *Client) error {
client.hc = httpClient

return nil
}
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These were unused.

Comment on lines +105 to +122

func checkRetryPolicy(ctx context.Context, resp *http.Response, err error) (bool, error) {
// If the response is a 409 (StatusConflict), that means the request
// eventually succeeded and we don't need to make the request again.
if resp.StatusCode == http.StatusConflict {
return false, nil
}

// If the response is a 404 (NotFound), try again. This is particularly
// relevant for block-related objects that are created asynchronously.
if resp.StatusCode == http.StatusNotFound {
return true, err
}

// Fall back to the default retry policy for any other status codes.
//nolint:wrapcheck // we've extended this method, no need to wrap error
return retryablehttp.DefaultRetryPolicy(ctx, resp, err)
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we can be specific about what warrants a retry. This is cool because we knew in the past we didn't need to retry 409s, but the retry implementation we had at the time didn't really let us achieve that.

@mitchnielsen mitchnielsen marked this pull request as ready for review October 23, 2024 22:10
@mitchnielsen mitchnielsen requested a review from a team as a code owner October 23, 2024 22:10
// Finally, convert the retryablehttp client to a standard http client.
// This allows us to retain the `http.Client` interface, and avoid specifying
// the `retryablehttp.Client` interface in our client methods.
httpClient := retryableClient.StandardClient()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part is pretty sweet. It saves a lot of lines changed - see f86254b.

Copy link
Contributor

@parkedwards parkedwards left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome!

@mitchnielsen mitchnielsen merged commit 956d9c6 into main Oct 23, 2024
7 checks passed
@mitchnielsen mitchnielsen deleted the add-more-retries branch October 23, 2024 22:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants