REAPI remote cache no longer retries store failures #19732

Closed
huonw opened this issue Aug 31, 2023 · 0 comments · Fixed by #19737

huonw commented Aug 31, 2023

Describe the bug

In #19050, I moved code from src/rust/engine/fs/store/src/remote.rs to src/rust/engine/fs/store/src/remote/reapi.rs, but didn't carry over the retry_call wrappers around the store operations (previously in store_buffered and store_bytes in remote.rs). Oops!

The reapi.rs store methods should call retry_call to retry transient failures too.
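
As a rough illustration of the behaviour being asked for, here is a minimal, synchronous sketch of wrapping a store operation in a retry helper. It is not the actual Pants code: the real `retry_call` is async and its signature may differ, and `StoreError`, `is_retryable`, `store_bytes_once`, the attempt limit and the backoff values are all hypothetical names and numbers chosen for the example.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical error type standing in for a gRPC status.
#[derive(Debug, Clone, PartialEq)]
enum StoreError {
    /// Transient server-side failure: worth retrying.
    Internal(String),
    /// Permanent failure: retrying will not help.
    InvalidArgument(String),
}

/// Assumption for this sketch: only `Internal` errors count as retryable.
fn is_retryable(err: &StoreError) -> bool {
    matches!(err, StoreError::Internal(_))
}

/// Calls `f` until it succeeds, returns a non-retryable error, or
/// `max_attempts` attempts have been made. `f` receives the attempt index.
fn retry_call<T, F>(max_attempts: u32, mut f: F) -> Result<T, StoreError>
where
    F: FnMut(u32) -> Result<T, StoreError>,
{
    assert!(max_attempts > 0, "at least one attempt is required");
    let mut attempt = 0;
    loop {
        match f(attempt) {
            Ok(value) => return Ok(value),
            Err(err) if is_retryable(&err) && attempt + 1 < max_attempts => {
                // Simple exponential backoff (10ms, 20ms, ...) before retrying.
                sleep(Duration::from_millis(10 * 2u64.pow(attempt)));
                attempt += 1;
            }
            // Non-retryable error, or attempts exhausted: give up.
            Err(err) => return Err(err),
        }
    }
}

/// Hypothetical single store attempt that fails transiently on the
/// first two attempts, then succeeds and reports the bytes stored.
fn store_bytes_once(attempt: u32, bytes: &[u8]) -> Result<usize, StoreError> {
    if attempt < 2 {
        Err(StoreError::Internal("simulated transient failure".into()))
    } else {
        Ok(bytes.len())
    }
}

/// The store method wraps the single attempt in `retry_call`, which is
/// the step that went missing in the refactoring described above.
fn store_bytes(bytes: &[u8]) -> Result<usize, StoreError> {
    retry_call(3, |attempt| store_bytes_once(attempt, bytes))
}
```

With three attempts allowed, `store_bytes` rides out the two simulated transient failures; without the wrapper, the first `Internal` error would surface directly to the caller.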

Pants version
#19050 landed in 2.18.0.dev0

OS
N/A

Additional info
N/A

@huonw huonw added this to the 2.18.x milestone Aug 31, 2023
@huonw huonw self-assigned this Aug 31, 2023
stuhood pushed a commit that referenced this issue Sep 7, 2023
This fixes #19732 by restoring retries for stores that hit retryable
server failures from the REAPI remote cache server, which were lost in
the #19050 refactoring.

This also explicitly tests for retries, refactoring `StubCAS` to
generalise `read_request_count` to expose the counts of more requests
than just reads, and also consistently return a `Status::internal(...)`
for the simulated errors.

I think #19050 fortunately landed just after 2.17 was cut, so this
regression only affects the 2.18 pre-releases.
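
The testing idea in the commit message (count requests per type on a stub server, then assert that a failing store was attempted more than once) can be sketched roughly as follows. This is not the real `StubCAS`, which is a gRPC test server; `StubServer`, `RequestType`, `request_count` and `write_with_retries` are hypothetical names, and a plain `String` stands in for `Status::internal(...)`.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical request categories tracked by the stub.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum RequestType {
    Read,
    Write,
}

/// Hypothetical in-memory stand-in for a stub CAS server.
struct StubServer {
    /// Number of requests seen, keyed by request type.
    request_counts: Mutex<HashMap<RequestType, usize>>,
    /// How many more writes should fail with a simulated internal error.
    write_failures_remaining: Mutex<usize>,
}

impl StubServer {
    fn new(write_failures: usize) -> Self {
        StubServer {
            request_counts: Mutex::new(HashMap::new()),
            write_failures_remaining: Mutex::new(write_failures),
        }
    }

    fn record(&self, request_type: RequestType) {
        *self
            .request_counts
            .lock()
            .unwrap()
            .entry(request_type)
            .or_insert(0) += 1;
    }

    /// Generalised counter: works for any request type, not just reads.
    fn request_count(&self, request_type: RequestType) -> usize {
        self.request_counts
            .lock()
            .unwrap()
            .get(&request_type)
            .copied()
            .unwrap_or(0)
    }

    /// Counts every write request, including the ones that fail.
    fn write(&self, bytes: &[u8]) -> Result<usize, String> {
        self.record(RequestType::Write);
        let mut remaining = self.write_failures_remaining.lock().unwrap();
        if *remaining > 0 {
            *remaining -= 1;
            return Err("internal: simulated write failure".to_string());
        }
        Ok(bytes.len())
    }
}

/// Hypothetical client-side retry loop under test.
fn write_with_retries(
    server: &StubServer,
    bytes: &[u8],
    max_attempts: usize,
) -> Result<usize, String> {
    let mut last = Err("no attempts made".to_string());
    for _ in 0..max_attempts {
        last = server.write(bytes);
        if last.is_ok() {
            break;
        }
    }
    last
}
```

A test along these lines would configure the stub to fail a fixed number of writes, perform one store, and then check `request_count(RequestType::Write)`: a count above one shows that retries actually happened, which a success-or-failure assertion alone cannot distinguish.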
WorkerPants pushed a commit that referenced this issue Sep 7, 2023
huonw added a commit that referenced this issue Sep 7, 2023
…#19737) (#19798)

Co-authored-by: Huon Wilson <huon@exoflare.io>