Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document the Buffer/Batch constraints on tower_fallback #1674

Closed
2 tasks
teor2345 opened this issue Feb 3, 2021 · 2 comments
Closed
2 tasks

Document the Buffer/Batch constraints on tower_fallback #1674

teor2345 opened this issue Feb 3, 2021 · 2 comments
Labels
A-docs Area: Documentation A-rust Area: Updates to Rust code C-bug Category: This is a bug

Comments

@teor2345
Copy link
Contributor

teor2345 commented Feb 3, 2021

Is your feature request related to a problem? Please describe.

tower_fallback::service and tower_fallback::future implement some complex poll_ready/call state handling between two services, which is particularly risky if either of those services are Buffered or Batched. This state handling happens across both modules.

The current code appears to be correct, but the Buffer/Batch constraints are undocumented. It also depends on the modules remaining in sync.

Here are the underlying issues:

  • If lots of tasks with data dependencies use a Buffer or Batch with a small bound, Zebra can hang.
  • Buffers or Batches with small bounds can also limit concurrency, particularly if the underlying service does a lot of work in its call.

Describe the solution you'd like

  • Document the internal states of tower_fallback::future
  • Document how tower_fallback avoids these risks with Buffered or Batched services

Describe alternatives you've considered

Do nothing. If callers use tower-fallback incorrectly, they may hang.

Additional context

See #1593 for the overall Buffer and Batch hang issue, analysis, and design patterns.

@teor2345 teor2345 added C-bug Category: This is a bug A-docs Area: Documentation A-rust Area: Updates to Rust code S-needs-triage Status: A bug report needs triage P-Medium labels Feb 3, 2021
@teor2345 teor2345 changed the title Refactor or Document the poll_ready/call constraints on tower_fallback Document the poll_ready/call constraints on tower_fallback Feb 15, 2021
@teor2345
Copy link
Contributor Author

This task has been revised based on the fixes to the underlying Buffer issue, see #1593 for details.

@teor2345 teor2345 changed the title Document the poll_ready/call constraints on tower_fallback Document the Buffer/Batch hang risk on tower_fallback Feb 15, 2021
@teor2345 teor2345 changed the title Document the Buffer/Batch hang risk on tower_fallback Document the Buffer/Batch constraints on tower_fallback Feb 15, 2021
@mpguerra mpguerra removed the S-needs-triage Status: A bug report needs triage label Feb 18, 2021
@teor2345
Copy link
Contributor Author

The current settings seem fine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-docs Area: Documentation A-rust Area: Updates to Rust code C-bug Category: This is a bug
Projects
None yet
Development

No branches or pull requests

2 participants