Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Circuit breaker enhancements #62279

Closed
lanerjo opened this issue Sep 12, 2020 · 3 comments
Closed

Circuit breaker enhancements #62279

lanerjo opened this issue Sep 12, 2020 · 3 comments
Labels
:Core/Infra/Circuit Breakers Track estimates of memory consumption to prevent overload >enhancement Team:Core/Infra Meta label for core/infra team

Comments

@lanerjo
Copy link

lanerjo commented Sep 12, 2020

Circuit breaker adjustments:

  1. Add additional circuit breakers and clarify existing circuit breakers.
  2. Logs should state the exact circuit breaker that was tripped.
  3. Circuit breakers should not account for recovery of a shard.
  4. More narrowly tailor circuit breakers to allow admins better tuning.
  5. Setting for Search requests that break circuit breakers should be cancelled. (ie. breaker.search.cancel_request: true|false)
  6. Ensure cluster state requests are never touched by circuit breakers
  7. Upon tripping of a circuit breaker, used memory from that request is immediately GC'd (would require the use of explicit gc)
  8. Ability to specify different circuit breakers on a per node, per node role, or all basis
  9. More accurate descriptions of circuit breakers in DOCs

Existing circuit breakers should be more appropriately named. eg.
indices.breaker.total.use_real_memory -> breaker.parent.total.use_real_memory
indices.breaker.total.limit -> breaker.parent.total.limit
indices.breaker.request.limit -> breaker.request.limit (unless this is truly a per index breaker)
indices.breaker.request.overhead -> breaker.request.overhead
indices.breaker.accounting.limit -> breaker.memory.accounting.limit (unless this is per index)
indices.breaker.accounting.overhead -> breaker.memory.accounting.overhead

Should add additional/more narrow circuit breakers for:

  1. Aggregation memory used (This should work better than max_buckets, as this would be more dynamic)
  2. Calculations memory used (ie. unique counts)
  3. Total size of response from search request (should apply at the coordinating node, should also cancel the rest of the request)
  4. Recovery circuit breaker (this should only affect the recovery of shards on a node)
  5. Ingest circuit breaker (ability to limit the incoming size of bulk or write requests separately from other network requests)
  6. Nested aggregations circuit breaker... So instead of allowing a user to nest aggregations 10+ levels, this could be tunable, and shut down before the request ever hits a shard.
@lanerjo lanerjo added >enhancement needs:triage Requires assignment of a team area label labels Sep 12, 2020
@jtibshirani jtibshirani added the :Core/Infra/Circuit Breakers Track estimates of memory consumption to prevent overload label Sep 14, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (:Core/Infra/Circuit Breakers)

@elasticmachine elasticmachine added the Team:Core/Infra Meta label for core/infra team label Sep 14, 2020
@jtibshirani jtibshirani added Team:Core/Infra Meta label for core/infra team and removed Team:Core/Infra Meta label for core/infra team needs:triage Requires assignment of a team area label labels Sep 14, 2020
@rjernst
Copy link
Member

rjernst commented Sep 15, 2020

@lanerjo While these are certainly things we can consider, having a large list is difficult to action on. Imagine we agreed to item 2, added that logging, but then could not close this issue because there were still several other parts. Having separate issues also allows for clear and focused discussions, rather than several overlapped discussions about different parts of the list. Could you please open separate issues for each enhancement you are describing?

Additionally, much of what is presented here is a solution, but it would help in both our understanding of what needs to be solved, as well as weighing options, if the actual problem were clearly presented with each. For example, again with the logging proposed in option 2, what is the the scenario that additional log information is needed? What log messages do you see now that lack context? What type of debugging of your system are you trying to do that the breaker name is needed?

As this issue is not actionable, I will close this, but please do open separate issues.

@rjernst rjernst closed this as completed Sep 15, 2020
@lanerjo
Copy link
Author

lanerjo commented Sep 16, 2020

@rjernst I have split the more important items (IMO) out into a few enhancement requests.
#62457 #62452 #62449

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/Circuit Breakers Track estimates of memory consumption to prevent overload >enhancement Team:Core/Infra Meta label for core/infra team
Projects
None yet
Development

No branches or pull requests

4 participants