Implement polling tenants concurrently #3647
I'm aware of a race in the tests where we check the error message after exceeding the threshold during polling. Because we now poll concurrently, we can't know which error will be the final one, so the assertion needs to check that the error is one of n rather than one specific error. I'll follow up. In the meantime, feedback welcome. My thought for a follow-up to this PR is to modify the tenant loop so that we iterate over tenants sorted by some metric, like block size or last poll duration. This way we can start work on the ones that require the most effort first.
this lgtm. not approving b/c it's still marked draft
I've modified the error handling to be consistent, which required keeping of a |
looking good. one thought.
can we add the new config option to the docs?
let's go!
Here we change the error handling to account for the additional complexity that tenant concurrency brings. The blocklist_poll_tolerate_consecutive_errors configuration now applies to a single tenant: the poller retries that tenant until the threshold is met. A new configuration parameter, blocklist_poll_tolerate_tenant_failures, sets the number of failing tenants that will be tolerated. This keeps the old behavior scoped to a single tenant while also accounting for the global picture, so by default a single failing tenant will not stop the entire polling process. Tests have been updated to account for this additional logic.
What this PR does:
Here we implement basic concurrent polling of tenants. This is another step
towards making the polling more efficient. The next step will be to implement
a weighted polling strategy that polls tenants by some priority. For now, I think
this is a reasonable start and should stand on its own.
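For readers unfamiliar with the pattern, concurrent polling with a bound on in-flight work can be sketched as follows. This is a generic illustration, not the implementation in this PR: pollTenants, the maxConcurrent parameter, and the poll callback returning a block count are all hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// pollTenants calls poll for each tenant with at most maxConcurrent calls in
// flight and returns the per-tenant results. maxConcurrent must be at least 1.
func pollTenants(tenants []string, maxConcurrent int, poll func(string) int) map[string]int {
	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		results = make(map[string]int, len(tenants))
		sem     = make(chan struct{}, maxConcurrent) // counting semaphore
	)
	for _, tenant := range tenants {
		wg.Add(1)
		sem <- struct{}{} // blocks while maxConcurrent polls are running
		go func(tenant string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot when this poll ends
			n := poll(tenant)
			mu.Lock()
			results[tenant] = n
			mu.Unlock()
		}(tenant)
	}
	wg.Wait()
	return results
}

func main() {
	// The callback here just returns the length of the tenant ID as a fake block count.
	blocks := pollTenants([]string{"a", "bb", "ccc"}, 2, func(t string) int { return len(t) })
	fmt.Println(blocks["a"], blocks["bb"], blocks["ccc"]) // 1 2 3
}
```

The buffered channel caps how many tenants are polled at once, which keeps backend load predictable while still overlapping slow tenants with fast ones.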
Which issue(s) this PR fixes:
Fixes #
Checklist
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]