-
Notifications
You must be signed in to change notification settings - Fork 301
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: protect datanode with concurrency limit. #4699
base: main
Are you sure you want to change the base?
Conversation
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
ed16bbb
to
724d851
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #4699 +/- ##
==========================================
- Coverage 84.79% 84.63% -0.17%
==========================================
Files 1115 1116 +1
Lines 201197 201385 +188
==========================================
- Hits 170612 170443 -169
- Misses 30585 30942 +357 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remember to update config docs when we finalize the names
2f755af
to
0b101a3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
config/config.md
Outdated
@@ -335,6 +335,8 @@ | |||
| `init_regions_in_background` | Bool | `false` | Initialize all regions in the background during the startup.<br/>By default, it provides services after all regions have been initialized. | | |||
| `enable_telemetry` | Bool | `true` | Enable telemetry to collect anonymous usage data. | | |||
| `init_regions_parallelism` | Integer | `16` | Parallelism of initializing regions. | | |||
| `max_concurrent_queries` | Integer | `0` | The maximum current queries allowed to be executed. Zero means unlimited. | | |||
| `concurrent_query_limiter_timeout` | String | `25ms` | The timeout of acquiring concurrency permits. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this option? Despite the implementation details (acquire
vs. try_acquire
), it seems like users would hardly adjust this option. And exposing such an option would also limit us from implementing a better behavior in the future, like "only pending 30 awaiting requests in the queue, for at most 100ms".
We can hide such logic as our system's internal details. WDYT @sunng87 @evenyag
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree remove this option will give better flexibility for future. I think try_acquire
is a special case that is equivalent to 0 timeout on no permits i think having timeout maybe more flexible in this case. I removed the concurrent_query_limiter_timeout
in DataNode options and hardcoded 25ms wait with a TODO. In future versions i think we could update the max_concurrent_queries
and concurrent_query_limiter_timeout
dynamically via sql such as SET max_concurrent_queries = 8;
303d66f
to
120224b
Compare
src/datanode/src/region_server.rs
Outdated
if let Some(p) = &self.inner.parallelism { | ||
let permit = p.acquire().await?; | ||
let res = self.inner.handle_request(region_id, request).await; | ||
drop(permit); | ||
return res; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this semaphore also applies to RegionPutRequest
whilst the purpose is to limit concurrent queries. We need to distinguish different request types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should i put the rate limiting on handle_read
and handle_remote_read
that does datafusion queries? seems like handle_requests
method is more of writes and region ddl changes.
src/datanode/src/datanode.rs
Outdated
@@ -334,6 +334,9 @@ impl DatanodeBuilder { | |||
common_runtime::global_runtime(), | |||
event_listener, | |||
table_provider_factory, | |||
opts.max_concurrent_queries, | |||
//TODO: revaluate the hardcoded timeout on the next version of datanode concurrency limiter. | |||
Duration::from_millis(25), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
25ms seems to be too short. Maybe we need to hold those throttled queries instead of failing them very quickly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can increase the threshold. By hold do you mean to (worst case) allow queries to wait for the semaphore until the request to datanode timeout?
120224b
to
62a99c4
Compare
62a99c4
to
75c340a
Compare
I hereby agree to the terms of the GreptimeDB CLA.
Refer to a related PR or issue link (optional)
What's changed and what's your intention?
Adding query parallelism limit in region server to protect datanode from query overload.
Checklist