PushingToViewsBlockOutputStream: process blocks concurrently #3208
Conversation
Great job. Can you make it a setting and allow the user to choose whether they want to process views concurrently or sequentially?
I could add it as a setting, but I'm not sure what the use case for that would be. Do you have a preference @ztlpn @alexey-milovidov?
Better to add a setting just to allow switching to the previous behaviour in some edge cases.
```cpp
@@ -54,6 +55,9 @@ PushingToViewsBlockOutputStream::PushingToViewsBlockOutputStream(
        output = storage->write(query_ptr, context.getSettingsRef());
        replicated_output = dynamic_cast<ReplicatedMergeTreeBlockOutputStream *>(output.get());
    }

    threads.reserve(views.size());
    exceptions.resize(views.size());
```
If you use `ThreadPool`, exceptions will be rethrown automatically.
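For context, a minimal sketch of the pattern the reviewer is suggesting, assuming the `ThreadPool` interface from ClickHouse's `Common/ThreadPool.h` at the time; the helper name and the job list are placeholders, not code from this PR:

```cpp
#include <Common/ThreadPool.h>

#include <algorithm>
#include <functional>
#include <vector>

/// Hypothetical helper (not part of this PR): run one job per attached view
/// on a bounded pool. ThreadPool::wait() joins all scheduled jobs and rethrows
/// the first exception captured in any of them, so no manual exceptions[]
/// bookkeeping is needed.
void processViewJobs(std::vector<std::function<void()>> jobs, size_t max_threads)
{
    ThreadPool pool(std::min(max_threads, jobs.size()));

    for (auto & job : jobs)
        pool.schedule(std::move(job));

    pool.wait();    /// rethrows here if any job failed
}
```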
Makes sense, I'll rework it with ThreadPool, thanks!
```cpp
            throw;

        // Process last block without starting a new thread
        // This also optimizes for the case with a single attached view
        if (view_num == views.size() - 1) {
```
`utils/check-style/check-style -n`
```cpp
        }
        catch (Exception & ex)
        {
            ex.addMessage("while pushing to view " + view.database + "." + view.table);
```
Use `backQuoteIfNeed`.
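Applied to the line above, the suggestion would look roughly like this (a sketch, not the committed change); `backQuoteIfNeed` wraps an identifier in backticks only when it is not a plain word:

```cpp
ex.addMessage("while pushing to view " + backQuoteIfNeed(view.database)
    + "." + backQuoteIfNeed(view.table));
```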
```cpp
        {
            ex.addMessage("while pushing to view " + view.database + "." + view.table);
            exceptions[view_num] = std::current_exception();
        }
```
This will lead to std::terminate in case of an exception that is not a DB::Exception. Examples: std::bad_alloc, a network error, etc.
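If the manual approach were kept, the gap could be closed with an extra catch-all clause that also stores the in-flight exception. A sketch of that alternative (not the code that was merged); `processView` is a placeholder for the actual per-view push:

```cpp
try
{
    processView(view, block);   /// placeholder for the actual per-view push
}
catch (Exception & ex)
{
    ex.addMessage("while pushing to view " + view.database + "." + view.table);
    exceptions[view_num] = std::current_exception();
}
catch (...)
{
    /// Also capture std::bad_alloc, Poco::Exception and anything else,
    /// so it cannot escape the thread and trigger std::terminate.
    exceptions[view_num] = std::current_exception();
}
```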
The issue will be solved automatically if you use `ThreadPool`.
Almost Ok.
Updated, it's now behind a setting.
```cpp
    {
        try
    // Push to views concurrently if enabled, and more than one view is attached
    ThreadPool pool(std::min(getNumberOfPhysicalCPUCores(), views.size()));
```
Why not use `settings.max_threads`?
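Sizing the pool from the setting would look roughly like this, a sketch that assumes the `Settings` object is already available at this point (as in the hunk above) and that the setting converts implicitly to an integer:

```cpp
const Settings & settings = context.getSettingsRef();

/// Respect the user-configurable thread limit instead of the physical core count.
size_t max_threads = settings.max_threads;
ThreadPool pool(std::min(max_threads, views.size()));
```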
I can change it if you'd prefer that.
I've already done it but forgot to commit.
```cpp
@@ -291,6 +291,7 @@ struct Settings
    M(SettingUInt64, http_max_multipart_form_data_size, 1024 * 1024 * 1024, "Limit on size of multipart/form-data content. This setting cannot be parsed from URL parameters and should be set in user profile. Note that content is parsed and external tables are created in memory before start of query execution. And this is the only limit that has effect on that stage (limits on max memory usage and max execution time have no effect while reading HTTP form data).") \
    M(SettingBool, calculate_text_stack_trace, 1, "Calculate text stack trace in case of exceptions during query execution. This is the default. It requires symbol lookups that may slow down fuzzing tests when huge amount of wrong queries are executed. In normal cases you should not disable this option.") \
    M(SettingBool, allow_ddl, true, "If it is set to true, then a user is allowed to executed DDL queries.") \
    M(SettingBool, allow_concurrent_view_processing, false, "Enables pushing to attached views concurrently instead of sequentially.") \
```
I will remove the "allow" word, because we use "allow" mostly for access control and this setting is not for access control.
A test is missing that shows it works with both values of the setting.
Will do @alexey-milovidov
The current model is to process blocks for attached views in sequence.
This is not ideal when the processing time for each view varies, or is blocking (for example with replicated tables), as processing of the next-in-line view is blocked by a wait in its predecessor.
This commit changes the behavior to process two or more attached views concurrently.
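Putting the pieces from the review together, the gated behaviour would look roughly like this. This is a sketch assembled from the hunks above, not the merged code; `processSingleView` is a placeholder for the per-view push, and the setting name is the one added in this PR before any rename:

```cpp
const Settings & settings = context.getSettingsRef();

if (settings.allow_concurrent_view_processing && views.size() > 1)
{
    /// Concurrent path: one job per view, bounded by the thread limit;
    /// ThreadPool::wait() rethrows the first exception thrown by any job.
    ThreadPool pool(std::min(static_cast<size_t>(settings.max_threads), views.size()));
    for (size_t view_num = 0; view_num < views.size(); ++view_num)
        pool.schedule([&, view_num] { processSingleView(views[view_num], block); });
    pool.wait();
}
else
{
    /// Previous behaviour: push to each attached view sequentially.
    for (size_t view_num = 0; view_num < views.size(); ++view_num)
        processSingleView(views[view_num], block);
}
```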
I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
cc @bocharov @dqminh