Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PushingToViewsBlockOutputStream: process blocks concurrently #3208

Merged
merged 2 commits into from
Oct 1, 2018

Conversation

vavrusa
Copy link
Contributor

@vavrusa vavrusa commented Sep 25, 2018

The current model is to process blocks for attached views in sequence.
This is not ideal when the processing time for each view varies, or is blocking (for example with replicated tables), as processing of next-in-line view is blocked by wait in it's predecessor.

This commit changes the behavior to process 2 or more attached views concurrently.

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

cc @bocharov @dqminh

Copy link
Contributor

@bocharov bocharov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job. Can you make it as a setting and allow user to choose whether they want to process views concurrently or sequentially.

@vavrusa
Copy link
Contributor Author

vavrusa commented Sep 26, 2018

I could add it as a setting, but I'm not sure what would be a use case for that. Do you have a preference @ztlpn @alexey-milovidov ?

@alexey-milovidov
Copy link
Member

I could add it as a setting, but I'm not sure what would be a use case for that. Do you have a preference @ztlpn @alexey-milovidov ?

Better to add a setting just to allow switch to previous behaviour in some edge cases.

@@ -54,6 +55,9 @@ PushingToViewsBlockOutputStream::PushingToViewsBlockOutputStream(
output = storage->write(query_ptr, context.getSettingsRef());
replicated_output = dynamic_cast<ReplicatedMergeTreeBlockOutputStream *>(output.get());
}

threads.reserve(views.size());
exceptions.resize(views.size());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you use ThreadPool, exceptions will be rethrown automatically.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, I'll rework it with ThreadPool, thanks!

throw;
// Process last block without starting a new thread
// This also optimizes for the case with a single attached view
if (view_num == views.size() - 1) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utils/check-style/check-style -n

}
catch (Exception & ex)
{
ex.addMessage("while pushing to view " + view.database + "." + view.table);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

backQuoteIfNeed.

{
ex.addMessage("while pushing to view " + view.database + "." + view.table);
exceptions[view_num] = std::current_exception();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will lead to std::terminate in case of non DB::Exception.
Examples: bad alloc, network error, etc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue will be solved automatically if you use ThreadPool.

@alexey-milovidov
Copy link
Member

Almost Ok.

The current model is to process blocks for attached views in sequence.
This is not ideal when the processing time for each view varies, or is
blocking (for example with replicated tables), as processing of next-in-line
view is blocked by wait in it's predecessor.

This commit changes the behavior to process 2 or more attached views concurrently.
@vavrusa
Copy link
Contributor Author

vavrusa commented Sep 26, 2018

Updated, it's now behind allow_concurrent_view_processing setting, and disabled by default.

{
try
// Push to views concurrently if enabled, and more than one view is attached
ThreadPool pool(std::min(getNumberOfPhysicalCPUCores(), views.size()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't use settings.max_threads?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can change it if you'd prefer that

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've already done it but forgot to commit.

@@ -291,6 +291,7 @@ struct Settings
M(SettingUInt64, http_max_multipart_form_data_size, 1024 * 1024 * 1024, "Limit on size of multipart/form-data content. This setting cannot be parsed from URL parameters and should be set in user profile. Note that content is parsed and external tables are created in memory before start of query execution. And this is the only limit that has effect on that stage (limits on max memory usage and max execution time have no effect while reading HTTP form data).") \
M(SettingBool, calculate_text_stack_trace, 1, "Calculate text stack trace in case of exceptions during query execution. This is the default. It requires symbol lookups that may slow down fuzzing tests when huge amount of wrong queries are executed. In normal cases you should not disable this option.") \
M(SettingBool, allow_ddl, true, "If it is set to true, then a user is allowed to executed DDL queries.") \
M(SettingBool, allow_concurrent_view_processing, false, "Enables pushing to attached views concurrently instead of sequentially.") \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will remove allow word, because we use "allow" mostly for access control and this setting is not for access control.

@alexey-milovidov alexey-milovidov merged commit a473627 into ClickHouse:master Oct 1, 2018
@alexey-milovidov
Copy link
Member

Missing test that shows that it works with both values of setting.
Could you please make a separate PR with this test?

@vavrusa
Copy link
Contributor Author

vavrusa commented Oct 1, 2018

Will do @alexey-milovidov

alexey-milovidov added a commit that referenced this pull request Oct 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants