PushingToViewsBlockOutputStream: process blocks concurrently #3208

vavrusa · 2018-09-25T06:04:45Z

The current model is to process blocks for attached views in sequence.
This is not ideal when the processing time varies between views, or when processing blocks (for example with replicated tables), because the next-in-line view has to wait for its predecessor to finish.

This commit changes the behavior to process 2 or more attached views concurrently.

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

cc @bocharov @dqminh

bocharov

Great job. Can you make it a setting, so that users can choose whether to process views concurrently or sequentially?

vavrusa · 2018-09-26T05:31:35Z

I could add it as a setting, but I'm not sure what would be a use case for that. Do you have a preference @ztlpn @alexey-milovidov ?

alexey-milovidov · 2018-09-26T19:51:11Z

> I could add it as a setting, but I'm not sure what would be a use case for that. Do you have a preference @ztlpn @alexey-milovidov ?

Better to add a setting, just to allow switching back to the previous behaviour in some edge cases.

alexey-milovidov · 2018-09-26T19:52:15Z

dbms/src/DataStreams/PushingToViewsBlockOutputStream.cpp

@@ -54,6 +55,9 @@ PushingToViewsBlockOutputStream::PushingToViewsBlockOutputStream(
        output = storage->write(query_ptr, context.getSettingsRef());
        replicated_output = dynamic_cast<ReplicatedMergeTreeBlockOutputStream *>(output.get());
    }
+
+    threads.reserve(views.size());
+    exceptions.resize(views.size());


If you use ThreadPool, exceptions will be rethrown automatically.

vavrusa · 2018-09-26

Makes sense, I'll rework it with ThreadPool, thanks!

alexey-milovidov · 2018-09-26T19:52:50Z

dbms/src/DataStreams/PushingToViewsBlockOutputStream.cpp

-            throw;
+        // Process last block without starting a new thread
+        // This also optimizes for the case with a single attached view
+        if (view_num == views.size() - 1) {


utils/check-style/check-style -n

alexey-milovidov · 2018-09-26T19:53:47Z

dbms/src/DataStreams/PushingToViewsBlockOutputStream.cpp

+    }
+    catch (Exception & ex)
+    {
+        ex.addMessage("while pushing to view " + view.database + "." + view.table);


backQuoteIfNeed.
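As a rough illustration of what quoting the names buys in the error message, here is a self-contained sketch. It is not ClickHouse's `backQuoteIfNeed` implementation; `isPlainIdentifier`, `backQuoteIfNeeded` and `viewNameForMessage` are hypothetical helpers, and the quoting rules (backticks, backslash escaping) are assumptions made for the example.

```cpp
#include <cctype>
#include <string>

// Illustrative helper, not ClickHouse's backQuoteIfNeed implementation.
// Returns true for names of the form [A-Za-z_][A-Za-z0-9_]*.
inline bool isPlainIdentifier(const std::string & name)
{
    if (name.empty())
        return false;
    const auto first = static_cast<unsigned char>(name[0]);
    if (!std::isalpha(first) && first != '_')
        return false;
    for (const char c : name)
    {
        const auto u = static_cast<unsigned char>(c);
        if (!std::isalnum(u) && u != '_')
            return false;
    }
    return true;
}

// Wraps the name in backticks unless it is a plain identifier.
// Backticks and backslashes inside the name are escaped with a backslash.
inline std::string backQuoteIfNeeded(const std::string & name)
{
    if (isPlainIdentifier(name))
        return name;
    std::string quoted = "`";
    for (const char c : name)
    {
        if (c == '`' || c == '\\')
            quoted += '\\';
        quoted += c;
    }
    quoted += '`';
    return quoted;
}

// Builds the "database.table" part of the error message.
inline std::string viewNameForMessage(const std::string & database, const std::string & table)
{
    return backQuoteIfNeeded(database) + "." + backQuoteIfNeeded(table);
}
```

Without quoting, a table named `view.v1` in database `my db` would be reported as `my db.view.v1`, which is ambiguous about where the database name ends.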

alexey-milovidov · 2018-09-26T19:54:26Z

dbms/src/DataStreams/PushingToViewsBlockOutputStream.cpp

+    {
+        ex.addMessage("while pushing to view " + view.database + "." + view.table);
+        exceptions[view_num] = std::current_exception();
+    }


This will lead to std::terminate in case of non DB::Exception.
Examples: bad alloc, network error, etc.

The issue will be solved automatically if you use ThreadPool.
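The failure mode described here comes from standard C++ behaviour: an exception that escapes a `std::thread` function calls `std::terminate`, so a handler that only catches one exception type lets everything else bring the process down. A minimal sketch of the general remedy, assuming plain `std::thread` rather than ClickHouse's ThreadPool (`runAllAndRethrow` is a hypothetical name):

```cpp
#include <cstddef>
#include <exception>
#include <functional>
#include <thread>
#include <vector>

// Runs every task in its own thread and rethrows the first captured
// exception (by task order) on the calling thread after all threads join.
// Simplification: a failure to start a thread is not handled here.
inline void runAllAndRethrow(const std::vector<std::function<void()>> & tasks)
{
    std::vector<std::exception_ptr> exceptions(tasks.size());
    std::vector<std::thread> threads;
    threads.reserve(tasks.size());

    for (std::size_t i = 0; i < tasks.size(); ++i)
    {
        threads.emplace_back([&tasks, &exceptions, i]
        {
            try
            {
                tasks[i]();
            }
            catch (...)
            {
                // catch (...) covers std::bad_alloc and any other type;
                // each thread writes only to its own slot, so no lock is needed.
                exceptions[i] = std::current_exception();
            }
        });
    }

    for (auto & thread : threads)
        thread.join();

    for (const auto & exception : exceptions)
        if (exception)
            std::rethrow_exception(exception);
}
```

A thread pool that stores and rethrows exceptions internally makes this bookkeeping unnecessary at the call site, which is the point of the suggestion above.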

alexey-milovidov · 2018-09-26T19:55:34Z

Almost Ok.

vavrusa · 2018-09-26T22:22:55Z

Updated. It's now behind the allow_concurrent_view_processing setting and disabled by default.

alexey-milovidov · 2018-10-01T01:41:02Z

dbms/src/DataStreams/PushingToViewsBlockOutputStream.cpp

    {
-        try
+        // Push to views concurrently if enabled, and more than one view is attached
+        ThreadPool pool(std::min(getNumberOfPhysicalCPUCores(), views.size()));


Why not use settings.max_threads?

vavrusa · 2018-10-01

I can change it if you'd prefer that.

alexey-milovidov · 2018-10-01

I've already done it but forgot to commit.
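The sizing rule under discussion, bounding the pool by a thread-limit setting and by the number of views, might look like the sketch below. `viewPoolSize` is a hypothetical helper, `max_threads` stands in for `settings.max_threads`, and the lower bound of one thread is an assumption added for the example, not something taken from the merged code.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper: number of worker threads for pushing to views.
// max_threads stands in for settings.max_threads; the result is at least 1.
inline std::size_t viewPoolSize(std::size_t max_threads, std::size_t view_count)
{
    return std::max<std::size_t>(1, std::min(max_threads, view_count));
}
```

Taking both arguments as `std::size_t` also sidesteps a common pitfall: `std::min` does not compile when its two arguments have different types.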

alexey-milovidov · 2018-10-01T01:42:33Z

dbms/src/Interpreters/Settings.h

@@ -291,6 +291,7 @@ struct Settings
    M(SettingUInt64, http_max_multipart_form_data_size, 1024 * 1024 * 1024, "Limit on size of multipart/form-data content. This setting cannot be parsed from URL parameters and should be set in user profile. Note that content is parsed and external tables are created in memory before start of query execution. And this is the only limit that has effect on that stage (limits on max memory usage and max execution time have no effect while reading HTTP form data).") \
    M(SettingBool, calculate_text_stack_trace, 1, "Calculate text stack trace in case of exceptions during query execution. This is the default. It requires symbol lookups that may slow down fuzzing tests when huge amount of wrong queries are executed. In normal cases you should not disable this option.") \
    M(SettingBool, allow_ddl, true, "If it is set to true, then a user is allowed to executed DDL queries.") \
+    M(SettingBool, allow_concurrent_view_processing, false, "Enables pushing to attached views concurrently instead of sequentially.") \


I will remove the word "allow", because we use "allow" mostly for access control, and this setting is not about access control.

alexey-milovidov · 2018-10-01T01:47:39Z

A test showing that it works with both values of the setting is missing.
Could you please make a separate PR with this test?

vavrusa · 2018-10-01T16:31:55Z

Will do @alexey-milovidov

bocharov reviewed Sep 25, 2018

alexey-milovidov reviewed Sep 26, 2018

vavrusa force-pushed the master branch from c4ef661 to a971a0b Compare September 26, 2018 22:22

Update PushingToViewsBlockOutputStream.cpp

c4939a1

alexey-milovidov reviewed Oct 1, 2018

alexey-milovidov merged commit a473627 into ClickHouse:master Oct 1, 2018

alexey-milovidov added a commit that referenced this pull request Oct 1, 2018

Changes after merge #3208

698be01
