@morningman
Contributor

Fix: #3693

This CL mainly changes:

  1. Add a new BE config `max_pushdown_conditions_per_column` to limit the number of conditions on a single column that can be pushed down to the storage engine.

  2. Add 2 new session variables, `max_scan_key_num` and `doris_max_scan_key_num`, which can be set at the session level and override the config value in BE.
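
As a rough illustration of the two changes above, the sketch below shows one way a session-level value could take precedence over a BE config default, and how a per-column cap could gate pushdown. The names `resolve_limit` and `can_push_down` are hypothetical and not taken from the Doris code, and treating a non-positive session value as "not set" is an assumption.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: a session-level value, when set (assumed here to mean
// "greater than zero"), takes precedence over the BE config default.
inline int resolve_limit(int be_config_value, int session_value) {
    assert(be_config_value > 0);
    return session_value > 0 ? session_value : be_config_value;
}

// Hypothetical check: a column's conditions are pushed down to the storage
// engine only when their count does not exceed the per-column limit.
inline bool can_push_down(std::size_t condition_count, int max_conditions_per_column) {
    assert(max_conditions_per_column > 0);
    return condition_count <= static_cast<std::size_t>(max_conditions_per_column);
}
```

With a BE default of 1024 and no session value, `resolve_limit(1024, -1)` yields 1024; setting the session value to 50 yields 50, so an IN list with 2000 values on one column would no longer pass `can_push_down`.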

@morningman morningman added the kind/improvement and area/sql/execution labels May 26, 2020
@morningman morningman self-assigned this May 26, 2020
Member

@yangzhg yangzhg left a comment

LGTM

if (pred->get_child(0)->get_slot_ids(&slot_ids) != 1) {
    // not a single column predicate
    continue;
}
Contributor

Doesn't line 754 already guarantee that it's a single-column predicate?

Contributor Author

I am not sure about that; it looks like line 754 does guarantee it.
But I will leave this check in place, because no matter what, we still have to call get_slot_ids() to get the slot id and later check whether it equals slot->id().

switch (slot->type().type) {
case TYPE_TINYINT: {
    int32_t v = *reinterpret_cast<int8_t*>(value);
    range->clear();
Contributor

Could "range->clear()" be moved before the switch statement?

Contributor Author

OK
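
For readers following the suggestion above, here is a minimal sketch of what hoisting the clear() call out of the per-type cases could look like. `ColType`, `ValueRange`, and `reset_to_single_value` are hypothetical stand-ins, not the actual Doris types, and only three integer types are shown.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the slot type and the column value range.
enum class ColType { TINYINT, SMALLINT, INT };

struct ValueRange {
    std::vector<int64_t> fixed_values;
    void clear() { fixed_values.clear(); }
    void add_fixed_value(int64_t v) { fixed_values.push_back(v); }
};

// Replaces the contents of `range` with the single value pointed to by `value`.
inline void reset_to_single_value(ColType type, const void* value, ValueRange* range) {
    assert(value != nullptr && range != nullptr);
    range->clear();  // done once here instead of being repeated in every case
    switch (type) {
    case ColType::TINYINT:
        range->add_fixed_value(*static_cast<const int8_t*>(value));
        break;
    case ColType::SMALLINT:
        range->add_fixed_value(*static_cast<const int16_t*>(value));
        break;
    case ColType::INT:
        range->add_fixed_value(*static_cast<const int32_t*>(value));
        break;
    }
}
```

One thing to keep in mind with this shape: the range is cleared even for a type that no case handles, so hoisting is only behavior-preserving if every reachable type has a case or the unhandled path returns before the clear.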

@WingsGo
Contributor

WingsGo commented May 27, 2020

Hi @morningman, I have some questions about olap_scan_node. Could you please help me figure them out?

  1. What is the relationship between scanners and tablets? Can one tablet be scanned by many scanners? Is the relation n:1 or 1:1?
  2. What is the relationship between TPaloScanRange and ColumnValueRange? Does ColumnValueRange handle the predicates, while TPaloScanRange indicates the range a scanner needs to scan using the StorageEngine's prefix index?
  3. What is the purpose of the function extend_scan_key and of the config variable doris_scanner_row_num?
    I would appreciate your reply, thanks~

@morningman
Contributor Author

  1. Scanners and tablets have an n:m relationship.
  2. I am not familiar with TPaloScanRange; I need to take a further look.
  3. extend_scan_key is used to extend the scan keys. Scan keys are used to determine the number of scanners.

For example, take WHERE a IN (1,2,3) AND b > 5, where a and b are key columns of the table.

First, for a IN (1,2,3), the scan keys become: (a=1), (a=2), (a=3)
Second, for b > 5, the scan keys are extended to (a=1, b>5), (a=2, b>5), (a=3, b>5)
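
The extension in the example above can be sketched roughly as follows. This is only an illustration of the idea, not the actual extend_scan_key implementation: `ScanKey` and `extend_keys` are hypothetical names, and conditions are kept as plain strings for readability.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical representation: one scan key is a list of per-column conditions.
using ScanKey = std::vector<std::string>;

// Extends every existing scan key with every condition of the next key column.
// With no existing keys, each condition starts a new scan key.
inline std::vector<ScanKey> extend_keys(const std::vector<ScanKey>& keys,
                                        const std::vector<std::string>& column_conditions) {
    assert(!column_conditions.empty());
    std::vector<ScanKey> result;
    if (keys.empty()) {
        for (const auto& cond : column_conditions) {
            result.push_back(ScanKey{cond});
        }
        return result;
    }
    for (const auto& key : keys) {
        for (const auto& cond : column_conditions) {
            ScanKey extended = key;
            extended.push_back(cond);
            result.push_back(extended);
        }
    }
    return result;
}
```

Calling `extend_keys({}, {"a=1", "a=2", "a=3"})` and then extending the result with `{"b>5"}` gives the three keys listed above. Because each step multiplies the number of keys by the number of conditions on the column, a long IN list can make the key count grow quickly, which is presumably why a cap on the number of scan keys is useful.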

  4. doris_scanner_row_num is used for priority scheduling of scanners. Doris has a scanner thread pool that is shared by all scanners, so a scheduling strategy is needed to prevent one scanner from occupying a thread for a long time. doris_scanner_row_num controls the maximum number of rows a scanner can read in one scheduling round. For the details of the algorithm that are not covered here, please read the code.
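
To make the row-quota idea concrete, here is a deliberately simplified sketch: a single worker serving a queue of scanners in plain round-robin, where each scanner gives up its turn after reading at most a fixed number of rows. It is not the priority scheduling Doris actually implements; `Scanner`, `run_round_robin`, and `max_rows_per_turn` are hypothetical names, with `max_rows_per_turn` playing the role described for doris_scanner_row_num.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical stand-in for a scanner with a known amount of remaining work.
struct Scanner {
    int id;
    std::size_t rows_left;
};

// Simulates one worker thread. Returns the order in which scanner ids were
// given a turn; a scanner with rows left after its turn is re-queued.
inline std::vector<int> run_round_robin(std::deque<Scanner> queue,
                                        std::size_t max_rows_per_turn) {
    assert(max_rows_per_turn > 0);
    std::vector<int> schedule;
    while (!queue.empty()) {
        Scanner s = queue.front();
        queue.pop_front();
        schedule.push_back(s.id);
        // Read at most max_rows_per_turn rows, then yield the thread.
        s.rows_left -= std::min(s.rows_left, max_rows_per_turn);
        if (s.rows_left > 0) {
            queue.push_back(s);
        }
    }
    return schedule;
}
```

With scanners holding 5 and 2 rows and a quota of 2, the turns go to scanner 1, 2, 1, 1: the large scanner cannot hold the thread until it finishes, so the small one completes after a single wait.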

@chaoyli may understand this better. If I said something wrong, he can help correct it.

@WingsGo
Contributor

WingsGo commented May 27, 2020

Thanks for your reply. So if the scan keys are extended to (a=1, b>5), (a=2, b>5), (a=3, b>5), Doris will split the scan into 3 scanners, each scanning the RowBatches that satisfy the predicate? Do I understand correctly?

@morningman
Contributor Author

Not exactly. extend_scan_key() only extends the set of all possible scan keys; the number of scanners is obtained from get_hints().

EmmyMiao87 previously approved these changes Jun 3, 2020
Contributor

@EmmyMiao87 EmmyMiao87 left a comment

LGTM

@morningman morningman merged commit 27046c5 into apache:master Jun 4, 2020
morningman added a commit to morningman/doris that referenced this pull request Jun 4, 2020
…che#3694)

This CL mainly changes:
1. Add a new BE config `max_pushdown_conditions_per_column` to limit the number of conditions on a single column that can be pushed down to the storage engine.

2. Add 2 new session variables, `max_scan_key_num` and `doris_max_scan_key_num`, which can be set at the session level and override the config value in BE.
@EmmyMiao87 EmmyMiao87 mentioned this pull request Aug 17, 2020
Labels

area/sql/execution (Issues or PRs related to the execution engine), kind/improvement

Development

Successfully merging this pull request may close these issues.

[Enhancement] Improve the performance of query with IN predicate

7 participants