Pipeline: optimize part scan #7723

Merged
merged 10 commits into from
Jul 7, 2023

@SeaRise SeaRise commented Jun 30, 2023

What problem does this PR solve?

Issue Number: ref #6518 ref #4474

Problem Summary:

What is changed and how it works?

  • Avoid excessive memory usage caused by reading too many partitions concurrently.
  • Under the storage thread pool + pipeline model, data imbalance across partitions does not cause data skew in the computing layer, so this PR simply concatenates the streams of the different partitions.
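
To illustrate the idea of concatenating partition streams, here is a minimal, self-contained sketch. It is not the code from this PR: Block, PartitionStream, ConcatSource and countRows are hypothetical stand-ins, and the sketch assumes that a concatenated source only starts a partition after the previous one is drained, which is what keeps the number of partitions being read at once low.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a block of rows; not TiFlash's Block class.
using Block = std::vector<int>;

// Hypothetical per-partition stream: yields its blocks one by one, then std::nullopt.
class PartitionStream
{
public:
    explicit PartitionStream(std::vector<Block> blocks_) : blocks(std::move(blocks_)) {}

    std::optional<Block> read()
    {
        if (next >= blocks.size())
            return std::nullopt;
        return blocks[next++];
    }

private:
    std::vector<Block> blocks;
    std::size_t next = 0;
};

// Concatenates partition streams: a partition is only read after the previous
// one is drained, so this source touches one partition at a time.
class ConcatSource
{
public:
    explicit ConcatSource(std::vector<PartitionStream> partitions_) : partitions(std::move(partitions_)) {}

    std::optional<Block> read()
    {
        while (current < partitions.size())
        {
            if (auto block = partitions[current].read())
                return block;
            ++current; // current partition is drained, move on to the next one
        }
        return std::nullopt; // all partitions are drained
    }

    // Index of the partition being read (equals the partition count once finished).
    std::size_t currentPartition() const { return current; }

private:
    std::vector<PartitionStream> partitions;
    std::size_t current = 0;
};

// Drains a source and returns the total number of rows it produced.
inline std::size_t countRows(ConcatSource & source)
{
    std::size_t rows = 0;
    while (auto block = source.read())
        rows += block->size();
    assert(!source.read().has_value()); // a drained source stays drained
    return rows;
}
```

Because the computing layer pulls from the concatenated source rather than from individual partitions, an empty or small partition in the middle is skipped over without any special handling.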

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)

create table part_test.no_part_table(a int, b int);

create table part_test.part_table(a int, b int)
PARTITION BY RANGE (a) (
    PARTITION p0 VALUES LESS THAN (1),
    PARTITION p1 VALUES LESS THAN (2),
    ...
    PARTITION p59 VALUES LESS THAN (59),
    PARTITION p60 VALUES LESS THAN (MAXVALUE)
);

ALTER TABLE part_test.no_part_table SET TIFLASH REPLICA 1;
ALTER TABLE part_test.part_table SET TIFLASH REPLICA 1;

60 partitions, 500,000 rows per partition (30,000,000 rows in total).

set tidb_enable_tiflash_pipeline_model = ON;
select count(1) from no_part_table;
+----------+
| count(1) |
+----------+
| 30000000 |
+----------+
1 row in set (0.08 sec)
select count(1) from no_part_table where b = 0 or b = 5;
+----------+
| count(1) |
+----------+
|   200000 |
+----------+
1 row in set (0.02 sec)
select count(1) from part_table;
+----------+
| count(1) |
+----------+
| 30000000 |
+----------+
1 row in set (0.43 sec)
select count(1) from part_table where b = 0 or b = 5;
+----------+
| count(1) |
+----------+
|   200000 |
+----------+
1 row in set (0.08 sec)

set tidb_enable_tiflash_pipeline_model = OFF;
select count(1) from no_part_table;
+----------+
| count(1) |
+----------+
| 30000000 |
+----------+
1 row in set (0.07 sec)
select count(1) from no_part_table where b = 0 or b = 5;
+----------+
| count(1) |
+----------+
|   200000 |
+----------+
1 row in set (0.01 sec)
select count(1) from part_table;
+----------+
| count(1) |
+----------+
| 30000000 |
+----------+
1 row in set (0.45 sec)
select count(1) from part_table where b = 0 or b = 5;
+----------+
| count(1) |
+----------+
|   200000 |
+----------+
1 row in set (0.09 sec)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

@ti-chi-bot ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jun 30, 2023
@SeaRise SeaRise mentioned this pull request Jun 30, 2023
25 tasks

SeaRise commented Jun 30, 2023

/run-all-tests

@SeaRise SeaRise changed the title WIP: Pipeline: optimize part scan Pipeline: optimize part scan Jun 30, 2023
@ti-chi-bot ti-chi-bot bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 30, 2023

SeaRise commented Jul 2, 2023

/run-all-tests

SeaRise commented Jul 2, 2023

/run-unit-test

SeaRise commented Jul 2, 2023

/run-integration-test

OperatorStatus readImpl(Block & block) override
{
    if unlikely (done)
        return OperatorStatus::HAS_OUTPUT;

Why not return OperatorStatus::FINISHED here?

SeaRise (author) replied:

Because of the contract documented on read:

// read will inplace the block when return status is HAS_OUTPUT;
// Even after source has finished, source op still needs to return an empty block and HAS_OUTPUT,
// because there are many operators that need an empty block as input, such as JoinProbe and WindowFunction.
OperatorStatus read(Block & block);
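
A small sketch of what that contract means in practice. Everything here is a hypothetical stand-in (Block, the reduced OperatorStatus enum, VectorSource, CountingSink, run), not the TiFlash classes, and it assumes data blocks are never empty so that an empty block can act as the end-of-stream marker. The point it shows: the source keeps answering HAS_OUTPUT with an empty block once it is done, and the downstream operator decides when the pipeline is finished.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-ins; the real Block and OperatorStatus have more members and states.
using Block = std::vector<int>;

enum class OperatorStatus
{
    HAS_OUTPUT,
    NEED_INPUT,
    FINISHED,
};

// A source following the quoted contract: after its data is exhausted it keeps
// returning HAS_OUTPUT together with an empty block.
class VectorSource
{
public:
    explicit VectorSource(std::vector<Block> blocks_) : blocks(std::move(blocks_)) {}

    OperatorStatus read(Block & block)
    {
        block.clear(); // an empty block marks the end of the stream
        if (next < blocks.size())
            block = blocks[next++];
        return OperatorStatus::HAS_OUTPUT;
    }

private:
    std::vector<Block> blocks;
    std::size_t next = 0;
};

// Hypothetical downstream operator that relies on seeing the empty block
// to know that its input is complete.
class CountingSink
{
public:
    OperatorStatus write(const Block & block)
    {
        if (block.empty())
            return OperatorStatus::FINISHED;
        rows += block.size();
        return OperatorStatus::NEED_INPUT;
    }

    std::size_t totalRows() const { return rows; }

private:
    std::size_t rows = 0;
};

// Drives the source into the sink until the sink reports FINISHED.
inline std::size_t run(VectorSource & source, CountingSink & sink)
{
    Block block;
    while (true)
    {
        const OperatorStatus status = source.read(block);
        assert(status == OperatorStatus::HAS_OUTPUT);
        (void)status; // silence unused-variable warnings when asserts are disabled
        if (sink.write(block) == OperatorStatus::FINISHED)
            return sink.totalRows();
    }
}
```

In this sketch the source never needs its own FINISHED state: termination is observed by whichever operator consumes the empty block.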

@xzhangxian1008 xzhangxian1008 left a comment

LGTM

@ti-chi-bot ti-chi-bot bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Jul 4, 2023
@SeaRise SeaRise requested a review from windtalker July 6, 2023 07:07

@windtalker windtalker left a comment

LGTM

@ti-chi-bot ti-chi-bot bot added the lgtm label Jul 7, 2023
@ti-chi-bot ti-chi-bot bot added approved and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Jul 7, 2023

ti-chi-bot bot commented Jul 7, 2023

[LGTM Timeline notifier]

Timeline:

  • 2023-07-04 01:17:18 UTC: ☑️ agreed by xzhangxian1008.
  • 2023-07-07 03:33:56 UTC: ☑️ agreed by windtalker.

SeaRise commented Jul 7, 2023

/approve

ti-chi-bot bot commented Jul 7, 2023

@SeaRise: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

  • The CI jobs are currently not Prow-style jobs, so this command is needed to notify the Jenkins server to run them.
  • If a CI test fails, re-trigger only the failed test; the bot will merge the PR for you once CI passes.

If you have any questions about the PR merge process, please refer to pr process.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

ti-chi-bot bot commented Jul 7, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: SeaRise, windtalker, xzhangxian1008

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ti-chi-bot bot commented Jul 7, 2023

@SeaRise: Your PR was out of date, I have automatically updated it for you.

At the same time I will also trigger all tests for you:

/run-all-tests

This also triggers some heavy tests that do not always run when a PR is updated.

@ti-chi-bot ti-chi-bot bot merged commit e98f5d6 into pingcap:master Jul 7, 2023
@SeaRise SeaRise deleted the part_optimisze branch July 7, 2023 09:42