Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Push down IN filter #4958

Closed
evenyag opened this issue Nov 7, 2024 · 4 comments · Fixed by #5129
Closed

Push down IN filter #4958

evenyag opened this issue Nov 7, 2024 · 4 comments · Fixed by #5129
Labels
C-enhancement Category Enhancements
Milestone

Comments

@evenyag
Copy link
Contributor

evenyag commented Nov 7, 2024

What type of enhancement is this?

Performance

What does the enhancement do?

In pushdown is not supported

mysql> explain analyze select * from cpu where hostname in ('host_0') limit 1;
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| stage | node | plan                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|     0 |    0 |  GlobalLimitExec: skip=0, fetch=1 metrics=[output_rows: 1, elapsed_compute: 4379, ]
  CoalescePartitionsExec metrics=[output_rows: 1, elapsed_compute: 32862, ]
    LocalLimitExec: fetch=1 metrics=[output_rows: 1, elapsed_compute: 28376135, ]
      CoalesceBatchesExec: target_batch_size=8192 metrics=[output_rows: 9072, elapsed_compute: 941677, ]
        FilterExec: hostname@0 = host_0 metrics=[output_rows: 9072, elapsed_compute: 172778, ]
          MergeScanExec: peers=[4398046511104(1024, 0), ] metrics=[output_rows: 9072, greptime_exec_read_cost: 0, ready_time: 2929325, first_consume_time: 79523004, finish_time: 208628120, ]
 |
|  NULL | NULL | Total rows: 1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set (0.21 sec)

Standalone log

2024-11-07T07:53:03.483645Z DEBUG RegionInvoker::handle_query:handle_read:exec_query_plan:create_physical_plan:handle_query: mito2::read::scan_region: Scan region start 4398046511104(1024, 0), request: ScanRequest { projection: Some([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]), filters: [], output_ordering: None, limit: None, series_row_selector: None }, append_mode: false
2024-11-07T07:53:03.483688Z DEBUG RegionInvoker::handle_query:handle_read:exec_query_plan:create_physical_plan:handle_query: mito2::read::scan_region: Scan region 4398046511104(1024, 0), request: ScanRequest { projection: Some([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]), filters: [], output_ordering: None, limit: None, series_row_selector: None }, time range: GenericRange { start: None, end: None }, memtables: 0, ssts_to_read: 6, append_mode: false

Implementation challenges

No response

@evenyag evenyag added the C-enhancement Category Enhancements label Nov 7, 2024
@evenyag evenyag changed the title Push down In filter Push down IN filter Nov 7, 2024
@fengjiachun fengjiachun added this to the v0.11 milestone Nov 27, 2024
@NiwakaDev
Copy link
Collaborator

I think this feature should be supported if Expr::InList is considered Commutativity::Commutative, not Commutativity::Unimplemented.

Simple Expr::InList such as hostname in ('host_0', ...) is Commutativity::Commutative, I guess.

I'm interested in this issue, but this feature seems to be a high priority(marked as v0.11).
So, if my understanding is beside the point, I don't want to block greptime team's business please feel free to proceed without me!

@Kev1n8
Copy link
Contributor

Kev1n8 commented Dec 8, 2024

I also did some research on this earlier. I tried 2 sqls:

explain analyze select * from numbers where number in (0);
explain analyze select * from numbers where number = 0;

And I found the key difference between these two when optimizing is when it comes to DistPlannerAnalyzer.

// `number in (0)`
2024-12-08T09:14:40.617798Z DEBUG on_query{protocol="mysql"}:mysql::do_query:do_query:exec_plan:exec_query_plan:create_physical_plan: log: type_coercion:
Analyze
  Projection: numbers.number
    Filter: CAST(numbers.number AS Int64) IN ([Int64(0)])
      TableScan: numbers
    
2024-12-08T09:14:40.620572Z DEBUG on_query{protocol="mysql"}:mysql::do_query:do_query:exec_plan:exec_query_plan:create_physical_plan: log: DistPlannerAnalyzer:
Analyze
  Projection: numbers.number
    Filter: CAST(numbers.number AS Int64) IN ([Int64(0)])
      MergeScan [is_placeholder=false]

// `number = 0`
2024-12-08T09:14:09.045934Z DEBUG on_query{protocol="mysql"}:mysql::do_query:do_query:exec_plan:exec_query_plan:create_physical_plan: log: type_coercion:
Analyze
  Projection: numbers.number
    Filter: CAST(numbers.number AS Int64) = Int64(0)
      TableScan: numbers
    
2024-12-08T09:14:09.049799Z DEBUG on_query{protocol="mysql"}:mysql::do_query:do_query:exec_plan:exec_query_plan:create_physical_plan: log: DistPlannerAnalyzer:
Analyze
  MergeScan [is_placeholder=false]

It seems for some reason Filter is not pushed down to the MergeScan for IN. This is my observation. Not sure if it's the right direction. For your reference @NiwakaDev.

@NiwakaDev
Copy link
Collaborator

@Kev1n8

It seems for some reason Filter is not pushed down to the MergeScan for IN. This is my observation. Not sure if it's the right direction. For your reference @NiwakaDev.

I think this is because Expr::InList isn't Commutativity::Commutative, which prevents IN filter from being pushed down.

I tried changing Commutativity::Unimplemented to Commutativity::Commutative, and then result is:

CREATE TABLE monitor (
  host STRING,
  ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP() TIME INDEX,
  cpu FLOAT64 DEFAULT 0,
  memory FLOAT64,
  PRIMARY KEY(host));

INSERT INTO monitor
VALUES
    ("127.0.0.1", 1702433141000, 0.5, 0.2),
    ("127.0.0.2", 1702433141000, 0.3, 0.1),
    ("127.0.0.1", 1702433146000, 0.3, 0.2),
    ("127.0.0.2", 1702433146000, 0.2, 0.4),
    ("127.0.0.1", 1702433151000, 0.4, 0.3),
    ("127.0.0.2", 1702433151000, 0.2, 0.4);

explain analyze select host from monitor where host in ("127.0.0.1") limit 1;
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| stage | node | plan                                                                                                                                                                                                                                                                          |
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|     0 |    0 |  MergeScanExec: peers=[4398046511104(1024, 0), ] metrics=[output_rows: 1, greptime_exec_read_cost: 0, finish_time: 13940207, first_consume_time: 13741001, ready_time: 12825959, ]
                                                                                           |
|     1 |    0 |  GlobalLimitExec: skip=0, fetch=1 metrics=[output_rows: 1, elapsed_compute: 67084, ]
  SeqScan: region=4398046511104(1024, 0), partition_count=1 (1 memtable ranges, 0 file 0 ranges) metrics=[output_rows: 3, mem_used: 128, elapsed_await: 452042, elapsed_poll: 190458, ]
 |
|  NULL | NULL | Total rows: 1                                                                                                                                                                                                                                                                 |
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

In the above plans, IN filter seems to be pushed down.

current behavior

+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| stage | node | plan                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|     0 |    0 |  GlobalLimitExec: skip=0, fetch=1 metrics=[output_rows: 1, elapsed_compute: 31166, ]
  CoalescePartitionsExec metrics=[output_rows: 1, elapsed_compute: 256875, ]
    LocalLimitExec: fetch=1 metrics=[output_rows: 1, elapsed_compute: 34716, ]
      CoalesceBatchesExec: target_batch_size=8192 metrics=[output_rows: 3, elapsed_compute: 146962, ]
        FilterExec: host@0 = 127.0.0.1 metrics=[output_rows: 3, elapsed_compute: 685174, ]
          ProjectionExec: expr=[host@0 as host] metrics=[output_rows: 6, elapsed_compute: 61924, ]
            MergeScanExec: peers=[4398046511104(1024, 0), ] metrics=[output_rows: 6, greptime_exec_read_cost: 0, finish_time: 30509708, first_consume_time: 29188460, ready_time: 25429957, ]
 |
|     1 |    0 |  SeqScan: region=4398046511104(1024, 0), partition_count=1 (1 memtable ranges, 0 file 0 ranges) metrics=[output_rows: 6, mem_used: 400, elapsed_await: 2592584, elapsed_poll: 2112208, ]
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
|  NULL | NULL | Total rows: 1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

@NiwakaDev
Copy link
Collaborator

I try to submit a PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Category Enhancements
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants