perf: nexmark q1 #7353

lmatz · 2023-01-12T08:16:57Z

query:

CREATE MATERIALIZED VIEW nexmark_q1
AS
SELECT
    auction,
    bidder,
    0.908 * price as price,
    date_time
FROM bid;

plan:

 StreamMaterialize { columns: [auction, bidder, price, date_time, _row_id(hidden)], pk_columns: [_row_id] }
 └─StreamExchange { dist: HashShard(_row_id) }
   └─StreamProject { exprs: [Field(bid, 0:Int32), Field(bid, 1:Int32), (0.908:Decimal * Field(bid, 2:Int32)), Field(bid, 5:Int32), _row_id] }
     └─StreamFilter { predicate: (event_type = 2:Int32) }
       └─StreamRowIdGen { row_id_index: 4 }
         └─StreamSource { source: "nexmark", columns: ["event_type", "person", "auction", "bid", "_row_id"] }

This query is stateless. It is probably not a good benchmark,
because there is essentially no computation or I/O to do......


fuyufjh · 2023-01-12T10:03:29Z

I expect Q1 to improve a lot after we refactor the NexMark source, which will happen as a by-product of risingwavelabs/rfcs#31.

lmatz · 2023-01-12T13:25:37Z

Oh, the evaluation consumes from Kafka, with an external data generator producing a large amount of data in advance.

fuyufjh · 2023-01-12T16:40:54Z

Another related optimization is to remove StreamExchange { dist: HashShard(_row_id) }. The exchange exists because our row_ids are generated randomly, but StreamMaterialize requires its input to be distributed by HashShard(row_id). We are considering introducing a new special RowID data type and overriding its hash function, so that row_ids can be generated to match HashShard(row_id) exactly and the Exchange operator can be avoided. cc. @st1page @TennyZhuang
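A minimal sketch of the idea, under loud assumptions: the bit layout (timestamp | vnode | sequence), the vnode count of 256, and the contiguous vnode-to-actor mapping below are all hypothetical and not taken from RisingWave's code. It only illustrates the invariant: if a row ID carries its vnode in its own bits and the RowID "hash" reads those bits back, each source actor can generate IDs that already belong to its own shard, so HashShard(_row_id) holds without an Exchange.

```python
# Hypothetical row ID layout (assumption): | timestamp | 10-bit vnode | 12-bit sequence |
VNODE_BITS = 10
SEQ_BITS = 12
VNODE_COUNT = 256  # assumed number of virtual nodes


def make_row_id(timestamp_ms: int, vnode: int, seq: int) -> int:
    """Pack a timestamp, a vnode and a per-millisecond sequence into one integer."""
    assert 0 <= vnode < VNODE_COUNT and 0 <= seq < (1 << SEQ_BITS)
    return (timestamp_ms << (VNODE_BITS + SEQ_BITS)) | (vnode << SEQ_BITS) | seq


def row_id_vnode(row_id: int) -> int:
    """The 'overridden hash': read back the embedded vnode instead of hashing the ID."""
    return (row_id >> SEQ_BITS) & ((1 << VNODE_BITS) - 1)


def owner_of(vnode: int, parallelism: int) -> int:
    """Hypothetical vnode -> actor mapping: each actor owns a contiguous vnode range."""
    return vnode * parallelism // VNODE_COUNT


def owned_vnodes(actor: int, parallelism: int) -> list[int]:
    """All vnodes assigned to `actor` under the mapping above."""
    return [v for v in range(VNODE_COUNT) if owner_of(v, parallelism) == actor]


if __name__ == "__main__":
    parallelism = 4
    for actor in range(parallelism):
        vnode = owned_vnodes(actor, parallelism)[0]
        row_id = make_row_id(1_673_510_217_000, vnode, 0)
        # The row is already on the actor that HashShard(_row_id) would route it to,
        # so no StreamExchange is needed in front of StreamMaterialize.
        assert owner_of(row_id_vnode(row_id), parallelism) == actor
        print(actor, vnode, row_id)
```

The design choice is that the distribution key's hash becomes a cheap bit extraction that the generator controls, rather than a random-looking hash it cannot predict.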

lmatz · 2023-01-12T17:01:17Z

@huangjw806 will use a blackhole sink in future evaluations.

Therefore, without an MV, we will remove the exchange and see the new numbers.

BugenZhao · 2023-01-13T04:28:33Z

> @huangjw806 will use a blackhole sink in future evaluations.
>
> Therefore, without an MV, we will remove the exchange and see the new numbers.

The sink is rewritten from the materialize node in the optimizer, so I'm afraid it also inherits the hash distribution. 🤔 cc @yuhao-su

yuhao-su · 2023-01-13T04:41:03Z

> Therefore, without an MV, we will remove the exchange and see the new numbers.

Should the sink be parallelized in any case?

> The sink is rewritten from the materialize node in the optimizer

In the planner, actually.

lmatz · 2023-01-13T04:54:18Z

> Should the sink be parallelized in any case?

If a sink with parallelism 1 cannot reach maximum throughput while the downstream system is far from saturated, then increasing the parallelism makes sense, I suppose.

lmatz · 2023-03-15T17:15:59Z

Query:

CREATE SINK nexmark_q1
AS
SELECT
    auction,
    bidder,
    0.908 * price as price,
    date_time
FROM bid
WITH ( connector = 'blackhole', format = 'append_only' );

Plan:

 StreamSink { type: append-only, columns: [auction, bidder, price, date_time] }
 └─StreamProject { exprs: [$expr1, $expr2, $expr3, $expr4] }
   └─StreamProject { exprs: [Field(bid, 0:Int32) as $expr1, Field(bid, 1:Int32) as $expr2, (0.908:Decimal * Field(bid, 2:Int32)) as $expr3, Field(bid, 5:Int32) as $expr4, _row_id] }
     └─StreamFilter { predicate: (event_type = 2:Int32) }
       └─StreamRowIdGen { row_id_index: 4 }
         └─StreamSource { source: "nexmark", columns: ["event_type", "person", "auction", "bid", "_row_id"] }
(6 rows)

There are two consecutive StreamProjects here. 🤔
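When the outer project only references columns of the inner one (here the outer StreamProject just selects $expr1..$expr4 and drops _row_id), the two can in principle be fused by substituting each input reference in the outer list with the corresponding inner expression. A minimal sketch using a hypothetical tuple-based expression representation (InputRef, Literal, Call are illustrative names, not RisingWave's planner types); how the planner actually removes the redundant project is not shown here.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class InputRef:
    index: int  # column index in the input row


@dataclass(frozen=True)
class Literal:
    value: str


@dataclass(frozen=True)
class Call:
    op: str
    args: tuple  # tuple of Expr


Expr = Union[InputRef, Literal, Call]


def substitute(expr: Expr, inner: list) -> Expr:
    """Rewrite an outer expression so it reads the inner project's input directly."""
    if isinstance(expr, InputRef):
        return inner[expr.index]
    if isinstance(expr, Call):
        return Call(expr.op, tuple(substitute(a, inner) for a in expr.args))
    return expr  # literals are unchanged


def merge_projects(outer: list, inner: list) -> list:
    """Fuse Project(outer) on top of Project(inner) into a single Project."""
    return [substitute(e, inner) for e in outer]


# Inner project roughly as in the plan above: auction, bidder, 0.908 * price, date_time, _row_id.
inner = [InputRef(0), InputRef(1), Call("*", (Literal("0.908"), InputRef(2))), InputRef(5), InputRef(7)]
# Outer project keeps the first four columns and drops _row_id.
outer = [InputRef(0), InputRef(1), InputRef(2), InputRef(3)]
merged = merge_projects(outer, inner)
```

One caveat of plain substitution: if the outer list referenced the same non-trivial inner expression several times, the fused project would evaluate it repeatedly, so a real rule would likely only fire when that does not happen.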

shanicky · 2023-03-20T06:31:03Z

After #8532:

MATERIALIZED VIEW

dev=> explain CREATE MATERIALIZED VIEW nexmark_q1
AS
SELECT
    auction,
    bidder,
    0.908 * price as price,
    date_time
FROM bid;
                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 StreamMaterialize { columns: [auction, bidder, price, date_time, _row_id(hidden)], pk_columns: [_row_id], pk_conflict: "no check" }
 └─StreamProject { exprs: [auction, bidder, (0.908:Decimal * price) as $expr1, date_time, _row_id] }
   └─StreamRowIdGen { row_id_index: 7 }
     └─StreamSource { source: "bid", columns: ["auction", "bidder", "price", "channel", "url", "date_time", "extra", "_row_id"] }
(4 rows)

shanicky · 2023-03-20T11:21:55Z

Based on a simple benchmark, and assuming my testing method is correct, removing the Exchange increases the throughput of Nexmark Q1 by about 20%. However, our RowId generation mechanism currently only uses the first allocated vnode, so it needs to be modified before we can get more accurate results.
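Using only the first allocated vnode means every row an actor generates carries the same vnode. A minimal sketch of one possible modification, assuming a hypothetical generator class and bit layout (not RisingWave's actual RowIdGenerator): cycle round-robin through every vnode the actor owns, and reset the sequence each millisecond so IDs stay unique.

```python
import itertools

SEQ_BITS = 12    # assumed width of the per-millisecond sequence field
VNODE_BITS = 10  # assumed width of the vnode field


class RoundRobinRowIdGenerator:
    """Hypothetical generator that spreads row IDs over all vnodes owned by one actor."""

    def __init__(self, owned_vnodes: list[int]):
        if not owned_vnodes:
            raise ValueError("actor must own at least one vnode")
        self._vnodes = itertools.cycle(owned_vnodes)
        self._last_ts = None
        self._seq = 0

    def next_row_id(self, timestamp_ms: int) -> int:
        # A new millisecond restarts the sequence; the timestamp keeps IDs distinct.
        if timestamp_ms != self._last_ts:
            self._last_ts, self._seq = timestamp_ms, 0
        if self._seq >= (1 << SEQ_BITS):
            # A real generator would wait for the next millisecond instead.
            raise RuntimeError("sequence exhausted for this millisecond")
        vnode = next(self._vnodes)  # round-robin over the owned vnodes
        row_id = (timestamp_ms << (VNODE_BITS + SEQ_BITS)) | (vnode << SEQ_BITS) | self._seq
        self._seq += 1
        return row_id


def vnode_of(row_id: int) -> int:
    """Extract the embedded vnode from a row ID with the layout above."""
    return (row_id >> SEQ_BITS) & ((1 << VNODE_BITS) - 1)


gen = RoundRobinRowIdGenerator([64, 65, 66])
ids = [gen.next_row_id(1000) for _ in range(6)]
```

With this sketch the vnodes of `ids` cycle through 64, 65, 66, 64, 65, 66, so an actor's rows are spread across its whole vnode range instead of piling into one.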

lmatz mentioned this issue Jan 12, 2023 in Tracking: Nexmark queries optimization #7289 (Open, 54 tasks)

github-actions bot added this to the release-0.1.16 milestone Jan 12, 2023

lmatz added the type/perf label Jan 12, 2023

lmatz mentioned this issue Jan 13, 2023 in remove the unnecessary exchange before blackhole sink #7377 (Closed)

fuyufjh modified the milestones: release-0.1.16, release-0.1.17 Jan 30, 2023

fuyufjh mentioned this issue Jan 31, 2023 in feat: remove the redundant exchange after append-only source executor #7621 (Closed)

fuyufjh assigned shanicky Feb 6, 2023

shanicky modified the milestones: release-0.1.17, release-0.1.18 Feb 22, 2023

shanicky mentioned this issue Feb 28, 2023 in Tracking: remove exchange after append-only source #8225 (Closed, 5 tasks)

lmatz mentioned this issue Mar 16, 2023 in remove redundant projects #8577 (Closed)

fuyufjh closed this as completed Mar 22, 2023
