feat(batch): parallelize exchange source creation #7125

Merged
3 commits merged into main from dylan/parallelize_exchange_create_source
Dec 29, 2022

Conversation

chenzl25
Contributor

I hereby agree to the terms of the Singularity Data, Inc. Contributor License Agreement.

What's changed and what's your intention?

  • Parallelize exchange source creation to speed up SQL queries such as `select * from v limit 1` when `v` has many partitions.
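
As a rough illustration of the idea (not the code in this PR), the sketch below contrasts creating sources one at a time with creating them concurrently. It assumes each source creation is a blocking call with a fixed latency and uses standard-library threads in place of async RPCs; `MockSource`, `create_source`, `create_sequentially`, and `create_in_parallel` are hypothetical names introduced only for this example.

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for an exchange source; not the real `ExchangeSourceImpl`.
#[derive(Debug, PartialEq)]
pub struct MockSource {
    pub partition: usize,
}

/// Simulates the RPC that sets up one source (assumed cost: `rpc_latency`).
pub fn create_source(partition: usize, rpc_latency: Duration) -> MockSource {
    thread::sleep(rpc_latency);
    MockSource { partition }
}

/// One call after another: total latency grows linearly with the partition count.
pub fn create_sequentially(partitions: usize, rpc_latency: Duration) -> Vec<MockSource> {
    (0..partitions)
        .map(|p| create_source(p, rpc_latency))
        .collect()
}

/// All calls in flight at once: total latency is roughly that of the slowest call.
pub fn create_in_parallel(partitions: usize, rpc_latency: Duration) -> Vec<MockSource> {
    thread::scope(|s| {
        // Spawn every creation first, then wait for all of them in partition order.
        let handles: Vec<_> = (0..partitions)
            .map(|p| s.spawn(move || create_source(p, rpc_latency)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```

Both functions return the same sources in the same order; only the waiting time differs. In an async executor the same shape would typically be expressed by awaiting a set of futures together rather than by spawning threads.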

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • All checks passed in ./risedev check (or alias, ./risedev c)

Documentation

If your pull request contains user-facing changes, please specify the types of the changes, and create a release note. Otherwise, please feel free to remove this section.

Types of user-facing changes

Please keep the types that apply to your changes, and remove those that do not apply.

  • Installation and deployment
  • Connector (sources & sinks)
  • SQL commands, functions, and operators
  • RisingWave cluster configuration changes
  • Other (please specify in the release note below)

Release note

Please create a release note for your changes. In the release note, focus on the impact on users, and mention the environment or conditions where the impact may occur.

Refer to a related PR or issue link (optional)

Contributor

@lmatz lmatz left a comment

LGTM!

@chenzl25 chenzl25 changed the title from "feat(batch): parallelize exchange create source" to "feat(batch): parallelize exchange source creation" Dec 29, 2022
Member

@xxchan xxchan left a comment

Should we move the creation to do_execute, so that the Vec isn't needed 🤔

@chenzl25
Contributor Author

> Should we move the creation to do_execute, so that the Vec isn't needed 🤔

If by Vec you mean sources: Vec<ExchangeSourceImpl>, it seems that would introduce another Vec, namely a Vec<ProstExchangeSource>?

@xxchan
Member

xxchan commented Dec 29, 2022

I mean eliminating the join on create, if that is expensive. We just select in do_execute. (But I'm not sure whether it's proper and has benefits) 🤔
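
A rough sketch of this suggestion, under stated assumptions: each worker creates its own source and starts forwarding rows immediately, so the consumer never waits for every creation to finish. It uses standard-library threads and a channel as a stand-in for an async select over per-source streams; `Partition` and `first_rows` are hypothetical names, not part of the RisingWave codebase.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical model of one upstream partition: how long its source takes to
/// set up, and the rows it yields afterwards.
pub struct Partition {
    pub setup: Duration,
    pub rows: Vec<i64>,
}

/// Each worker creates its own source and forwards rows as soon as it can, so
/// no join over all creations is needed. The consumer stops after `limit` rows.
pub fn first_rows(partitions: Vec<Partition>, limit: usize) -> Vec<i64> {
    let (tx, rx) = mpsc::channel();
    for p in partitions {
        let tx = tx.clone();
        thread::spawn(move || {
            thread::sleep(p.setup); // stand-in for the source-creation RPC
            for row in p.rows {
                // A send error means the consumer already has enough rows.
                if tx.send(row).is_err() {
                    break;
                }
            }
        });
    }
    drop(tx); // the channel closes once every worker has finished
    rx.into_iter().take(limit).collect()
}
```

With a `limit` of 1, the call returns as soon as the fastest partition delivers a row, even if other sources are still being set up.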

@chenzl25
Contributor Author

> I mean eliminating the join on create, if that is expensive. We just select in do_execute. (But I'm not sure whether it's proper and has benefits) 🤔

I think I get your idea. Let me do some refactoring.

@chenzl25 chenzl25 requested a review from xxchan December 29, 2022 13:56
Member

@xxchan xxchan left a comment

LGTM, but how can we test the effect? Under which circumstances will v have many partitions? 🤪

@codecov

codecov bot commented Dec 29, 2022

Codecov Report

Merging #7125 (3fe64dc) into main (85aab8c) will decrease coverage by 0.00%.
The diff coverage is 77.41%.

@@            Coverage Diff             @@
##             main    #7125      +/-   ##
==========================================
- Coverage   73.20%   73.20%   -0.01%     
==========================================
  Files        1051     1051              
  Lines      167464   167467       +3     
==========================================
  Hits       122588   122588              
- Misses      44876    44879       +3     
| Flag | Coverage Δ |
|------|------------|
| rust | 73.20% <77.41%> (-0.01%) ⬇️ |

Flags with carried forward coverage won't be shown. Click here to find out more.

| Impacted Files | Coverage Δ |
|----------------|------------|
| src/batch/src/executor/generic_exchange.rs | 45.21% <77.41%> (+5.93%) ⬆️ |
| src/meta/src/hummock/mock_hummock_meta_client.rs | 64.73% <0.00%> (-1.06%) ⬇️ |
| src/storage/src/hummock/compactor/sstable_store.rs | 74.54% <0.00%> (-0.91%) ⬇️ |
| src/object_store/src/object/mem.rs | 86.74% <0.00%> (-0.76%) ⬇️ |
| src/storage/src/hummock/compactor/iterator.rs | 98.02% <0.00%> (-0.29%) ⬇️ |
| src/storage/src/hummock/compactor/mod.rs | 83.10% <0.00%> (-0.16%) ⬇️ |
| src/meta/src/hummock/manager/mod.rs | 79.33% <0.00%> (-0.12%) ⬇️ |
| src/meta/src/manager/cluster.rs | 77.11% <0.00%> (+0.24%) ⬆️ |

@chenzl25
Contributor Author

chenzl25 commented Dec 29, 2022

> LGTM, but how can we test the effect? Under which circumstances will v have many partitions? 🤪

I have already tested this in a 3-CN (three compute nodes) scenario, where one table can have 12 partitions. Previously, we created the sources of the exchange operator through sequential RPCs. With this PR, the latency of SQL such as `select * from v limit 1` is about 5 times lower than before.
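
One possible way to reason about why 12 partitions need not yield a 12x gain is a simple cost model in which the query has some fixed cost besides the source-creation RPCs. The function below is only a back-of-the-envelope sketch; `estimated_speedup` and its inputs are hypothetical and are not measurements from this PR.

```rust
/// Estimated speedup from issuing `n` source-creation RPCs concurrently instead
/// of one by one. `rpc_ms` is the cost of a single RPC and `fixed_ms` is the rest
/// of the query; both are hypothetical inputs, not measurements from this PR.
pub fn estimated_speedup(n: u32, rpc_ms: f64, fixed_ms: f64) -> f64 {
    let sequential = fixed_ms + f64::from(n) * rpc_ms; // RPCs run back to back
    let parallel = fixed_ms + rpc_ms; // all RPCs overlap
    sequential / parallel
}
```

For instance, with made-up values of 12 partitions, 10 ms per RPC, and 15 ms of fixed cost, the model gives (15 + 120) / (15 + 10) = 5.4, while with no fixed cost it gives the full 12x.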

@mergify mergify bot merged commit b8bdfdf into main Dec 29, 2022
@mergify mergify bot deleted the dylan/parallelize_exchange_create_source branch December 29, 2022 14:53