feat(frontend): support streaming_parallelism session variable. #7370

Merged 13 commits on Jan 17, 2023

@zwang28 zwang28 commented Jan 13, 2023

I hereby agree to the terms of the Singularity Data, Inc. Contributor License Agreement.

What's changed and what's your intention?

The streaming_parallelism session variable takes effect for:

  • DdlServiceImpl::create_materialized_view
  • DdlServiceImpl::create_table
  • DdlServiceImpl::create_index
  • DdlServiceImpl::create_sink

TBD:

  • Any session var name better than streaming_parallelism?

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
    - [ ] I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features).
  • All checks passed in ./risedev check (or alias, ./risedev c)

Documentation

If your pull request contains user-facing changes, please specify the types of the changes, and create a release note. Otherwise, please feel free to remove this section.

Types of user-facing changes

Please keep the types that apply to your changes, and remove those that do not apply.

  • SQL commands, functions, and operators

Release note

Support the streaming_parallelism session variable. It is unset by default, in which case the cluster's default parallelism is used.

Refer to a related PR or issue link (optional)

#7359

codecov bot commented Jan 13, 2023

Codecov Report

Merging #7370 (59554ad) into main (9fd7845) will increase coverage by 0.00%.
The diff coverage is 66.66%.

@@           Coverage Diff           @@
##             main    #7370   +/-   ##
=======================================
  Coverage   72.92%   72.92%           
=======================================
  Files        1071     1071           
  Lines      172382   172418   +36     
=======================================
+ Hits       125711   125739   +28     
- Misses      46671    46679    +8     
Flag Coverage Δ
rust 72.92% <66.66%> (+<0.01%) ⬆️

Impacted Files                               Coverage Δ
src/common/src/config.rs                     68.58% <ø> (ø)
src/frontend/src/handler/create_table_as.rs  0.00% <0.00%> (ø)
src/meta/src/lib.rs                          2.40% <ø> (+0.02%) ⬆️
src/meta/src/manager/env.rs                  95.72% <ø> (-0.04%) ⬇️
src/meta/src/rpc/service/ddl_service.rs      3.07% <0.00%> (+<0.01%) ⬆️
src/common/src/session_config/mod.rs         27.27% <33.33%> (+0.32%) ⬆️
src/frontend/src/handler/alter_table.rs      85.40% <100.00%> (+0.32%) ⬆️
src/frontend/src/handler/create_index.rs     88.01% <100.00%> (+0.11%) ⬆️
src/frontend/src/handler/create_mv.rs        94.95% <100.00%> (+0.07%) ⬆️
src/frontend/src/handler/create_sink.rs      97.20% <100.00%> (+0.10%) ⬆️
... and 13 more

- let default_parallelism = if self.env.opts.minimal_scheduling {
+ let default_parallelism = if let Some(parallelism) = parallelism {
+     parallelism as usize
+ } else if self.env.opts.minimal_scheduling {
Member

I guess it's time to remove this option, as it can be achieved by setting the variable, which sounds more flexible.

Contributor Author

Removed.
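
As a rough sketch of the fallback order in the diff above, once the minimal_scheduling branch is gone: a parallelism carried by the request wins, otherwise the cluster default applies. The function name `resolve_parallelism` and the `cluster_default` parameter are hypothetical and only illustrate the idea; they are not names taken from this PR.

```rust
/// Hypothetical helper: picks the parallelism for a new streaming job.
///
/// `requested` stands in for the value derived from the
/// `streaming_parallelism` session variable (`None` when it is unset).
/// `cluster_default` stands in for whatever default the cluster provides.
fn resolve_parallelism(requested: Option<u64>, cluster_default: usize) -> usize {
    match requested {
        // An explicit setting always takes priority.
        Some(p) => p as usize,
        // Unset: fall back to the cluster-wide default.
        None => cluster_default,
    }
}
```

With this shape, `resolve_parallelism(Some(4), 16)` yields 4 and `resolve_parallelism(None, 16)` yields 16.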

@@ -88,6 +88,7 @@ message DropSinkResponse {
 message CreateMaterializedViewRequest {
   catalog.Table materialized_view = 1;
   stream_plan.StreamFragmentGraph fragment_graph = 2;
+  uint64 parallelism = 3;
Member

Is it okay to put it into FragmentGraph? It only makes sense for streaming jobs, that is, DDL requests that carry a graph. This graph does not seem to be persisted and is only used for communication.

Contributor Author

Makes sense. Fixed.

# Conflicts:
#	dashboard/proto/gen/stream_plan.ts
#	proto/stream_plan.proto
#	src/frontend/src/handler/create_mv.rs
@@ -617,4 +622,6 @@ message StreamFragmentGraph {
   repeated uint32 dependent_table_ids = 3;
   uint32 table_ids_cnt = 4;
   StreamEnvironment env = 5;
+  // 0 means use default value.
+  uint64 parallelism = 6;
Member

Should we use the Parallelism message here? 👀
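
To illustrate the trade-off behind this question: a bare uint64 field needs a sentinel (0 means "use default", per the field comment), whereas a dedicated wrapper message can express "unset" by simply being absent. The sketch below shows the sentinel round trip on the Rust side. The helper names `to_wire` and `from_wire` are hypothetical, and modelling the value as `Option<NonZeroU64>` is an assumption made for illustration, not the code in this PR.

```rust
use std::num::NonZeroU64;

/// Hypothetical encoder: an unset parallelism becomes the sentinel 0.
fn to_wire(parallelism: Option<NonZeroU64>) -> u64 {
    parallelism.map_or(0, NonZeroU64::get)
}

/// Hypothetical decoder: 0 means "use default", so it maps back to `None`;
/// any other value is an explicit setting.
fn from_wire(raw: u64) -> Option<NonZeroU64> {
    NonZeroU64::new(raw)
}
```

Using `NonZeroU64` keeps the sentinel out of the in-memory type, so code past the decoding step cannot mistake 0 for a real parallelism.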

@BugenZhao BugenZhao left a comment

LGTM 🚀 Thanks for the work!

@mergify mergify bot merged commit 76d9dbc into main Jan 17, 2023
@mergify mergify bot deleted the wangzheng/mv_parallelism branch January 17, 2023 05:20
@lmatz lmatz added the user-facing-changes Contains changes that are visible to users label Jan 17, 2023
fuyufjh commented Jan 27, 2023

Should the batch query jobs i.e. SELECT follow this parallelism?

@BugenZhao

> Should the batch query jobs i.e. SELECT follow this parallelism?

Do you mean default parallelism of the non-scan stages? Good catch. 🤔

fuyufjh commented Jan 27, 2023

> > Should the batch query jobs i.e. SELECT follow this parallelism?
>
> Do you mean default parallelism of the non-scan stages? Good catch. 🤔

Yes, exactly.

fuyufjh commented Jan 27, 2023

BTW, another example is CREATE TABLE. When creating a table, the underlying streaming job (for doing DMLs) will be created as well. Will streaming_parallelism be applied to it?

zwang28 commented Jan 28, 2023

> BTW, another example is CREATE TABLE. When creating a table, the underlying streaming job (for doing DMLs) will be created as well. Will streaming_parallelism be applied to it?

Yes.

@BugenZhao

> > > Should the batch query jobs i.e. SELECT follow this parallelism?
> >
> > Do you mean default parallelism of the non-scan stages? Good catch. 🤔
>
> Yes, exactly.

Maybe we need another BATCH_PARALLELISM for this. 😄

Labels
type/feature user-facing-changes Contains changes that are visible to users