feat: persist migration plan to support one-by-one migration of streaming job #9641

Merged
merged 5 commits into from
May 22, 2023

@yezizp2012 yezizp2012 commented May 6, 2023

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

To fix #7728: currently, migration during recovery results in a request with a large body being sent to etcd when the cluster has a large number of streaming jobs. This PR persists the migration plan and validates it during recovery, so that streaming jobs can be migrated one by one without affecting correctness. The persisted plan can be deprecated if the metadata of a streaming job is split into dynamic and static parts in the future.
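
The idea described above can be illustrated with a minimal, self-contained sketch. It is not the code in this PR: `MetaStore`, `MigrationPlan`, `validate`, `fill`, and `recover` are hypothetical names, the store is an in-memory stand-in for etcd, and the `writes` counter only exists to make the one-small-write-per-job behaviour visible. The point is that the plan is persisted before any job is touched, so a retried recovery reuses the same mapping from expired to replacement parallel units.

```rust
use std::collections::{BTreeMap, BTreeSet};

type ParallelUnitId = u32;
type JobId = u32;

/// Hypothetical plan: expired parallel unit -> replacement parallel unit.
#[derive(Clone, Debug, Default, PartialEq)]
struct MigrationPlan {
    mapping: BTreeMap<ParallelUnitId, ParallelUnitId>,
}

/// In-memory stand-in for the meta store (an assumption, not the real etcd client).
#[derive(Default)]
struct MetaStore {
    plan: Option<MigrationPlan>,
    jobs: BTreeMap<JobId, Vec<ParallelUnitId>>,
    /// Number of simulated writes, each one small.
    writes: usize,
}

impl MigrationPlan {
    /// Drop entries whose target is no longer alive, so a plan persisted
    /// before an interrupted recovery stays usable.
    fn validate(&mut self, alive: &BTreeSet<ParallelUnitId>) {
        self.mapping.retain(|_, to| alive.contains(to));
    }

    /// Give every expired unit without a target a free, alive unit.
    fn fill(
        &mut self,
        expired: &BTreeSet<ParallelUnitId>,
        alive: &BTreeSet<ParallelUnitId>,
        in_use: &BTreeSet<ParallelUnitId>,
    ) -> Result<(), String> {
        let taken: BTreeSet<ParallelUnitId> = self.mapping.values().copied().collect();
        let mut free = alive
            .iter()
            .copied()
            .filter(|u| !in_use.contains(u) && !taken.contains(u));
        for from in expired {
            if !self.mapping.contains_key(from) {
                let to = free.next().ok_or("not enough free parallel units")?;
                self.mapping.insert(*from, to);
            }
        }
        Ok(())
    }
}

/// Recovery sketch: load and validate the plan, persist it, then migrate
/// streaming jobs one by one, and finally delete the plan.
fn recover(store: &mut MetaStore, alive: &BTreeSet<ParallelUnitId>) -> Result<(), String> {
    let mut plan = store.plan.clone().unwrap_or_default();
    plan.validate(alive);

    let used: BTreeSet<ParallelUnitId> = store.jobs.values().flatten().copied().collect();
    let expired: BTreeSet<ParallelUnitId> = used.difference(alive).copied().collect();
    plan.fill(&expired, alive, &used)?;

    // Persist the plan before touching any job.
    store.plan = Some(plan.clone());
    store.writes += 1;

    let job_ids: Vec<JobId> = store.jobs.keys().copied().collect();
    for id in job_ids {
        let mut changed = false;
        if let Some(units) = store.jobs.get_mut(&id) {
            for u in units.iter_mut() {
                if let Some(to) = plan.mapping.get(&*u) {
                    *u = *to;
                    changed = true;
                }
            }
        }
        if changed {
            // One small write per migrated job instead of one huge request.
            store.writes += 1;
        }
    }

    // All jobs are migrated, so the plan is no longer needed.
    store.plan = None;
    store.writes += 1;
    Ok(())
}
```

In this sketch, a recovery that is interrupted after some jobs were already rewritten can simply call `recover` again: the persisted mapping is reloaded, already-migrated jobs no longer reference expired units, and the remaining jobs are moved to the same targets.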

Checklist For Contributors

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features; see "Sqlsmith: Sql feature generation" #7934.)
  • I have demonstrated that backward compatibility is not broken by breaking changes and created issues to track deprecated features to be removed in the future. (Please refer to the issue)
  • All checks passed in ./risedev check (or alias, ./risedev c)

Checklist For Reviewers

  • I have requested macro/micro-benchmarks as this PR can affect performance substantially, and the results are shown.

Documentation

  • My PR DOES NOT contain user-facing changes.

@yezizp2012 yezizp2012 changed the title from "feat: store and update migration plan to support migrate streaming job one by one" to "feat: persist migration plan to support one-by-one migration of streaming job" May 6, 2023
@yezizp2012
Member Author

I'll do some local testing before marking this PR as ready for review. 🥵

@yezizp2012 yezizp2012 marked this pull request as ready for review May 17, 2023 06:50
@codecov

codecov bot commented May 17, 2023

Codecov Report

Merging #9641 (0fb2d7f) into main (f6f3a5a) will decrease coverage by 0.03%.
The diff coverage is 6.35%.

@@            Coverage Diff             @@
##             main    #9641      +/-   ##
==========================================
- Coverage   71.03%   71.01%   -0.03%     
==========================================
  Files        1248     1249       +1     
  Lines      208732   208792      +60     
==========================================
- Hits       148279   148267      -12     
- Misses      60453    60525      +72     
| Flag | Coverage Δ |
|---|---|
| rust | 71.01% <6.35%> (-0.03%) ⬇️ |


| Impacted Files | Coverage Δ |
|---|---|
| src/meta/src/manager/catalog/fragment.rs | 28.28% <0.00%> (+1.27%) ⬆️ |
| src/meta/src/model/migration_plan.rs | 0.00% <0.00%> (ø) |
| src/meta/src/model/mod.rs | 97.26% <ø> (ø) |
| src/meta/src/model/stream.rs | 70.64% <0.00%> (-2.12%) ⬇️ |
| src/meta/src/barrier/recovery.rs | 54.15% <12.34%> (-7.59%) ⬇️ |
| src/meta/src/manager/cluster.rs | 73.39% <100.00%> (-0.07%) ⬇️ |

... and 5 files with indirect coverage changes

Contributor

@zwang28 zwang28 left a comment


src/meta/src/barrier/recovery.rs (resolved)
src/meta/src/model/migration_plan.rs (resolved)
@yezizp2012
Member Author

> FYI, it's convenient to write integration tests with https://github.com/risingwavelabs/risingwave/tree/main/src/tests/simulation/tests/it

Yes, I tested it with madsim and it worked. The deterministic recovery test already includes some node expiration logic, which triggers the migration and exercises it. For now that logic is only enabled for the scaling recovery test; after this PR, I will enable it for all recovery tests.

@zwang28 zwang28 self-requested a review May 22, 2023 09:24
@yezizp2012 yezizp2012 added this pull request to the merge queue May 22, 2023
Merged via the queue into main with commit 7167524 May 22, 2023
@yezizp2012 yezizp2012 deleted the zp/migration-one-by-one branch May 22, 2023 09:59
Development

Successfully merging this pull request may close these issues.

Avoid etcd request too large
2 participants