feat: persist migration plan to support one-by-one migration of streaming job #9641
I'll do some local testing before marking this PR ready for review. 🥵
Codecov Report
@@ Coverage Diff @@
## main #9641 +/- ##
==========================================
- Coverage 71.03% 71.01% -0.03%
==========================================
Files 1248 1249 +1
Lines 208732 208792 +60
==========================================
- Hits 148279 148267 -12
- Misses 60453 60525 +72
... and 5 files with indirect coverage changes
FYI, it's convenient to write integration tests with https://github.com/risingwavelabs/risingwave/tree/main/src/tests/simulation/tests/it
Yes, I tested it with madsim and it worked. The deterministic recovery test already includes some node-expiration logic, which triggers and exercises the migration. For now, though, that logic is only enabled for the scaling recovery test. After this PR, I will enable it for all recovery tests.
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
This fixes #7728. Currently, migration during recovery produces a single request with a very large body to etcd when the cluster has many streaming jobs. This PR persists and validates the migration plan during recovery, so that streaming jobs can be migrated one by one without affecting correctness. We can deprecate this mechanism if the streaming job metadata is later split into dynamic and static parts.
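The following is a minimal sketch of the idea above. It is not RisingWave's actual implementation.

All names are hypothetical: `MigrationPlan`, `MetaStore`, `validate_plan` and `migrate_one_by_one`. The sketch makes these assumptions:

- The meta store is an in-memory map standing in for etcd, and each write counts as one request.
- Replacement workers are chosen round-robin.

The plan is persisted before any job is touched. Each affected job is then written in its own small request. If recovery is interrupted, the next attempt reloads and re-validates the same plan rather than building a fresh one.

```rust
use std::collections::{BTreeMap, HashSet};

// Hypothetical types; they only model the idea from the PR description.
type WorkerId = u32;
type JobId = u32;

/// Expired worker -> replacement worker.
type MigrationPlan = BTreeMap<WorkerId, WorkerId>;

/// In-memory stand-in for the meta store (etcd). Every write counts as one request.
#[derive(Default)]
struct MetaStore {
    plan: Option<MigrationPlan>,
    jobs: BTreeMap<JobId, Vec<WorkerId>>, // job -> workers hosting its actors
    requests: usize,
}

/// Reuse a plan persisted by an interrupted recovery, re-targeting entries whose
/// replacement worker is no longer alive, then cover newly expired workers.
fn validate_plan(
    persisted: Option<MigrationPlan>,
    expired: &[WorkerId],
    alive: &[WorkerId],
) -> Result<MigrationPlan, String> {
    if alive.is_empty() {
        return Err("no alive worker to migrate to".into());
    }
    let alive_set: HashSet<WorkerId> = alive.iter().copied().collect();
    let mut plan = persisted.unwrap_or_default();
    // Assumed policy: pick replacement workers round-robin.
    let mut next = 0;
    let mut pick = || {
        let w = alive[next % alive.len()];
        next += 1;
        w
    };
    for target in plan.values_mut() {
        if !alive_set.contains(&*target) {
            *target = pick();
        }
    }
    for &w in expired {
        if !plan.contains_key(&w) {
            let t = pick();
            plan.insert(w, t);
        }
    }
    Ok(plan)
}

/// Persist the plan first, then rewrite each affected job in its own request,
/// so no single request carries every job's metadata.
fn migrate_one_by_one(
    store: &mut MetaStore,
    expired: &[WorkerId],
    alive: &[WorkerId],
) -> Result<(), String> {
    let plan = validate_plan(store.plan.clone(), expired, alive)?;
    store.plan = Some(plan.clone());
    store.requests += 1; // persist the plan before touching any job

    let ids: Vec<JobId> = store.jobs.keys().copied().collect();
    for id in ids {
        let workers = &store.jobs[&id];
        if workers.iter().any(|w| plan.contains_key(w)) {
            let migrated: Vec<WorkerId> =
                workers.iter().map(|w| *plan.get(w).unwrap_or(w)).collect();
            store.jobs.insert(id, migrated);
            store.requests += 1; // one small request per affected job
        }
    }

    store.plan = None;
    store.requests += 1; // every job migrated: delete the plan
    Ok(())
}

fn main() {
    let mut store = MetaStore::default();
    store.jobs.insert(1, vec![10, 11]);
    store.jobs.insert(2, vec![11]);
    store.jobs.insert(3, vec![12]);
    // Worker 11 expired; workers 10 and 12 are alive.
    migrate_one_by_one(&mut store, &[11], &[10, 12]).unwrap();
    println!("{:?} after {} requests", store.jobs, store.requests);
}
```

Jobs 1 and 2 are rewritten individually and job 3 is left untouched. The number of requests therefore grows with the number of affected jobs, while each request stays small.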
Checklist For Contributors
./risedev check (or alias, ./risedev c)
Checklist For Reviewers
Documentation
Types of user-facing changes
Release note