Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Fix the panic caused by scaling with a single fragment downstream of no shuffle #18581

Merged
merged 2 commits into from
Sep 18, 2024

Conversation

shanicky
Copy link
Contributor

@shanicky shanicky commented Sep 18, 2024

Signed-off-by: Shanicky Chen peng@risingwave-labs.comI hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

This PR fixes an issue that could cause panic during the Scaling process.

As a single fragment downstream of no shuffle, which can also be considered a single actor, it should not need bitmap updates like its upstream. Previously, directly unwrapping would lead to panic.

Theoretically, this should have been discovered by tests (we provided integration tests for single fragments early on), but our integration tests were all changed to arrangement backfill mode, so it was hidden. 🫠

Changes to ScaleController Logic

  • Clarified the construction of actor group map within ScaleController with a new comment.
  • Introduced a validation to ensure that fragments with Single distribution type have an empty bitmap.
  • Implemented a safer handling of bitmap unwrapping by adding a match statement which checks for the presence of a bitmap and asserts appropriate conditions based on the fragment's distribution type.

Updates to Singleton Migration Tests

  • Refactored test_singleton_migration to utilize a new helper function test_singleton_migration_helper.
  • The new helper function takes a Configuration parameter to facilitate testing with various setups.
  • Created two specific tests, test_singleton_migration and test_singleton_migration_for_no_shuffle, each utilizing different scale configurations to ensure proper behavior during singleton migrations.

These improvements aim to enhance the robustness of the scale controller's logic in a stream-processing framework, and to increase the flexibility of testing by accommodating different singleton migration scenarios. Code maintainability and error handling have also been addressed, streamlining the future development and debugging processes.

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • All checks passed in ./risedev check (or alias, ./risedev c)

…igurations using a new helper function

Signed-off-by: Shanicky Chen <peng@risingwave-labs.com>
… modify `upstream_fragment_bitmap`.

Signed-off-by: Shanicky Chen <peng@risingwave-labs.com>
Copy link
Member

@yezizp2012 yezizp2012 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Please ensure that this fix can be merged into v2.0.0, otherwise user will encounter this issue after upgrading to v2.0.0. Cc @lmatz

Copy link
Member

@BugenZhao BugenZhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

@shanicky shanicky enabled auto-merge September 18, 2024 08:14
@shanicky shanicky added this pull request to the merge queue Sep 18, 2024
Merged via the queue into main with commit 745fb16 Sep 18, 2024
45 checks passed
@shanicky shanicky deleted the peng/fix-single-failure branch September 18, 2024 08:53
github-actions bot pushed a commit that referenced this pull request Sep 18, 2024
…m of no shuffle (#18581)

Signed-off-by: Shanicky Chen <peng@risingwave-labs.com>
github-merge-queue bot pushed a commit that referenced this pull request Sep 18, 2024
…m of no shuffle (#18581) (#18587)

Signed-off-by: Shanicky Chen <peng@risingwave-labs.com>
Co-authored-by: Shanicky Chen <peng@risingwave-labs.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants