
refactor: actor wait barrier manager inject barrier #17613

Merged
merged 12 commits into from
Jul 19, 2024

Conversation

wenym1
Contributor

@wenym1 wenym1 commented Jul 8, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Currently, we erase the mutation of the barrier in the remote output, and in the remote input we restore the barrier mutation by fetching it from the LocalBarrierWorker. This ensures that no actor on a CN starts processing a barrier before the LocalBarrierWorker on that CN has received the barrier injected from the meta node.

In this PR, we extend this mechanism to local input/output, to ensure that the mutation of each actor is always obtained from the LocalBarrierWorker, or ultimately from the meta node. This capability is not leveraged yet, but it gives us the flexibility for different actors to receive different mutations for the same barrier.

Since both the remote and local exchange dispatchers now erase the mutation from the barrier, we can make the mutation type of the barrier generic. For a barrier in the exchange dispatcher, the mutation is (), with the type alias type DispatcherBarrier = BarrierInner<()>; for a barrier in an actor, the mutation is the original one. In this way, we statically ensure that the mutation is erased in the dispatcher, and that we always fetch the mutation from the LocalBarrierWorker before emitting the barrier to the real processing logic of the actors. The Message enum also gains a generic mutation type parameter, indicating whether the message is flowing within actors or through the dispatcher.
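The type-level erasure described above can be sketched roughly as follows. This is an illustrative stand-in, not the actual RisingWave definitions: the struct fields, the `into_dispatcher`/`with_mutation` helpers, and the use of `Option<String>` as the mutation type are all simplifications.

```rust
// Illustrative sketch: the mutation type is a generic parameter, so a
// dispatcher-side barrier carries `()` and an actor-side barrier carries
// the full mutation. Forgetting to restore the mutation becomes a compile
// error rather than a runtime bug.
#[derive(Debug, Clone, PartialEq)]
pub struct BarrierInner<M> {
    pub epoch: u64,
    pub mutation: M,
}

/// In the exchange dispatcher the mutation is statically erased.
pub type DispatcherBarrier = BarrierInner<()>;

/// Within an actor the barrier carries the real mutation (simplified to a
/// string here).
pub type Barrier = BarrierInner<Option<String>>;

impl Barrier {
    /// Erase the mutation before the barrier enters an exchange.
    pub fn into_dispatcher(self) -> DispatcherBarrier {
        BarrierInner { epoch: self.epoch, mutation: () }
    }
}

impl DispatcherBarrier {
    /// Restore the mutation fetched from the LocalBarrierWorker.
    pub fn with_mutation(self, mutation: Option<String>) -> Barrier {
        BarrierInner { epoch: self.epoch, mutation }
    }
}

fn main() {
    let b = Barrier { epoch: 1, mutation: Some("scale".to_owned()) };
    // The dispatcher-side type cannot carry a mutation at all.
    let pruned: DispatcherBarrier = b.into_dispatcher();
    let restored = pruned.with_mutation(Some("scale".to_owned()));
    assert_eq!(restored.epoch, 1);
}
```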

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

Collaborator

@hzxa21 hzxa21 left a comment


LGTM. The idea is straightforward.

@hzxa21 hzxa21 requested a review from BugenZhao July 11, 2024 10:37
@BugenZhao
Member

Can you please elaborate more in the PR body?

Comment on lines +133 to +138
yield process_msg(msg, |barrier| {
mutation_subscriber.get_or_insert_with(|| {
local_barrier_manager.subscribe_barrier_mutation(self_actor_id, barrier)
})
})
.await?;
Member


There used to be no need for local exchanges to acquire mutation from the channel because we don't prune it on the sender side. So this seems to be a regression. 🤔

Contributor Author


There shouldn't be too much regression when we have #17613. When a local input receives a barrier, in most cases the mutation has already been sent by the LocalBarrierWorker to the mutation subscription channel.
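The reasoning above can be illustrated with a toy subscription channel. This is a hypothetical sketch, not the actual `subscribe_barrier_mutation` implementation: the worker broadcasts (epoch, mutation) pairs as soon as a barrier is injected, and an input that receives a pruned barrier drains the channel until it finds the matching epoch, so by the time a local input sees the barrier the lookup is usually non-blocking.

```rust
use std::collections::HashMap;
use std::sync::mpsc;

// Toy stand-in for the mutation subscription: the LocalBarrierWorker side
// sends (epoch, mutation) pairs; the input side looks mutations up by epoch,
// buffering any pairs for other epochs it drains along the way.
struct MutationSubscriber {
    rx: mpsc::Receiver<(u64, String)>,
    buffered: HashMap<u64, String>,
}

impl MutationSubscriber {
    fn get_mutation(&mut self, epoch: u64) -> String {
        loop {
            if let Some(m) = self.buffered.remove(&epoch) {
                return m;
            }
            // Blocks only if the worker has not injected this epoch yet.
            let (e, m) = self.rx.recv().expect("LocalBarrierWorker dropped");
            self.buffered.insert(e, m);
        }
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let mut sub = MutationSubscriber { rx, buffered: HashMap::new() };
    // The worker injects the mutation before the input sees the barrier,
    // so the lookup returns immediately.
    tx.send((1, "add-fragment".to_owned())).unwrap();
    assert_eq!(sub.get_mutation(1), "add-fragment");
}
```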

@BugenZhao BugenZhao requested a review from yezizp2012 July 12, 2024 06:34
@wenym1
Contributor Author

wenym1 commented Jul 16, 2024

Can you please elaborate more in the PR body?

Updated.

In brief, the motivation of this PR is to provide the flexibility for different actors to have different mutations. This will be helpful in the partial checkpoint implementation, and for #15490.

For #15490, a rough idea in my mind is that, when we do a scale, we don't need to inject the mutation from the source via the newly injected barrier. We can first locate the most upstream fragment that needs to receive the scale mutation, then figure out the smallest epoch that has not started being processed by any actor in this fragment, and then inject the mutation into the actors of this fragment by sending a different mutation from the LocalBarrierWorker.
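The epoch-selection step of this rough idea could look something like the sketch below. It is purely illustrative; `pick_injection_epoch` and its inputs are hypothetical names, not anything in the codebase.

```rust
// Hypothetical sketch: given, for each actor of the target fragment, the
// latest epoch it has already started processing, and the epochs still
// pending injection, pick the smallest pending epoch that no actor has
// started yet. That epoch is a safe point to attach the scale mutation.
fn pick_injection_epoch(started: &[u64], pending: &[u64]) -> Option<u64> {
    // The injection epoch must be later than anything already started.
    let max_started = started.iter().copied().max().unwrap_or(0);
    pending.iter().copied().filter(|&e| e > max_started).min()
}

fn main() {
    // Some actor has started epoch 4, so the earliest usable epoch is 5.
    assert_eq!(pick_injection_epoch(&[3, 4], &[4, 5, 6]), Some(5));
}
```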

@wenym1 wenym1 requested a review from BugenZhao July 16, 2024 10:16
@BugenZhao
Member

and then figure out the smallest epoch that has not started being processed by any actor in this fragment,

Sounds interesting but delicate under asynchronous checkpointing. 🤣

@wenym1
Contributor Author

wenym1 commented Jul 17, 2024

Sounds interesting but delicate under asynchronous checkpointing. 🤣

Do you mean our current global explicit try_wait_epoch during scaling? This can be resolved by explicitly calling try_wait_epoch after the vnode bitmap has changed. On a vnode bitmap update, executors can first yield the barrier, then update the vnode bitmap, and then locally call try_wait_epoch. Since the barrier has already been yielded, executors can assume that this epoch will eventually finish and will not be blocked by the in-place try_wait_epoch.
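The ordering argued here can be sketched with stub types (illustrative only, not actual executor code): the barrier is yielded downstream before the local wait, so the wait cannot keep the epoch from completing.

```rust
// Stub sketch of the proposed ordering: 1) yield the barrier, 2) apply the
// vnode bitmap update, 3) only then wait for the epoch locally.
#[derive(Debug, PartialEq)]
enum Message {
    Barrier { epoch: u64 },
}

struct Executor {
    vnode_bitmap: Vec<bool>,
}

impl Executor {
    fn on_barrier(&mut self, epoch: u64, new_bitmap: Option<Vec<bool>>, out: &mut Vec<Message>) {
        // 1. Yield the barrier downstream first, so the epoch can complete
        //    regardless of what this executor does next.
        out.push(Message::Barrier { epoch });
        if let Some(bitmap) = new_bitmap {
            // 2. Apply the vnode bitmap update.
            self.vnode_bitmap = bitmap;
            // 3. Wait locally; this cannot deadlock because of step 1.
            self.try_wait_epoch(epoch);
        }
    }

    fn try_wait_epoch(&self, _epoch: u64) {
        // Stub: the real call waits for storage to commit the epoch.
    }
}

fn main() {
    let mut exec = Executor { vnode_bitmap: vec![true, false] };
    let mut out = Vec::new();
    exec.on_barrier(42, Some(vec![false, true]), &mut out);
    assert_eq!(out, vec![Message::Barrier { epoch: 42 }]);
    assert_eq!(exec.vnode_bitmap, vec![false, true]);
}
```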

@yezizp2012
Member

Sounds interesting but delicate under asynchronous checkpointing. 🤣

Do you mean our current global explicit try_wait_epoch during scaling? This can be resolved by explicitly calling try_wait_epoch after the vnode bitmap has changed. On a vnode bitmap update, executors can first yield the barrier, then update the vnode bitmap, and then locally call try_wait_epoch. Since the barrier has already been yielded, executors can assume that this epoch will eventually finish and will not be blocked by the in-place try_wait_epoch.

It sounds like correctness can be guaranteed.

and then figure out the smallest epoch that has not started being processed by any actor in this fragment,

I agree that extending this mechanism to local input/output can benefit partial checkpoint and the drop and cancel commands a lot. Figuring out which epoch or barrier the mutation should be attached to requires more work in the global and local barrier managers, because the actors of the target fragment may be distributed across different compute nodes. I haven't thought of a good solution to ensure certainty yet.

Anyway, the impl LGTM.

@wenym1
Contributor Author

wenym1 commented Jul 18, 2024

@yezizp2012 @BugenZhao Is it ok to merge this PR?

Member

@yezizp2012 yezizp2012 left a comment


LGTM

Member

@BugenZhao BugenZhao left a comment


Rest LGTM

@yezizp2012
Member

From the recovery test failure, it is highly likely that there was a problem with the recovery process. Retrying the slt run 5 or 10 more times should already be enough. 🤔

@wenym1
Contributor Author

wenym1 commented Jul 19, 2024

From the recovery test failure, it is highly likely that there was a problem with the recovery process. Retrying the slt run 5 or 10 more times should already be enough. 🤔

Indeed. I repeatedly saw scale actors failed error=scale_actors failed to acquire reschedule_lock. It looks like we are entering some kind of deadlock.

@wenym1
Contributor Author

wenym1 commented Jul 19, 2024

The recovery test has passed. The previous cause was that, on recovery, the catalog manager only notifies failure at the beginning of recovery. If a create-mv DDL attaches a finish notifier to the catalog manager after the failure has been notified, the notifier will never be notified. The create-mv DDL holds the reschedule read lock, so recovery can never acquire the reschedule write lock; recovery then keeps retrying without notifying the failure, causing a deadlock.

@wenym1 wenym1 enabled auto-merge July 19, 2024 06:29
@wenym1 wenym1 added this pull request to the merge queue Jul 19, 2024
Merged via the queue into main with commit bd3b9a1 Jul 19, 2024
30 of 31 checks passed
@wenym1 wenym1 deleted the yiming/actor-wait-barrier-inject branch July 19, 2024 06:53