Migrations infra, part 3: shard-local worker logic #20840
Force-pushed from d7879c2 to 5f4cccb.
/dt
With the reconciliation loop, raft0 leadership updates, and migration updates all running concurrently and full of yield points, we need to keep them from interleaving to avoid race conditions.
In addition to the NTP and the sought migration state, a worker may need some information from the migration definition. Add data types for the pieces of this information that relate to individual partitions.
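As a rough illustration of what such per-partition data types might look like, here is a minimal sketch. Every name in it (`ntp`, `sought_state`, `partition_work_info`, `partition_work`, `make_partition_work`, the `alias` field) is hypothetical and chosen for illustration only; the actual types in the PR may be shaped quite differently.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <utility>

// Hypothetical names throughout; none of these are taken from the PR.
struct ntp {
    std::string ns;
    std::string topic;
    std::int32_t partition;
};

// Assumed, simplified set of states a worker may be asked to reach.
enum class sought_state { prepared, executed, finished };

// Topic-specific details pulled from the migration definition.
struct partition_work_info {
    // Assumed example field: an alternative topic name, if one is defined.
    std::optional<std::string> alias;
};

// Everything a shard-local worker needs to act on one partition.
struct partition_work {
    ntp target;
    sought_state sought;
    partition_work_info info;
};

// Builds the pack that would be dispatched to the shard owning the partition.
partition_work make_partition_work(
  ntp target, sought_state sought, std::optional<std::string> alias) {
    return partition_work{
      std::move(target), sought, partition_work_info{std::move(alias)}};
}
```

Keeping the definition-derived details in their own struct means the worker never has to look at the whole migration definition, only at the slice that concerns its partition.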
… state To pull topic-specific details from migration definition later
… info Build info packs to be dispatched to workers on shards when scheduling partition work.
Spawn, retry and gather results from partition operations on shards.
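The spawn, retry, and gather flow can be sketched in a simplified, synchronous form. This is not the PR's code: the real worker runs asynchronously on Seastar shards, while the sketch below uses plain standard C++ and hypothetical names (`errc`, `partition_op`, `run_with_retries`, `run_all`) purely to show the control flow.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical error codes; the real set in the PR may differ.
enum class errc { success, transient_failure, shutting_down };

// A partition operation, identified here by a plain string for brevity.
using partition_op = std::function<errc(const std::string& ntp)>;

// Retries one operation while it keeps failing transiently,
// giving up after max_attempts calls.
errc run_with_retries(
  const partition_op& op, const std::string& ntp, int max_attempts) {
    errc last = errc::transient_failure;
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        last = op(ntp);
        if (last != errc::transient_failure) {
            break; // success or a non-retriable error: stop retrying
        }
    }
    return last;
}

// Runs the operation for every partition and gathers the final results.
std::map<std::string, errc> run_all(
  const std::vector<std::string>& ntps,
  const partition_op& op,
  int max_attempts) {
    std::map<std::string, errc> results;
    for (const auto& ntp : ntps) {
        results[ntp] = run_with_retries(op, ntp, max_attempts);
    }
    return results;
}
```

In an asynchronous setting each partition's retry loop would run as its own task, and the result map would be filled in as tasks complete rather than in order.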
Since we process RPC replies asynchronously, the reply map may be written to while we read from it. Move it out in order to process it.
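One common way to make this kind of processing safe is to move the whole map into a local variable before iterating, so that late writers only touch a fresh, empty map. The sketch below assumes that is the intent of the commit; `reply_collector`, `on_reply`, `process`, and `pending` are hypothetical names, and the handler calling `on_reply` stands in for a yield point during which new replies arrive.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical class; it only illustrates the move-out-then-process pattern.
class reply_collector {
public:
    // Called whenever an RPC reply arrives, possibly while process() runs.
    void on_reply(std::string ntp, int state) {
        _replies[std::move(ntp)] = state;
    }

    // Moves the accumulated replies into a local batch and iterates over
    // that batch, so writes made by the handler never touch the container
    // being iterated.
    template<typename Handler>
    std::size_t process(Handler&& handle) {
        auto batch = std::exchange(_replies, {});
        for (const auto& [ntp, state] : batch) {
            handle(ntp, state);
        }
        return batch.size();
    }

    std::size_t pending() const { return _replies.size(); }

private:
    std::map<std::string, int> _replies;
};
```

Replies that arrive during processing stay in the member map and are picked up by the next `process` call.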
Force-pushed from 5f4cccb to 79ec19a.
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/51512#0190b631-6c64-440b-abcf-74c0835f90cd
test failure: #21376
// this call must only tinker with `it` within the current seastar task,
// it may be invalidated later!
ssx::spawn_with_gate(_gate, [this, it]() {
    return do_work(it).then([ntp = it->first,
I think here we should use `then_wrapped` to handle exceptions thrown from `do_work`, wdyt?
Yeah, thanks, we need to make sure we don't lose it. I'd rather wrap everything in `do_work` into a try-catch so it never throws, since it returns an error code. Do you think it'd be okay?
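The "never throws, always returns an error code" approach proposed in this reply can be sketched in plain C++ as follows. The names `errc`, `partition_operation_failed`, and `do_work_noexcept` are hypothetical, and the sketch is synchronous; in Seastar code a plain try-catch only observes a failed future when that future is awaited inside a coroutine, so continuation-style code would still need something like `then_wrapped`.

```cpp
#include <functional>
#include <stdexcept>

// Hypothetical error codes for illustration only.
enum class errc { success, partition_operation_failed };

// Runs the body and converts any exception into an error code, so callers
// only ever have to inspect the returned value.
errc do_work_noexcept(const std::function<errc()>& body) noexcept {
    try {
        return body();
    } catch (...) {
        // A real implementation would likely log the exception here
        // before translating it, so the failure is not silently lost.
        return errc::partition_operation_failed;
    }
}
```

The benefit of this design is that the continuation attached to `do_work` sees a single failure channel, at the cost of having to preserve the exception details by logging them at the catch site.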
Overall LGTM, one comment on exception handling.
The backend gathers the information needed for per-partition work from the migration definition and provides it to the worker.
The worker spawns, retries, and gathers results from partition operations.
Backports Required
Release Notes