Handle partially assigned batches in compaction dispatcher #3719

Open
patchwork01 opened this issue Nov 18, 2024 · 0 comments

patchwork01 commented Nov 18, 2024

Background

Split from:

When the dispatcher receives a batch of compaction jobs, it checks the state store to see if all the jobs have been assigned to their input files. At time of writing, it only sends the batch if all the jobs have been successfully assigned.

It may be possible to put the system into a state where some of the jobs have been assigned to their input files and some have not.

With the transaction log state store, file assignments are done for a whole batch in a single transaction, which will either succeed or fail. This means that we can't get to a state where a batch is partially assigned to its input files.

With the DynamoDB state store, or any state store that splits file assignment into multiple transactions, the file assignment for a single batch may be done gradually. The dispatcher may see a state where some of the jobs have had their files assigned successfully but others have not. The file assignments may still be in progress, or may have failed completely.
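The states a dispatcher could observe for a batch can be named explicitly. The following is a minimal sketch, not code from the project: the class `BatchAssignmentState`, the `State` enum and the `classify` method are all hypothetical, and each job is reduced to a single boolean saying whether its input files are currently assigned to it. Under the description above, a store that assigns a whole batch in one transaction would only ever produce the first or last state, while a store that assigns job by job could also produce the middle one.

```java
import java.util.List;

// Hypothetical sketch: none of these names come from the project's codebase.
class BatchAssignmentState {

    /** The states a dispatcher might observe for one batch. */
    enum State { NONE_ASSIGNED, PARTIALLY_ASSIGNED, ALL_ASSIGNED }

    /**
     * Classifies a batch from one flag per job, where true means the job's
     * input files are assigned to it. An empty batch counts as ALL_ASSIGNED
     * because it contains no unassigned job.
     */
    static State classify(List<Boolean> jobAssigned) {
        long assigned = jobAssigned.stream().filter(Boolean::booleanValue).count();
        if (assigned == jobAssigned.size()) {
            return State.ALL_ASSIGNED;
        }
        if (assigned == 0) {
            return State.NONE_ASSIGNED;
        }
        return State.PARTIALLY_ASSIGNED;
    }
}
```

For example, `BatchAssignmentState.classify(List.of(true, false))` returns `PARTIALLY_ASSIGNED`, which is the case this issue is about.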

Description

We'd like to define how the system should behave when a pending batch of compaction jobs has been partially assigned to its input files.

Analysis

At time of writing, CompactionJobDispatcher calls StateStore.isAssigned, which throws an exception if any of the file references are assigned to a different job or have been removed. We probably need to change this so that the dispatcher's behaviour for a partially assigned batch is defined. At a minimum, we can write a unit test for CompactionJobDispatcher to show its current behaviour in this case.

We could consider some extra handling so that when we get a batch with some jobs assigned and some not, we send just the assigned jobs. We'd need to decide what to do with the rest of the batch, as we don't know whether the file assignments for those jobs are still in progress or have failed completely.
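As a rough illustration of sending only part of a batch, the sketch below splits a batch into assigned and unassigned jobs. Everything in it is an assumption made for the example: `PartialBatchSplitter`, `Job` and `Split` are hypothetical types, and the `isAssigned` predicate stands in for a per-job state store check that reports a result instead of throwing. It deliberately leaves open what happens to the unassigned half.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical sketch: none of these names come from the project's codebase.
class PartialBatchSplitter {

    /** Minimal stand-in for a compaction job: an ID plus its input files. */
    record Job(String id, List<String> inputFiles) {}

    /** The two parts of a batch after checking assignment for each job. */
    record Split(List<Job> assigned, List<Job> unassigned) {}

    /**
     * Splits a batch by whether each job's input files are assigned to it.
     * The predicate is an assumed per-job check against the state store.
     */
    static Split split(List<Job> batch, Predicate<Job> isAssigned) {
        // partitioningBy always yields entries for both true and false.
        Map<Boolean, List<Job>> parts = batch.stream()
                .collect(Collectors.partitioningBy(isAssigned));
        return new Split(parts.get(true), parts.get(false));
    }
}
```

The dispatcher could then send `split.assigned()` and apply whichever policy is chosen to `split.unassigned()`.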

We could wait until all the input files have definite assignments before we proceed with the batch. We could return it to the pending queue with a delay when we see a partially assigned state. If we see a state where all assignments are known but not all the jobs are assigned successfully, we could just send the successfully assigned jobs.
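The wait-then-send policy described above could look roughly like the following. This is a sketch under a significant assumption: that the dispatcher can tell, per job, whether file assignment succeeded, is still in progress, or failed. No existing state store method is known to provide that, so `AssignmentStatus`, `Action`, `Decision` and `decide` are all hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the types and names below are illustrative only.
class PartialBatchDecision {

    /** Assumed three-way view of one job's input file assignment. */
    enum AssignmentStatus { ASSIGNED, IN_PROGRESS, FAILED }

    /** What the dispatcher does with the batch. */
    enum Action { SEND, RETURN_TO_PENDING_WITH_DELAY }

    /** The chosen action, plus the job IDs to send when the action is SEND. */
    record Decision(Action action, List<String> jobIdsToSend) {}

    static Decision decide(Map<String, AssignmentStatus> statusByJobId) {
        // Any job still in progress means the batch has no definite state yet,
        // so nothing is sent and the whole batch is retried later.
        if (statusByJobId.containsValue(AssignmentStatus.IN_PROGRESS)) {
            return new Decision(Action.RETURN_TO_PENDING_WITH_DELAY, List.of());
        }
        // Every assignment is now known: send only the jobs that succeeded.
        List<String> toSend = new ArrayList<>();
        statusByJobId.forEach((jobId, status) -> {
            if (status == AssignmentStatus.ASSIGNED) {
                toSend.add(jobId);
            }
        });
        return new Decision(Action.SEND, toSend);
    }
}
```

One consequence of this design is that a single slow assignment delays the whole batch, including jobs that are already safe to send. Whether that trade-off is acceptable is part of what this issue needs to decide.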

patchwork01 added this to the 0.27.0 milestone Nov 18, 2024
patchwork01 changed the title from "Clarify behaviour when dispatcher encounters a partially assigned batch" to "Clarify behaviour when compaction dispatcher encounters a partially assigned batch" Nov 18, 2024
patchwork01 changed the title from "Clarify behaviour when compaction dispatcher encounters a partially assigned batch" to "Handle partially assigned batches in compaction dispatcher" Nov 19, 2024
patchwork01 removed this from the 0.27.0 milestone Nov 19, 2024