Skip to content

Commit

Permalink
archival: bump offset if seeking undershoots
Browse files Browse the repository at this point in the history
convert_begin_offset_to_file_pos can undershoot, in which case the
metadata returned by the archival policy won't perfectly align with the
caller's expectations. This could result in an offset_overlap when
applying the upload metadata.

It's unclear exactly why there are missing batches, given this code is
used for the non-compacted uploader. But the fix is simple enough, and
is one that we use in the compacted upload policy already[1].

[1] https://github.com/redpanda-data/redpanda/blob/4b8135ea5dbdca322ccf8efc85424413a20c5ade/src/v/archival/segment_reupload.cc#L516-L522
  • Loading branch information
andrwng authored and WillemKauf committed Oct 28, 2024
1 parent 40e0dca commit 8b616df
Showing 1 changed file with 19 additions and 0 deletions.
19 changes: 19 additions & 0 deletions src/v/cluster/archival/archival_policy.cc
Original file line number Diff line number Diff line change
Expand Up @@ -320,6 +320,25 @@ static ss::future<std::optional<std::error_code>> get_file_range(
co_return seek_result.error();
}
auto seek = seek_result.value();
vlog(
archival_log.debug,
"Found offset {} when looking for target {}",
seek.offset,
begin_inclusive);
if (seek.offset < begin_inclusive) {
// `convert_begin_offset_to_file_pos` may return a lower value than
// the target, e.g. if the target was compacted away.
//
// [...][10, 20][40, 50][...]
// Target offset: 30
// Seek result offset: 21
//
// If so, the upload will still logically contain offset 30 if we
// return bytes starting at offset 21, but we need to lie about the
// offsets because the caller expects the returned metadata to
// align with the target.
seek.offset = begin_inclusive;
}
upl->starting_offset = seek.offset;
upl->file_offset = seek.bytes;
upl->base_timestamp = seek.ts;
Expand Down

0 comments on commit 8b616df

Please sign in to comment.