-
Notifications
You must be signed in to change notification settings - Fork 599
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[v24.1.x] Fix timequery not returning results when racing with archival retention and gc #18597
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This is needed for a ducktape test where we want to change the manifest upload timeout at runtime. E.g. set it to 0 to prevent manifest uploading from on point in time onward. It is declared in configuration.cc as not requiring restart but in fact it did require one prior to this commit. Other properties of the archiver configuration should be fixed too in a separate commit.
It is not reasonable to continue work after this point. The absence of a cursor cursor is interpreted as EOF in other parts of the system. Not throwing makes it impossible to differentiate between "no more data available" vs. "an error occurred". This is required for an upcoming commit that fixes a timequery bug where cloud storage returns "no offset found" instead of propagating an internal error.
This code tried to be clever and ignore exception in some of the cases where it was assumed it is safe to do so. I.e. if the start offset stored in the cloud moved forward. I don't believe covering these edge cases is necessary. - Reasoning about correctness becomes harder as it masks the case where by mistake read an out of range offset. - It hide from the client the fact that the offset they just tried to read does not exist anymore. As a user, if the log is prefix truncated, then I expect the reply to Fetch to be out of range and not an empty response.
Introduce a new test to show the existence of a bug. In particular, ```cpp // BUG: This assertion is disabled because it currently fails. // test_log.debug("Timestamp undershoots the partition"); // BOOST_TEST_REQUIRE(timequery(*this, model::timestamp(100), 3 * 6)); ``` The assertion is commented out because it is failing.
Starting the cursor from the clean offset is only required when computing retention because of an implementation detail which is documented in the added comment and referenced commits. In all other cases we must start searching from the archive start offset. This is particularly important for timequeries which must return the first visible batch above the timestamp.
When reading from tiered storage, we create a `async_manifest_view_cursor` using a query (offset or timestamp) and a begin offset which is set to the start of the stm region or start of archive (spillover). There is a bug inside `async_manifest_view_cursor` which causes it to throw out_of_range error when spillover contains data which is logically prefix truncated but matches the timequery. This is mainly because the begin offset is not properly propagated together with the query which makes it possible for the query to match a spillover manifest which is below the begin offset. In this commit, we remove the logic to ignore the out of range error and propagate it to the caller. In a later commit, we will revisit the code so that this edge cases is handled correctly inside the async_manifest_view and it does seek to the correct offset rather than throwing an out of range exception up to the client.
No functional changes.
Tiered Storage physically has a superset of the addressable data. This can be caused at least by the following: a) trim-prefix, b) retention being applied but garbage collection not finishing yet. For offset queries this isn't problematic because the bounds can be applied at higher level. In particular, partition object does validate that offset is in range before passing control to the remote partition. For timequeries prior to this commit such bounds were not enforced leading to a bug where cloud storage would return an offset -1 (no data found) in result when there actually was data or returning a wrong offset. Wrong offset: it would be returned because reads could have started prior to the partition visible/addressable offset. E.g. after retention was applied but before GC was run. Or, after a trim-prefix with an offset which falls in a middle of a batch. Missing offset: would be returned when the higher level reader was created with visible/addressable partition offset bounds, say \[1000, 1200\] but cloud storage would find the offset in a manifest with bounds \[300, 400\] leading to an out of range error which used to be ignored.
It is not required anymore because we carry the actual partition start and end to the lowest layer of the systems.
These cover more edge cases and highlight better an existing bug with the timequery in which start offset for timequery is ignored or handled inconsistently. See added code comments starting with "BUG:".
I believe this makes it clearer.
nvartolomei
changed the title
wip Nv/timequery
[v24.1.x] Fix timequery not returning results when racing with archival retention and gc
May 21, 2024
andrwng
approved these changes
May 21, 2024
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Backport of PR #18097
Closes #18565
Backports Required
Release Notes
Bug Fixes