-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
compact: Update filtered blocks list before second downsample pass #7492
Conversation
If the second downsampling pass is given the same filteredMetas list as the first pass, it will create duplicates of blocks created in the first pass. It will also not be able to do further downsampling e.g 5m->1h using blocks created in the first pass, as it will not be aware of them. The metadata was already being synced before the second pass, but not updated into the filteredMetas list. Signed-off-by: Thomas Hartland <thomas.hartland@diamond.ac.uk>
Signed-off-by: Thomas Hartland <thomas.hartland@diamond.ac.uk>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I think the fix makes sense to me.
For this failure on the compactor tests: It was changed in the test from 18 to 19 here 2ed48f7 one day after the change that introduced the filter code which ends up duplicating a block 68bef3f. The reasoning at the time was that the hardcoded test value might have been taken from a run of the CI that didn't report the full number in the metric, but looking at it now this +1 has to be from a block being cleaned up that was a duplicate from the downsampling. |
The value was increased in 2ed48f7 to fix the test, with the reasoning that the hardcoded value must have been taken from a run of the CI that didn't reach the max value due to CI worker lag. More likely the real reason is that commit 68bef3f the day before had caused blocks to be duplicated during downsampling. The duplicate block is immediately marked for deletion, causing an extra +1 in the number of blocks cleaned. Subtracting one from the value again now that the block duplication issue is fixed. Signed-off-by: Thomas Hartland <thomas.hartland@diamond.ac.uk>
Combined with the previous commit this effectively reverts all of 2ed48f7, in which two assertions were changed to (unknowingly) account for a bug which had just been introduced in the downsampling code, causing duplicate blocks. This assertion change I am less sure on the reasoning for, but after running through the e2e tests several times locally, it is consistent that the only downsampling happens in the "compact-working" step, and so all other steps would report 0 for their total downsamples metric. Signed-off-by: Thomas Hartland <thomas.hartland@diamond.ac.uk>
d4e0cd5
to
bdcf63b
Compare
@yeya24 I made some further changes to fix the e2e compaction tests, please could you take another look when you have the time? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Sorry for the lag. Let's merge it
…hanos-io#7492) * compact: Update filtered blocks list before second downsample pass If the second downsampling pass is given the same filteredMetas list as the first pass, it will create duplicates of blocks created in the first pass. It will also not be able to do further downsampling e.g 5m->1h using blocks created in the first pass, as it will not be aware of them. The metadata was already being synced before the second pass, but not updated into the filteredMetas list. Signed-off-by: Thomas Hartland <thomas.hartland@diamond.ac.uk> * Update changelog Signed-off-by: Thomas Hartland <thomas.hartland@diamond.ac.uk> * e2e/compact: Fix number of blocks cleaned assertion The value was increased in 2ed48f7 to fix the test, with the reasoning that the hardcoded value must have been taken from a run of the CI that didn't reach the max value due to CI worker lag. More likely the real reason is that commit 68bef3f the day before had caused blocks to be duplicated during downsampling. The duplicate block is immediately marked for deletion, causing an extra +1 in the number of blocks cleaned. Subtracting one from the value again now that the block duplication issue is fixed. Signed-off-by: Thomas Hartland <thomas.hartland@diamond.ac.uk> * e2e/compact: Revert change to downsample count assertion Combined with the previous commit this effectively reverts all of 2ed48f7, in which two assertions were changed to (unknowingly) account for a bug which had just been introduced in the downsampling code, causing duplicate blocks. This assertion change I am less sure on the reasoning for, but after running through the e2e tests several times locally, it is consistent that the only downsampling happens in the "compact-working" step, and so all other steps would report 0 for their total downsamples metric. Signed-off-by: Thomas Hartland <thomas.hartland@diamond.ac.uk> --------- Signed-off-by: Thomas Hartland <thomas.hartland@diamond.ac.uk>
Fixes #7488
If the second downsampling pass is given the same filteredMetas list as the first pass, it will create duplicates of blocks created in the first pass.
It will also not be able to do further downsampling e.g 5m->1h using blocks created in the first pass, as it will not be aware of them.
The metadata was already being synced before the second pass, but not updated into the filteredMetas list.
Changes
Re-uses the same block filtering logic from the first pass, to regenerate the list of filtered blocks from freshly synced metadata before the second pass.
I haven't made any changes regarding the metrics logic that takes place after the first filtering:
thanos/cmd/thanos/compact.go
Lines 465 to 469 in 57b42d1
as I'm not sure what exactly this is doing. So there may be some downsamples that happen in the second pass that aren't counted in the metrics, if that's what it's doing. Let's keep that separate to this fix.
Verification
Tested against a bucket that was creating duplicate downsampled blocks on v0.35.1.
Verified in compactor logs that no duplicate block is created on the second pass: #7488 (comment)