test_pageserver_chaos: duplicate L1 in test #4088
Comments
Questions for gauging the severity with regard to #4086:
The duplicate check was introduced in #3869.
This reverts commit 732acc5.

Reverted PR: #3869

As noted in PR #4094, we do in fact try to insert duplicates to the layer map, if L0->L1 compaction is interrupted. We do not have a proper fix for that right now, and we are in a hurry to make a release to production, so revert the changes related to this to the state that we have in production currently. We know that we have a bug here, but better to live with the bug that we've had in production for a long time, than rush a fix to production without testing it in staging first.

Cc: #4094, #4088
I created a duplicate issue, #4690, so it seems this has been reintroduced. I guess we have to revert it again?
Closing #4690 and reproducing its description here:
So this was an L1.
## Problem

Compactions might generate files of exactly the same name as before compaction due to our naming of layer files. This could have already caused some mess in the system, and is known to cause some issues like #4088. Therefore, we now consider duplicated layers in the post-compaction process to avoid violating the layer map duplicate checks.

related previous works: close #4094
error reported in: #4690, #4088

## Summary of changes

If a file already exists in the layer map before the compaction, do not modify the layer map and do not delete the file. The file on disk at that time should be the new one overwritten by the compaction process.

This PR also adds a test case with a fail point that produces exactly the same set of files.

This bypassing behavior is safe because the produced layer files have the same content / are the same representation of the original file.

An alternative might be directly removing the duplicate check in the layer map, but I feel it would be good if we can prevent that in the first place.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Konstantin Knizhnik <knizhnik@garret.ru>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
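For illustration, the rule from the summary of changes (leave the layer map untouched for a layer that already exists, and do not delete its file) could be sketched roughly as below. This is a minimal sketch, not the pageserver code: `LayerMap`, `finish_compaction`, and the string-keyed map are hypothetical stand-ins invented for this example.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical stand-in for the layer map: layer file name -> file size.
#[derive(Default)]
struct LayerMap {
    layers: BTreeMap<String, u64>,
}

impl LayerMap {
    /// Mimics a duplicate check: inserting an already-present name is an error.
    fn insert(&mut self, name: &str, size: u64) -> Result<(), String> {
        if self.layers.contains_key(name) {
            return Err(format!("duplicate layer {name}"));
        }
        self.layers.insert(name.to_string(), size);
        Ok(())
    }

    fn remove(&mut self, name: &str) {
        self.layers.remove(name);
    }
}

/// Hypothetical post-compaction step. Returns the names of the input files
/// that are safe to delete from disk.
fn finish_compaction(
    map: &mut LayerMap,
    inputs: &[&str],
    outputs: &[(&str, u64)],
) -> Result<Vec<String>, String> {
    // Outputs whose name is already in the map are duplicates: skip the
    // insert so the duplicate check is never hit.
    let mut duplicates = BTreeSet::new();
    for (name, size) in outputs {
        if map.layers.contains_key(*name) {
            duplicates.insert(name.to_string());
            continue;
        }
        map.insert(*name, *size)?;
    }

    // Remove the compacted inputs, but never a file that compaction just
    // (re)wrote under the same name.
    let mut to_delete = Vec::new();
    for name in inputs {
        if duplicates.contains(*name) {
            continue;
        }
        map.remove(*name);
        to_delete.push(name.to_string());
    }
    Ok(to_delete)
}
```

With a map that already contains an L1 left over from an interrupted compaction, re-running the step with an output of the same name keeps that entry and only removes the L0 inputs.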
This is now handled by #4696. The last recorded flakiness was:
https://neon-github-public-dev.s3.amazonaws.com/reports/main/debug/4809176454/index.html#suites/094ca0c3798926b0b3676f7b1a8a5bdb/28eced2609521275/
Tripped on an ERROR message, reproduced here in full (except stacktrace):
While inspecting the logs we noted:
We deduced this to be a different L1 than the duplicate, because it has a different key_start. The name is produced by
neon/pageserver/src/tenant/storage_layer/delta_layer.rs
Lines 500 to 507 in dd22c87
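To illustrate why a different key_start implies a different layer, while re-running the same compaction yields the same name, here is a rough sketch. It assumes the file name is derived purely from the layer's key range and LSN range. `DeltaLayerId`, the `u64` key type, and the exact format string are hypothetical simplifications and not the code referenced above.

```rust
use std::ops::Range;

/// Hypothetical, simplified identity of a delta layer.
#[derive(Debug, Clone, PartialEq, Eq)]
struct DeltaLayerId {
    key_range: Range<u64>,
    lsn_range: Range<u64>,
}

impl DeltaLayerId {
    /// Assumed naming scheme, for illustration only: the name depends on
    /// nothing but the key range and the LSN range.
    fn file_name(&self) -> String {
        format!(
            "{:016X}-{:016X}__{:016X}-{:016X}",
            self.key_range.start,
            self.key_range.end,
            self.lsn_range.start,
            self.lsn_range.end
        )
    }
}
```

Under this assumption, two compaction runs that produce a layer covering the same key and LSN ranges write to the same file name, while a layer whose key range starts elsewhere gets a distinct name.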
Slack thread: https://neondb.slack.com/archives/C033QLM5P7D/p1682521379341769?thread_ts=1682519017.949719&cid=C033QLM5P7D