-
Notifications
You must be signed in to change notification settings - Fork 530
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix closed writer panic by resetting buffers/writer correctly on error path #1379
Conversation
Signed-off-by: Annanay <annanayagarwal@gmail.com>
Is there an actual panic here? Are we just looking to guarantee we do all cleanup work even if |
Nope theres no panic. We just want to make sure ingesters function normally after a transient failure like disk full. |
Frankly if the disk is full it might make sense to actually panic. This is a pretty dire failure mode for Tempo as ingested data will be lost. With this change how will the ingester behave if the disk is persistently full? |
Yes I agree. If the disk is persistently full the ingester will continue to try and write but fail at every attempt till the disk is replaced. Not sure if its worth bubbling that error up and fail on the write path in which case it will trigger some ingester write alert. But as of now the failure to write to disk will go undetected unless someone checks the logs when no more blocks are being created. Does it then make sense to panic if there's a error in these lines: I believe that should do the trick, WDYT? |
Signed-off-by: Annanay <annanayagarwal@gmail.com>
Signed-off-by: Annanay <annanayagarwal@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates! Looks good.
Signed-off-by: Annanay annanayagarwal@gmail.com
What this PR does:
Refer to a longer explanation in this comment: #1374 (comment)
There is a small section of code in our data writer where the compression writer is closed but not reset. If there are any errors like writing to disk / transient errors with kubernetes pvcs, the function could return without resetting the compression writer. This PR changes that to put the reset logic in a defer func so it triggers whenever the return is triggered.
I understand that working with defer funcs in this manner makes it tricky to follow the logic, so I'm open to alternate implementation methods.
Edit: Offline discussions confirmed my fears as mentioned above :) and so went ahead with a different strategy.
cc @jvilhuber
Which issue(s) this PR fixes:
Fixes the red herring error message in #1374. But we still need to fix the fact that we drop traces without warning when there are issues writing to disk.
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]