[Bug] MarkDeletePosition causes the corresponding ledger to not be deleted in borderline cases #19077
Comments
It seems that it's due to the read position still belonging to the ledger.
The handling of this boundary case looks inappropriate. The entries value in stats-internal is a count that starts from 1, but entry IDs within a ledger start from 0: the position (messageID) of the first message in a ledger is ledgerID:entryID = 1:0. So when markDeletePosition is 1174539:226934, it means that 226935 messages in ledger 1174539 have been acked. We have also set retention and TTL, with a maximum of no more than 10 days, and this ledger holds data from 3 months ago, so this data must already have been expired by TTL.
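The off-by-one reasoning above can be checked with a few lines of Python. This is only a sketch of the arithmetic, not Pulsar code: the helper names `parse_position`, `acked_in_ledger`, and `ledger_fully_acked` are hypothetical, and it assumes a position is written as `ledgerId:entryId` with 0-based entry IDs, as described in the comment.

```python
from typing import Tuple


def parse_position(position: str) -> Tuple[int, int]:
    """Split a 'ledgerId:entryId' string into two integers."""
    ledger_id, entry_id = position.split(":")
    return int(ledger_id), int(entry_id)


def acked_in_ledger(mark_delete_position: str) -> int:
    """Number of entries acked in the ledger the position points into.

    Entry IDs are 0-based, so entry ID N means N + 1 entries are acked.
    """
    _, entry_id = parse_position(mark_delete_position)
    return entry_id + 1


def ledger_fully_acked(mark_delete_position: str, entries_in_ledger: int) -> bool:
    """True when the acked count equals the 1-based entries count of the ledger."""
    return acked_in_ledger(mark_delete_position) == entries_in_ledger


# Position taken from this issue.
print(acked_in_ledger("1174539:226934"))  # 226935
```

If stats-internal reports 226935 entries for ledger 1174539, `ledger_fully_acked("1174539:226934", 226935)` returns `True`, which is the boundary case discussed here.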
Found a new clue: in this case, the flags of all ledgers show the status NoLedger, which is strange. From the perspective of Bookie, the current status of this ledger is CLOSED, which we were able to confirm. In our scenario, reset cursor is never called and the messages are expired through TTL, yet we seem to hit the same problem. We are not sure whether it is related to the following fix:
@lhotari @michaeljmarshall PTAL thanks!
The issue had no activity for 30 days, mark with Stale label.
Is this problem solved? Anyone else looking at this question?
Search before asking
Version
Bookie Version: 4.14.4
Broker Version: 2.9.2
OS: Linux CentOS 7
Minimal reproduce step
On the versions above, we found that some EntryLogs in Bookie could not be deleted for a long time. At first we thought this was because minor GC or major GC was not being triggered, so we adjusted the minor GC and major GC thresholds, but this did not noticeably help. We then scanned a problematic EntryLog file to obtain the list of all ledgers in it, fetched the ledger metadata for each ledger, and parsed out the topic that each ledger belongs to. We found the following situation:
stats-internal.log
We can see that the markDeletePosition of all subscriptions under the current topic has been updated to the last message, indicating that all messages in this topic have been consumed and acknowledged. In fact, however, this ledger has not been deleted. As a result, the proportion of valid data in the EntryLog does not meet the trigger condition for major GC, so the EntryLog persists for a long time and cannot be deleted.
Observing the topic's stats-internal, we can see that it is currently at a boundary position: markDeletePosition is at the last message of the previous ledger, and there are no messages in the next, new ledger. So the question is whether a ledger can fail to be deleted in this boundary case. Note that the ledger is already in the CLOSED state.
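To spot this boundary case without reading the stats by hand, the check can be scripted roughly as below. This is a sketch under assumptions, not part of Pulsar: it assumes the stats-internal JSON has a `ledgers` list with `ledgerId` and `entries` fields and a `cursors` map whose values carry `markDeletePosition`. The function name `fully_acked_ledgers`, the subscription name, and the second ledger ID in the sample are made up for illustration. It ignores retention settings and only reports ledgers that every subscription has fully acknowledged.

```python
def fully_acked_ledgers(stats: dict) -> list:
    """Return IDs of ledgers that lie entirely at or before the slowest markDeletePosition."""
    positions = []
    for cursor in stats.get("cursors", {}).values():
        ledger_id, entry_id = cursor["markDeletePosition"].split(":")
        positions.append((int(ledger_id), int(entry_id)))
    if not positions:
        # No subscriptions: this sketch makes no claim about such topics.
        return []

    # Tuples compare by ledger ID first, then by entry ID.
    slowest_ledger, slowest_entry = min(positions)

    result = []
    for ledger in stats.get("ledgers", []):
        ledger_id = ledger["ledgerId"]
        last_entry_id = ledger.get("entries", 0) - 1  # entry IDs are 0-based
        if ledger_id < slowest_ledger:
            result.append(ledger_id)
        elif (ledger_id == slowest_ledger
              and last_entry_id >= 0
              and slowest_entry >= last_entry_id):
            # Boundary case: markDeletePosition sits on the last entry of this ledger.
            result.append(ledger_id)
    return result


# Illustrative input: ledger 1174539 and its position come from this issue,
# the empty next ledger 1174540 and the subscription name are hypothetical.
sample = {
    "ledgers": [
        {"ledgerId": 1174539, "entries": 226935},
        {"ledgerId": 1174540, "entries": 0},
    ],
    "cursors": {"sub-a": {"markDeletePosition": "1174539:226934"}},
}
print(fully_acked_ledgers(sample))  # [1174539]
```

A ledger that appears in this list but still exists on the bookies after the retention period has passed matches the situation described here.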
The following is the data in the EntryLog file scanned by the scan script:
entry.log
What did you expect to see?
The CLOSED ledger will be deleted.
What did you see instead?
The CLOSED ledger is not deleted.
Anything else?
No response
Are you willing to submit a PR?