-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LUCENE-10599: Improve LogMergePolicy's handling of maxMergeSize. #935
Conversation
With this change, segments are more likely to be considered for merging until they reach the max merge size. Before this change, LogMergePolicy would exclude an entire window of `mergeFactor` segments from merging if this window had a too large segment and other segments were on the same tier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! This PR looks reasonable to me, but I'm a bit confused by existing logic
if (anyMerging || end - start <= 1) { | ||
// skip: there is an ongoing merge at the current level or the computed merge has a single | ||
// segment and this merge policy doesn't do singleton merges | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So basically for a level which meets the merge factor we're merging every segments slice that is not exceed the max size specified?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct.
@@ -568,23 +568,41 @@ public MergeSpecification findMerges( | |||
// Finally, record all merges that are viable at this level: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused by the level quantization process above actually:
- From L527 it determines the max level, and then a level bottom based on that
- Then it find the right boundary by search backwards for the first qualified segment in L556-
This seems assuming the levels
are sorted, but I can't find the sorting anywhere. Or otherwise how could it guarantee that the level decided in range of [start,end)
won't contain segments that have lower level than levelBottom
?
Sorry the question is not quite related to the change itself
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how could it guarantee that the level decided in range of [start,end) won't contain segments that have lower level than levelBottom?
LogMergePolicy doesn't try to provide this guarantee. It's actually important it doesn't try to provide this guarantee, otherwise it could end up with lots of unmerged segments. For instance imagine that you have 9 segments (S1..S9) on level 10 then one segment (S10) on level 9, one more segment on level 10 (S11) and then potentially other segments.
If LogMergePolicy refused to merge segments that are on a lower level then it could never merge together segments S1..S10. This is because segment S10 can only be merged with segments that are on a higher level because both the previous and the next segment are on a higher level, and LogMergePolicy only merges adjacent segments.
This is a downside of LogMergePolicy compared to TieredMergePolicy: because it only performs merges of adjacent segments, it sometimes has to return merges where not all segments are on the same level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, thanks!
Thanks @zhaih ! |
With this change, segments are more likely to be considered for merging until they reach the max merge size. Before this change, LogMergePolicy would exclude an entire window of `mergeFactor` segments from merging if this window had a too large segment and other segments were on the same tier.
…che#935) With this change, segments are more likely to be considered for merging until they reach the max merge size. Before this change, LogMergePolicy would exclude an entire window of `mergeFactor` segments from merging if this window had a too large segment and other segments were on the same tier.
With this change, segments are more likely to be considered for merging until
they reach the max merge size. Before this change, LogMergePolicy would exclude
an entire window of
mergeFactor
segments from merging if this window had atoo large segment and other segments were on the same tier.