-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LUCENE-10599: Improve LogMergePolicy's handling of maxMergeSize. #935
Merged
Merged
Changes from 3 commits
Commits
Show all changes
4 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -568,23 +568,41 @@ public MergeSpecification findMerges( | |
// Finally, record all merges that are viable at this level: | ||
int end = start + mergeFactor; | ||
while (end <= 1 + upto) { | ||
boolean anyTooLarge = false; | ||
boolean anyMerging = false; | ||
long mergeSize = 0; | ||
long mergeDocs = 0; | ||
for (int i = start; i < end; i++) { | ||
final SegmentInfoAndLevel segLevel = levels.get(i); | ||
final SegmentCommitInfo info = segLevel.info; | ||
anyTooLarge |= | ||
(size(info, mergeContext) >= maxMergeSize | ||
|| sizeDocs(info, mergeContext) >= maxMergeDocs); | ||
if (mergingSegments.contains(info)) { | ||
anyMerging = true; | ||
break; | ||
} | ||
long segmentSize = size(info, mergeContext); | ||
long segmentDocs = sizeDocs(info, mergeContext); | ||
if (mergeSize + segmentSize > maxMergeSize || mergeDocs + segmentDocs > maxMergeDocs) { | ||
// This merge is full, stop adding more segments to it | ||
if (i == start) { | ||
// This segment alone is too large, return a singleton merge | ||
if (verbose(mergeContext)) { | ||
message( | ||
" " + i + " is larger than the max merge size/docs; ignoring", mergeContext); | ||
} | ||
end = i + 1; | ||
} else { | ||
// Previous segments are under the max merge size, return them | ||
end = i; | ||
} | ||
break; | ||
} | ||
mergeSize += segmentSize; | ||
mergeDocs += segmentDocs; | ||
} | ||
|
||
if (anyMerging) { | ||
// skip | ||
} else if (!anyTooLarge) { | ||
if (anyMerging || end - start <= 1) { | ||
// skip: there is an ongoing merge at the current level or the computed merge has a single | ||
// segment and this merge policy doesn't do singleton merges | ||
} else { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So basically for a level which meets the merge factor we're merging every segments slice that is not exceed the max size specified? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Correct. |
||
if (spec == null) { | ||
spec = new MergeSpecification(); | ||
} | ||
|
@@ -604,14 +622,6 @@ public MergeSpecification findMerges( | |
mergeContext); | ||
} | ||
spec.add(new OneMerge(mergeInfos)); | ||
} else if (verbose(mergeContext)) { | ||
message( | ||
" " | ||
+ start | ||
+ " to " | ||
+ end | ||
+ ": contains segment over maxMergeSize or maxMergeDocs; skipping", | ||
mergeContext); | ||
} | ||
|
||
start = end; | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused by the level quantization process above actually:
This seems assuming the
levels
are sorted, but I can't find the sorting anywhere. Or otherwise how could it guarantee that the level decided in range of[start,end)
won't contain segments that have lower level thanlevelBottom
?Sorry the question is not quite related to the change itself
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LogMergePolicy doesn't try to provide this guarantee. It's actually important it doesn't try to provide this guarantee, otherwise it could end up with lots of unmerged segments. For instance imagine that you have 9 segments (S1..S9) on level 10 then one segment (S10) on level 9, one more segment on level 10 (S11) and then potentially other segments.
If LogMergePolicy refused to merge segments that are on a lower level then it could never merge together segments S1..S10. This is because segment S10 can only be merged with segments that are on a higher level because both the previous and the next segment are on a higher level, and LogMergePolicy only merges adjacent segments.
This is a downside of LogMergePolicy compared to TieredMergePolicy: because it only performs merges of adjacent segments, it sometimes has to return merges where not all segments are on the same level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, thanks!