-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LUCENE-10073: Reduce merging overhead of NRT by using a greater mergeFactor on tiny segments. #266
Conversation
…Factor on tiny segments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jpountz -- tricky!
&& candidate.size() < mergeFactor | ||
&& bytesThisMerge < maxMergedSegmentBytes; | ||
&& candidate.size() < maxMergeAtOnce | ||
// We allow merging more that mergeFactor segments together if the merged segment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/that
/than
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, fixed.
} | ||
candidate.add(segSizeDocs.segInfo); | ||
bytesThisMerge += segBytes; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This (bytesThisMerge
) was just a dup of totAfterMergeBytes
?
Edit: oh, I see! Once you inverted the if (candidate.size() == 0) ...
then bytesThisMerge
became a dup of totAfterMergeBytes
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, these were duplicate variables!
// value and segsPerTier to avoid suboptimal merging. | ||
private int maxMergeAtOnce = 10; | ||
// value and segsPerTier for segments above the floor size to avoid suboptimal merging. | ||
private int maxMergeAtOnce = 30; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK even though we increased maxMergeAtOnce
to 30
, which means if the segments are so tiny that merging them is still under the floor we will allow merging up to 30
of such segments, the effective mergeFactor
for normal (big, above floor) merges is still 10
since we take min of maxMergeAtOnce
and segmentsPerTier
for mergeFactor
. Phew, complicated!
@jpountz it looks like this one is super-close, and a nice improvement to TMP's default behavior? |
This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution! |
To avoid taking users by surprise, I plan on only changing the default value of maxMergeAtOnce from 10 to 30 in 11.0, meaning that TieredMergePolicy with default parameters will behave exactly the same in 10.x. Users will only experience a different behavior when they update |
…Factor on tiny segments. (apache#266) Closes apache#11111
I had missed it, but this change yielded a good speedup on the stored fields benchmark: https://benchmarks.mikemccandless.com/stored_fields_benchmarks.html |
…ther when the merge is below the min merge size. This is essentially porting apache#266 to `LogMergePolicy`. By allowing more than `mergeFactor` segments to be merged together for small merges, the merge policy gets a lower write amplification and indexes have fewer small segments.
#11111