Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LUCENE-10073: Reduce merging overhead of NRT by using a greater mergeFactor on tiny segments. #266

Merged
merged 8 commits into from
Dec 3, 2024

Conversation

jpountz
Copy link
Contributor

@jpountz jpountz commented Aug 26, 2021

Copy link
Member

@mikemccand mikemccand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jpountz -- tricky!

&& candidate.size() < mergeFactor
&& bytesThisMerge < maxMergedSegmentBytes;
&& candidate.size() < maxMergeAtOnce
// We allow merging more that mergeFactor segments together if the merged segment
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/that/than

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, fixed.

}
candidate.add(segSizeDocs.segInfo);
bytesThisMerge += segBytes;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This (bytesThisMerge) was just a dup of totAfterMergeBytes?

Edit: oh, I see! Once you inverted the if (candidate.size() == 0) ... then bytesThisMerge became a dup of totAfterMergeBytes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, these were duplicate variables!

// value and segsPerTier to avoid suboptimal merging.
private int maxMergeAtOnce = 10;
// value and segsPerTier for segments above the floor size to avoid suboptimal merging.
private int maxMergeAtOnce = 30;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK even though we increased maxMergeAtOnce to 30, which means if the segments are so tiny that merging them is still under the floor we will allow merging up to 30 of such segments, the effective mergeFactor for normal (big, above floor) merges is still 10 since we take min of maxMergeAtOnce and segmentsPerTier for mergeFactor. Phew, complicated!

@mikemccand
Copy link
Member

@jpountz it looks like this one is super-close, and a nice improvement to TMP's default behavior?

Copy link

github-actions bot commented Jan 9, 2024

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

@github-actions github-actions bot removed the Stale label Nov 21, 2024
@jpountz
Copy link
Contributor Author

jpountz commented Nov 21, 2024

To avoid taking users by surprise, I plan on only changing the default value of maxMergeAtOnce from 10 to 30 in 11.0, meaning that TieredMergePolicy with default parameters will behave exactly the same in 10.x. Users will only experience a different behavior when they update segsPerTier or maxMergeAtOnce in a way that maxMergeAtOnce is greater than segsPerTier.

@jpountz jpountz added this to the 10.1.0 milestone Nov 21, 2024
@jpountz jpountz merged commit df7b1f6 into apache:main Dec 3, 2024
3 checks passed
@jpountz jpountz deleted the maxmergefactor branch December 3, 2024 16:47
jpountz added a commit that referenced this pull request Dec 3, 2024
benchaplin pushed a commit to benchaplin/lucene that referenced this pull request Dec 31, 2024
@jpountz
Copy link
Contributor Author

jpountz commented Jan 7, 2025

I had missed it, but this change yielded a good speedup on the stored fields benchmark: https://benchmarks.mikemccandless.com/stored_fields_benchmarks.html

jpountz added a commit to jpountz/lucene that referenced this pull request Jan 23, 2025
…ther when the merge is below the min merge size.

This is essentially porting apache#266 to `LogMergePolicy`. By allowing more than
`mergeFactor` segments to be merged together for small merges, the merge policy
gets a lower write amplification and indexes have fewer small segments.
jpountz added a commit that referenced this pull request Feb 1, 2025
…ther when the merge is below the min merge size. (#14166)

This is essentially porting #266 to `LogMergePolicy`. By allowing more than
`mergeFactor` segments to be merged together for small merges, the merge policy
gets a lower write amplification and indexes have fewer small segments.
jpountz added a commit that referenced this pull request Feb 1, 2025
…ther when the merge is below the min merge size. (#14166)

This is essentially porting #266 to `LogMergePolicy`. By allowing more than
`mergeFactor` segments to be merged together for small merges, the merge policy
gets a lower write amplification and indexes have fewer small segments.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants