Allow LogMergePolicy to merge more than mergeFactor segments together when the merge is below the min merge size. #14166

@jpountz jpountz commented Jan 23, 2025

This is essentially porting #266 to LogMergePolicy. By allowing more than mergeFactor segments to be merged together for small merges, the merge policy gets a lower write amplification and indexes have fewer small segments.
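To make the write-amplification argument concrete, here is a toy model in plain Java. It is only a sketch and not Lucene's actual `LogMergePolicy` code: the class `SmallMergeSketch`, its methods, and the abstract "size units" are hypothetical, and the rule it applies (take `mergeFactor` segments, then keep adding segments while the merged result stays at or below the min merge size) is an assumed simplification of the behavior described above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model; none of these names come from Lucene.
final class SmallMergeSketch {

    // Returns the exclusive end index of the merge that starts at `start`.
    static int pickMergeEnd(List<Long> sizes, int start, int mergeFactor,
                            long minMergeSize, boolean allowExtra) {
        int end = Math.min(start + mergeFactor, sizes.size());
        long total = 0;
        for (int i = start; i < end; i++) {
            total += sizes.get(i);
        }
        // Assumed rule: keep adding segments while the merged segment
        // would still be at or below the min merge size.
        while (allowExtra && end < sizes.size()
                && total + sizes.get(end) <= minMergeSize) {
            total += sizes.get(end);
            end++;
        }
        return end;
    }

    // Merges `segmentCount` segments of size 1 down to a single segment and
    // returns the total size units written by merges. Assumes mergeFactor >= 2.
    static long unitsWritten(int segmentCount, int mergeFactor,
                             long minMergeSize, boolean allowExtra) {
        List<Long> sizes = new ArrayList<>();
        for (int i = 0; i < segmentCount; i++) {
            sizes.add(1L);
        }
        long written = 0;
        while (sizes.size() > 1) {
            List<Long> next = new ArrayList<>();
            int start = 0;
            while (start < sizes.size()) {
                int end = pickMergeEnd(sizes, start, mergeFactor, minMergeSize, allowExtra);
                long merged = 0;
                for (int i = start; i < end; i++) {
                    merged += sizes.get(i);
                }
                if (end - start > 1) {
                    written += merged; // a lone segment is carried over, not rewritten
                }
                next.add(merged);
                start = end;
            }
            sizes = next;
        }
        return written;
    }

    public static void main(String[] args) {
        // 100 tiny segments, mergeFactor = 10, min merge size = 1000 units.
        System.out.println(unitsWritten(100, 10, 1000, false)); // prints 200
        System.out.println(unitsWritten(100, 10, 1000, true));  // prints 100
    }
}
```

In this model, a strict `mergeFactor` limit rewrites every tiny segment twice (ten merges of 10, then one merge of the ten results), while allowing extra segments in a below-min-size merge rewrites each of them once.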

@jpountz jpountz added this to the 10.2.0 milestone Jan 23, 2025

jpountz commented Jan 24, 2025

@original-brownbear I remember you looked at indexes that had more segments (and more small segments in particular) than you would have liked, so you may be interested in this.

@original-brownbear original-brownbear self-requested a review January 24, 2025 15:17

@original-brownbear original-brownbear left a comment

I don't have massive experience with the performance impact of merging more segments but seeing how all of this involves tiny segments only anyway + code looks good => LGTM, thanks! Looking at the problematic segments I've observed here and there, this seems like it should prevent a good chunk of those :)

@jpountz jpountz merged commit dd76dc4 into apache:main Feb 1, 2025
5 checks passed
@jpountz jpountz deleted the allow_more_than_merge_factor_segments_to_be_merged branch February 1, 2025 16:43
jpountz added a commit that referenced this pull request Feb 1, 2025
…ther when the merge is below the min merge size. (#14166)


2 participants