Reduce specialization in ForUtil and ForDeltaUtil. #14048

Merged (1 commit) on Dec 7, 2024

Conversation

jpountz
Contributor

@jpountz jpountz commented Dec 6, 2024

ForUtil and ForDeltaUtil currently have specialized code for every number of bits per value up to 24. But performance at high numbers of bits per value is not very important, because these are only used by short postings lists, which are fast to iterate anyway. So this PR only specializes up to 16 bits per value.

For instance, if a postings list uses blocks of 17 bits per value, at least one doc ID delta in such a block is 2^16 = 65,536 or more, which means that one can find gaps of about 65,536 consecutive doc IDs that do not contain the term. Such rare terms do not drive query performance.
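
To make the arithmetic concrete, here is a minimal standalone sketch of how the number of bits per value follows from the largest doc ID delta in a block. It is not Lucene's actual code: the class `BitsPerValueSketch` and its methods are hypothetical names chosen for illustration, and it assumes a block is encoded with as many bits as its largest delta requires.

```java
public class BitsPerValueSketch {

    /** Number of bits needed to represent a non-negative value (at least 1). */
    public static int bitsRequired(long value) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(value));
    }

    /**
     * Bits per value for a block of sorted doc IDs, assuming each doc ID is
     * stored as its delta from the previous one (hypothetical helper).
     */
    public static int bitsPerValueForDeltas(int[] docIds, int previousDocId) {
        long maxDelta = 0;
        int prev = previousDocId;
        for (int docId : docIds) {
            maxDelta = Math.max(maxDelta, (long) docId - prev);
            prev = docId;
        }
        return bitsRequired(maxDelta);
    }

    public static void main(String[] args) {
        // Deltas up to 65,535 still fit in 16 bits.
        System.out.println(bitsRequired(65_535));
        // A delta of 65,536 is the smallest one that needs 17 bits.
        System.out.println(bitsRequired(65_536));
        // One large gap is enough to push the whole block to 17 bits.
        System.out.println(bitsPerValueForDeltas(new int[] {10, 65_546, 65_547}, 0));
    }
}
```

Under these assumptions, a block only needs 17 or more bits per value when it contains at least one gap of this size, which is why such blocks are expected only for rare terms.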

@jpountz jpountz added this to the 10.1.0 milestone Dec 6, 2024
Contributor Author

jpountz commented Dec 6, 2024

The luceneutil benchmarks don't show any noticeable difference.

@jpountz jpountz merged commit e34e082 into apache:main Dec 7, 2024
3 checks passed
@jpountz jpountz deleted the reduce_for_util_specialization branch December 7, 2024 10:50
jpountz added a commit that referenced this pull request Dec 9, 2024