Introduce bpv24 vectorized decoding for DocIdsWriter #14176

Open: wants to merge 1 commit into main

@gf2121 (Contributor) commented on Jan 28, 2025

Background

Proposal
This PR re-introduces bpv24 vectorized decoding and uses the new bulk visit method to reduce virtual calls, building on #13149 and #14138.
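Here "bpv24" means doc IDs stored with 24 bits per value. The PR text does not spell out the encoding, so the following is only a minimal, hypothetical sketch of one layout whose decode loops are plain shifts and masks of the kind HotSpot can usually auto-vectorize. Four doc IDs are packed into three ints: three IDs occupy the upper 24 bits, and the fourth is split into bytes stored in the low 8 bits of those three ints. The names Bpv24Sketch, encode and decode are illustrative and are not DocIdsWriter's API; the sketch assumes every doc ID is below 2^24.

```java
import java.util.Arrays;

/** Hypothetical 24-bit doc ID packing sketch; not Lucene's DocIdsWriter code. */
public final class Bpv24Sketch {

  /** Packs docIds[0..count) as 4 IDs per 3 ints; leftover IDs are stored unpacked. */
  static int[] encode(int[] docIds, int count) {
    int quarterLen = count >>> 2;      // number of complete 4-ID groups
    int quarterLen3 = quarterLen * 3;  // ints used by the packed groups
    int remainder = count & 3;         // IDs that do not fill a group
    int[] packed = new int[quarterLen3 + remainder];
    // IDs [0, quarterLen3) go into the upper 24 bits of one int each.
    for (int i = 0; i < quarterLen3; i++) {
      packed[i] = docIds[i] << 8;
    }
    // IDs [quarterLen3, 4 * quarterLen) are split into 3 bytes in the low 8 bits.
    for (int i = 0; i < quarterLen; i++) {
      int doc = docIds[quarterLen3 + i];
      packed[i] |= doc >>> 16;
      packed[i + quarterLen] |= (doc >>> 8) & 0xFF;
      packed[i + 2 * quarterLen] |= doc & 0xFF;
    }
    // Tail IDs are copied as-is.
    for (int i = 0; i < remainder; i++) {
      packed[quarterLen3 + i] = docIds[4 * quarterLen + i];
    }
    return packed;
  }

  /** Reverses encode into out[0..count). */
  static void decode(int[] packed, int count, int[] out) {
    int quarterLen = count >>> 2;
    int quarterLen3 = quarterLen * 3;
    int remainder = count & 3;
    // Straight-line unsigned shift loop: a good auto-vectorization candidate.
    for (int i = 0; i < quarterLen3; i++) {
      out[i] = packed[i] >>> 8;
    }
    // Reassemble the last quarter of each group from three low bytes.
    for (int i = 0; i < quarterLen; i++) {
      out[quarterLen3 + i] =
          ((packed[i] & 0xFF) << 16)
              | ((packed[i + quarterLen] & 0xFF) << 8)
              | (packed[i + 2 * quarterLen] & 0xFF);
    }
    for (int i = 0; i < remainder; i++) {
      out[4 * quarterLen + i] = packed[quarterLen3 + i];
    }
  }

  public static void main(String[] args) {
    int[] docs = {3, 17, 255, 256, 65535, 65536, 1 << 20, (1 << 24) - 1, 42, 7};
    int[] packed = encode(docs, docs.length);
    int[] decoded = new int[docs.length];
    decode(packed, docs.length, decoded);
    System.out.println(Arrays.equals(docs, decoded)); // true
    System.out.println(packed.length); // 8: two 4-ID groups (6 ints) + 2 tail IDs
  }
}
```

The decode side touches each packed int with the same shift-and-mask pattern and has no per-element branches, which is the property that makes a loop like this friendly to SIMD; whether the JIT actually vectorizes it depends on the JVM and hardware.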

Luceneutil can now load three implementations of IntersectVisitor (the RangeQuery visitor, the RangeQuery inverse visitor and the DynamicPruning visitor), so the benchmark exercises polymorphic visit call sites. Here are the results on wikimediumall with taskCountPerCat=5:
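The number of visitor implementations matters because, by default, HotSpot tracks at most two receiver types per call site for inlining; with three, a per-document visit(int) call tends to become a megamorphic interface dispatch, and a bulk method pays that dispatch once per block instead. The following minimal sketch illustrates the idea with a hypothetical DocVisitor interface and three toy implementations loosely mirroring a collecting visitor, an inverse (clearing) visitor and a counting visitor. None of these names are Lucene's, and the actual bulk method referenced above (#14138) may have a different signature.

```java
import java.util.BitSet;

/** Hypothetical sketch (not Lucene's API): visit a whole block of doc IDs per call. */
public final class BulkVisitSketch {

  interface DocVisitor {
    /** Per-doc callback: one interface dispatch per document. */
    void visit(int docId);

    /** Bulk callback: one interface dispatch per block of documents. */
    void visit(int[] docIds, int count);
  }

  /** Marks visited docs, loosely like a visitor collecting range matches. */
  static final class CollectingVisitor implements DocVisitor {
    final BitSet bits;

    CollectingVisitor(int maxDoc) {
      bits = new BitSet(maxDoc);
    }

    @Override
    public void visit(int docId) {
      bits.set(docId);
    }

    @Override
    public void visit(int[] docIds, int count) {
      // Tight loop inside the concrete class; no virtual call per doc.
      for (int i = 0; i < count; i++) {
        bits.set(docIds[i]);
      }
    }
  }

  /** Starts with all docs set and clears visited ones, loosely like an inverse visitor. */
  static final class ClearingVisitor implements DocVisitor {
    final BitSet bits;

    ClearingVisitor(int maxDoc) {
      bits = new BitSet(maxDoc);
      bits.set(0, maxDoc);
    }

    @Override
    public void visit(int docId) {
      bits.clear(docId);
    }

    @Override
    public void visit(int[] docIds, int count) {
      for (int i = 0; i < count; i++) {
        bits.clear(docIds[i]);
      }
    }
  }

  /** Only counts visited docs. */
  static final class CountingVisitor implements DocVisitor {
    long count;

    @Override
    public void visit(int docId) {
      count++;
    }

    @Override
    public void visit(int[] docIds, int n) {
      count += n;
    }
  }

  /** Reader side: one virtual call per decoded block rather than per doc. */
  static void visitBlock(int[] block, int count, DocVisitor visitor) {
    visitor.visit(block, count);
  }

  public static void main(String[] args) {
    int[] block = {1, 4, 9};
    CollectingVisitor collecting = new CollectingVisitor(16);
    ClearingVisitor clearing = new ClearingVisitor(16);
    CountingVisitor counting = new CountingVisitor();
    // The call site inside visitBlock now sees three receiver types.
    for (DocVisitor v : new DocVisitor[] {collecting, clearing, counting}) {
      visitBlock(block, block.length, v);
    }
    System.out.println(collecting.bits); // {1, 4, 9}
    System.out.println(clearing.bits.cardinality()); // 13
    System.out.println(counting.count); // 3
  }
}
```

Each implementation overrides the bulk method with its own loop. A shared default method that simply looped over visit(int) would likely keep the per-document call megamorphic, since that loop body would be a single call site profiled across all implementations.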

                            Task    QPS baseline      StdDev   QPS my_modified_version      StdDev                Pct diff p-value
               TermDayOfYearSort      259.87      (3.9%)      269.26      (4.2%)    3.6% (  -4% -   12%) 0.005
             CountFilteredIntNRQ       61.70      (7.1%)       85.00      (2.0%)   37.8% (  26% -   50%) 0.000
                      TermDTSort      149.65      (6.2%)      232.85      (9.6%)   55.6% (  37% -   76%) 0.000
                  FilteredIntNRQ       82.76     (10.0%)      135.48      (3.7%)   63.7% (  45% -   85%) 0.000
                          IntNRQ       84.62     (10.5%)      139.05      (2.6%)   64.3% (  46% -   86%) 0.000
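The Pct diff column is the relative change in mean QPS, (candidate - baseline) / baseline. As a quick sanity check, recomputing it from the two QPS columns above reproduces the reported values to one decimal place:

```java
import java.util.Locale;

/** Recomputes the "Pct diff" column from the mean QPS values in the table. */
public final class PctDiff {

  static double pctDiff(double baselineQps, double candidateQps) {
    return (candidateQps - baselineQps) / baselineQps * 100.0;
  }

  public static void main(String[] args) {
    String[] tasks = {
      "TermDayOfYearSort", "CountFilteredIntNRQ", "TermDTSort", "FilteredIntNRQ", "IntNRQ"
    };
    double[] baseline = {259.87, 61.70, 149.65, 82.76, 84.62};
    double[] candidate = {269.26, 85.00, 232.85, 135.48, 139.05};
    for (int i = 0; i < tasks.length; i++) {
      // Locale.ROOT keeps '.' as the decimal separator regardless of system locale.
      System.out.println(
          String.format(Locale.ROOT, "%s %.1f%%", tasks[i], pctDiff(baseline[i], candidate[i])));
    }
    // Prints: TermDayOfYearSort 3.6%, CountFilteredIntNRQ 37.8%, TermDTSort 55.6%,
    //         FilteredIntNRQ 63.7%, IntNRQ 64.3% (one per line)
  }
}
```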

Tasks

TermDayOfYearSort: dayofyeardvsort//0 # freq=708472
TermDayOfYearSort: dayofyeardvsort//names # freq=402762
TermDayOfYearSort: dayofyeardvsort//nbsp # freq=492778
TermDayOfYearSort: dayofyeardvsort//part # freq=588644
TermDayOfYearSort: dayofyeardvsort//st # freq=306811

TermDateTimeSort: lastmodndvsort//0 # freq=708472
TermDateTimeSort: lastmodndvsort//names # freq=402762
TermDateTimeSort: lastmodndvsort//nbsp # freq=492778
TermDateTimeSort: lastmodndvsort//part # freq=588644
TermDateTimeSort: lastmodndvsort//st # freq=306811

IntNRQ: nrq//timesecnum 10044 66714
IntNRQ: nrq//timesecnum 1069 86092
IntNRQ: nrq//timesecnum 150 34646
IntNRQ: nrq//timesecnum 3110 51452
IntNRQ: nrq//timesecnum 3773 78558

FilteredIntNRQ: nrq//timesecnum 10044 66714 +filter=5%
FilteredIntNRQ: nrq//timesecnum 1069 86092 +filter=5%
FilteredIntNRQ: nrq//timesecnum 150 34646 +filter=5%
FilteredIntNRQ: nrq//timesecnum 3110 51452 +filter=5%
FilteredIntNRQ: nrq//timesecnum 3773 78558 +filter=5%

CountFilteredIntNRQ: count(nrq//timesecnum 10044 66714 +filter=5%)
CountFilteredIntNRQ: count(nrq//timesecnum 1069 86092 +filter=5%)
CountFilteredIntNRQ: count(nrq//timesecnum 150 34646 +filter=5%)
CountFilteredIntNRQ: count(nrq//timesecnum 3110 51452 +filter=5%)
CountFilteredIntNRQ: count(nrq//timesecnum 3773 78558 +filter=5%)

gf2121 changed the title "bpv24" to "Introduce the bpv24 vectorized decoding for DocIdsWriter" on Jan 28, 2025
gf2121 changed the title "Introduce the bpv24 vectorized decoding for DocIdsWriter" to "Introduce bpv24 vectorized decoding for DocIdsWriter" on Jan 28, 2025
gf2121 requested review from iverase and jpountz on January 28, 2025, 03:23