Add ability to parse raw text into docvectors on-the-fly for impact indexes #2122
Building three types of indexes, using uniCOIL as an example:
Storing only the raw text saves a lot of space:
@AileenLin, to help you out, this is what currently isn't working on the Pyserini end:
Ultimately, I want to make this work.
Do you mean this error? I have tested Anserini with the following, and it matched the benchmark:
Yup, we need to expose the feature in the Java class, and then wire the connections to Python.
Got it.
Referenced in castorini/anserini#2165: Misalignment in SearchCollection and SimpleImpactSearcher implementation - so some changes in 2cr
This has been pushed out in v0.22.0. All done!
As part of #1984, we added the ability to re-create docvectors on-the-fly, so that we don't need to store docvectors in the index (we do still need to store the raw text, but it is smaller).
This feature hasn't yet been exposed for impact indexes. We should expose it so that models like uniCOIL and SPLADE can benefit.
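For illustration, here is a minimal Python sketch of what re-creating a docvector from stored raw text could look like for an impact index. It rests on an assumption the thread does not confirm: that the stored raw text is either a JSON record with a `vector` field mapping terms to quantized integer weights (the layout used by pre-encoded corpora), or a pretokenized pseudo-document in which each term is repeated once per unit of weight. The function name `docvector_from_raw` is hypothetical; the actual feature lives in Anserini's Java code.

```python
import json
from collections import Counter


def docvector_from_raw(raw: str) -> dict[str, int]:
    """Re-create a term -> weight map from the raw text stored for one document.

    Hypothetical layout (an assumption, not Anserini's confirmed format):
    either a JSON record with a "vector" field of term -> quantized weight,
    or a pretokenized pseudo-document where each term is repeated once per
    unit of weight.
    """
    try:
        record = json.loads(raw)
    except json.JSONDecodeError:
        record = None

    if isinstance(record, dict) and isinstance(record.get("vector"), dict):
        # JSON impact vector: the quantized weights act as term frequencies.
        # Zero-weight terms are dropped, since they would not be indexed.
        return {
            term: int(weight)
            for term, weight in record["vector"].items()
            if int(weight) > 0
        }

    # Pretokenized pseudo-document: count whitespace-separated tokens.
    return dict(Counter(raw.split()))


if __name__ == "__main__":
    print(docvector_from_raw('{"id": "d1", "vector": {"cat": 3, "sat": 1}}'))
    print(docvector_from_raw("cat cat cat sat"))
```

Both calls in the usage block rebuild the same map, `{'cat': 3, 'sat': 1}`. The trade-off is the one described in the issue: the index stores only the (smaller) raw text, and some CPU time is spent at lookup to rebuild the docvector.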