FTS seems to always consider the first row even if it should be masked by prefilter #2930

Closed
westonpace opened this issue Sep 25, 2024 · 0 comments · Fixed by #2957
westonpace commented Sep 25, 2024

Simple reproduction (courtesy of lancedb/lancedb#1656)

import lance
import pyarrow as pa

data = pa.table({
    "text": ["Frodo was a puppy", "There were several kittens playing", "Frodo was a happy puppy", "Frodo was a very happy puppy"],
    "sentiment": ["neutral", "neutral", "positive", "positive"]
})
ds = lance.write_dataset(data, "/tmp/test.lance", mode="overwrite")
ds.create_scalar_index("text", "INVERTED")
ds.create_scalar_index("sentiment", "BITMAP")

results = ds.to_table(full_text_query="puppy", filter="sentiment='positive'", prefilter=True, with_row_id=True)
print(results)
assert results.num_rows == 2

I suspect that the WAND / posting iterator logic is doing something like the following (apologies in advance for my poor understanding of WAND search :) ):

candidate = iterator.current()
while not iterator.exhausted():
  if candidate.matches_fts():
    iterator.advance_until_greater_than(candidate.score)

The mask is only applied in iterator.next(), so that first call to iterator.current() always returns the first result, whether it matches the mask or not.
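To make the suspicion concrete, here is a minimal toy sketch of that failure mode. It is not Lance's actual iterator: the `MaskedPostingIterator` class, the `skip_on_init` flag, and the `collect` helper are all hypothetical names made up for illustration. The row ids mirror the reproduction above, where rows 0, 2 and 3 contain "puppy" and only rows 2 and 3 are "positive".

```python
from typing import Iterable, List, Optional, Set


class MaskedPostingIterator:
    """Hypothetical posting-list iterator over sorted row ids with a prefilter mask."""

    def __init__(self, row_ids: Iterable[int], allowed: Set[int], skip_on_init: bool):
        self._row_ids = list(row_ids)
        self._allowed = allowed
        self._pos = 0
        if skip_on_init:
            # Possible fix: make the starting position respect the mask too.
            self._skip_masked()

    def _skip_masked(self) -> None:
        # Move forward until the current row id is allowed by the mask.
        while self._pos < len(self._row_ids) and self._row_ids[self._pos] not in self._allowed:
            self._pos += 1

    def current(self) -> Optional[int]:
        # Returns whatever the position points at; never checks the mask itself.
        return self._row_ids[self._pos] if self._pos < len(self._row_ids) else None

    def next(self) -> None:
        # The mask is consulted only when advancing.
        self._pos += 1
        self._skip_masked()


def collect(it: MaskedPostingIterator) -> List[int]:
    out = []
    while it.current() is not None:
        out.append(it.current())
        it.next()
    return out


puppy_rows = [0, 2, 3]   # rows whose text contains "puppy"
positive_rows = {2, 3}   # rows allowed by the prefilter

buggy = collect(MaskedPostingIterator(puppy_rows, positive_rows, skip_on_init=False))
fixed = collect(MaskedPostingIterator(puppy_rows, positive_rows, skip_on_init=True))
print(buggy)  # [0, 2, 3]
print(fixed)  # [2, 3]
```

In the buggy variant row 0 leaks through because nothing checks the mask before the first `current()` call, which would match the extra row seen in the reproduction if the real code behaves this way.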
