FTS seems to always consider the first row even if it should be masked by prefilter #2930

westonpace · 2024-09-25T13:30:53Z

Simple reproduction (courtesy of lancedb/lancedb#1656)

import lance
import pyarrow as pa

# Three of the four rows mention "puppy", but only the last two are "positive".
data = pa.table({
    "text": [
        "Frodo was a puppy",
        "There were several kittens playing",
        "Frodo was a happy puppy",
        "Frodo was a very happy puppy",
    ],
    "sentiment": ["neutral", "neutral", "positive", "positive"]
})
ds = lance.write_dataset(data, "/tmp/test.lance", mode="overwrite")
ds.create_scalar_index("text", "INVERTED")
ds.create_scalar_index("sentiment", "BITMAP")

# With prefilter=True the FTS should only see rows where sentiment='positive'.
results = ds.to_table(full_text_query="puppy", filter="sentiment='positive'", prefilter=True, with_row_id=True)
print(results)
# Expected: only the two positive rows match, so the first row must not appear.
assert results.num_rows == 2

I suspect that the WAND / posting iterator logic is doing something like the following (apologies in advance for my poor understanding of the WAND search :) ):

candidate = iterator.current()
while not iterator.exhausted():
  if candidate.matches_fts():
    iterator.advance_until_greater_than(candidate.score)

The mask is only applied in iterator.next, so that first call to iterator.current() always returns the first result, whether it matches the mask or not.
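To make the suspicion concrete, here is a minimal, self-contained sketch of that failure mode. It is not Lance's actual code: `MaskedPostingIterator`, `collect`, and the `skip_masked_on_init` flag are hypothetical names invented for illustration, and scoring is left out entirely. The only point is to show what happens when the mask is consulted while advancing but not when the iterator is first positioned.

```python
from typing import List, Optional, Set


class MaskedPostingIterator:
    """Hypothetical posting iterator; NOT Lance's real implementation."""

    def __init__(self, doc_ids: List[int], allowed: Set[int], skip_masked_on_init: bool):
        self._doc_ids = doc_ids
        self._allowed = allowed
        self._pos = 0
        # The suspected bug corresponds to skip_masked_on_init=False:
        # the initial position is never checked against the mask.
        if skip_masked_on_init:
            self._skip_masked()

    def _skip_masked(self) -> None:
        # Move forward until the current doc is allowed or the list is exhausted.
        while self._pos < len(self._doc_ids) and self._doc_ids[self._pos] not in self._allowed:
            self._pos += 1

    def current(self) -> Optional[int]:
        return self._doc_ids[self._pos] if self._pos < len(self._doc_ids) else None

    def next(self) -> None:
        # The mask is applied here, after stepping past the current doc.
        self._pos += 1
        self._skip_masked()


def collect(it: MaskedPostingIterator) -> List[int]:
    """Drain the iterator, recording every doc id it exposes via current()."""
    out = []
    while it.current() is not None:
        out.append(it.current())
        it.next()
    return out


# Rows 0, 2 and 3 contain "puppy"; the prefilter allows only rows 2 and 3.
postings = [0, 2, 3]
allowed = {2, 3}

buggy = collect(MaskedPostingIterator(postings, allowed, skip_masked_on_init=False))
fixed = collect(MaskedPostingIterator(postings, allowed, skip_masked_on_init=True))
print(buggy)  # [0, 2, 3]: the masked first row leaks through
print(fixed)  # [2, 3]
```

In this toy model the masked row 0 shows up only because the first `current()` call happens before any mask check, which would match the extra row seen in the reproduction above.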



westonpace mentioned this issue Sep 25, 2024

bug(python): where clause ignored in FTS when a scalar index exists lancedb/lancedb#1656

Closed

wjones127 self-assigned this Sep 27, 2024

wjones127 mentioned this issue Sep 30, 2024

fix: don't always include first doc #2957

Merged

wjones127 added a commit that referenced this issue Sep 30, 2024

fix: don't always include first doc (#2957)

4f8bd6d

Fixes #2930

wjones127 closed this as completed in #2957 Sep 30, 2024


westonpace commented Sep 25, 2024 (edited by wjones127)
