Not possible to boost word adjacency in search results #560

Open
felixvor opened this issue Dec 5, 2019 · 0 comments

felixvor commented Dec 5, 2019

For example, if I search for "Whoosh is great", results that contain that exact phrase should rank higher than texts that only mention whoosh in general and use the word "great" somewhere else. In the following example, doc1 should be the top search result:

import os

from whoosh.fields import Schema, TEXT
from whoosh.index import create_in
from whoosh.qparser import OrGroup, QueryParser

doc1 = "bla bla whoosh is great bla bla"
doc2 = "whoosh bla is bla great bla whoosh"
doc3 = "whoosh bla bla bla whoosh"
doc4 = "bla bla"

# Only "content" is stored, so hits print the content field but not the name.
schema = Schema(name=TEXT, content=TEXT(stored=True))

# Make sure the index directory exists before calling create_in().
os.makedirs("temp_index", exist_ok=True)
ix = create_in("temp_index", schema)

writer = ix.writer()
writer.add_document(name="doc1", content=doc1)
writer.add_document(name="doc2", content=doc2)
writer.add_document(name="doc3", content=doc3)
writer.add_document(name="doc4", content=doc4)
writer.commit()

with ix.searcher() as searcher:
    # OrGroup: a document matches if it contains any of the query terms.
    parser = QueryParser("content", schema=schema, group=OrGroup)
    query = parser.parse("whoosh is great")
    print(query)
    results = searcher.search(query)
    for r in results:
        print(r)

Output:
(content:whoosh OR content:great)
<Hit {'content': 'whoosh bla is bla great bla whoosh'}>
<Hit {'content': 'bla bla whoosh is great bla bla'}>
<Hit {'content': 'whoosh bla bla bla whoosh'}>

The current order is doc2, doc1, doc3, but it should be doc1, doc2, doc3.

It seems like this should be easy to do, but the closest thing I could find in the docs was OrGroup.factory, which sadly didn't help with the issue.
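To make the desired ordering concrete, here is a minimal sketch of re-ranking hits by word adjacency in plain Python, outside of Whoosh. It is not a Whoosh feature: `longest_adjacent_run` and `rerank_by_adjacency` are hypothetical helper names, the texts are split on whitespace only, and the original relevance score is ignored except as a tie-breaker (the sort is stable).

```python
def longest_adjacent_run(text, terms):
    """Return the length of the longest run of query terms that appear
    next to each other, in query order, inside text."""
    words = text.lower().split()
    terms = [t.lower() for t in terms]
    best = 0
    for i in range(len(words)):
        for j in range(len(terms)):
            # Count how far words[i:] and terms[j:] agree word by word.
            k = 0
            while (i + k < len(words) and j + k < len(terms)
                   and words[i + k] == terms[j + k]):
                k += 1
            best = max(best, k)
    return best


def rerank_by_adjacency(contents, query):
    """Put texts with longer adjacent runs first; ties keep their
    incoming order because sorted() is stable."""
    terms = query.split()
    return sorted(contents, key=lambda text: -longest_adjacent_run(text, terms))


# Hit contents in the order the search above returned them.
hits = [
    "whoosh bla is bla great bla whoosh",
    "bla bla whoosh is great bla bla",
    "whoosh bla bla bla whoosh",
]

for text in rerank_by_adjacency(hits, "whoosh is great"):
    print(text)
```

With the three hits above this prints doc1 first, then doc2 and doc3 in their original order. Since the content field is stored, the same helper could presumably be applied to the stored text of each hit (r["content"]) inside the searcher block, but that is a post-processing workaround and not the built-in boost this issue asks for.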

Thanks in advance!

felixvor changed the title from "Not possible to boost adjacency in search results" to "Not possible to boost word adjacency in search results" on Jan 15, 2020