Can we retrieve transformed vectors from tokens (or words) as sparse embeddings? #69

Answered by xhluca
lspataroG asked this question in Q&A
Thank you for the kind words and glad to hear you enjoy the library!

Right now, it is possible to get a match score (float) between a query and a document; for D documents, that becomes a D-dimensional numpy vector with one score per document. To do this, simply use the retriever.get_scores function, which takes the tokens of a single query as a list.

Here's the link:

bm25s/bm25s/__init__.py

Lines 502 to 514 in e1b39e5

def get_scores(self, query_tokens_single: List[str], weight_mask=None) -> np.ndarray:
    if not isinstance(query_tokens_single, list):
        raise ValueError("The query_tokens must be a list of tokens.")
    if isinstance(query_tokens_single[0], str):
        query_tokens_ids = self.get_tokens_ids(query_tokens_single)
    elif isinstance(

Answer selected by xhluca