Is it possible to retrieve only from a subset of the corpus? #82

Answered by xhluca
tsdev asked this question in Q&A

Yes. Pass the weight_mask parameter to retrieve: a NumPy array with one entry per document in the corpus, set to 1 for the docs you want to retrieve and 0 for all the others.

Here's an example:

def test_retrieve_with_weight_mask(self):
    # first, try with default mode
    query = "cat feline dog bird fish"  # weights should be [2, 1, 1, 1], but after masking should be [2, 0, 0, 1]
    for dt in [np.float32, np.int32, np.bool_]:
        weight_mask = np.array([1, 0, 0, 1], dtype=dt)
        ground_truth = np.array([[0, 3]])
        query_tokens_obj = bm25s.tokenize([query], stopwords="en", stemmer=self.stemmer, return_ids=True)
        # retrieve the top…
