Search Sentences from Index

Tianyu Gao edited this page May 19, 2021 · 1 revision

Our package supports building an index from a list of sentences or a text file, and then searching that index for semantically similar sentences.

To build an index:

model.build_index(sentences_or_file_path, use_faiss=None, faiss_fast=False, device=None, batch_size=64)

Inputs

  • sentences_or_file_path: a list of sentences, or the path to a text file containing one sentence per line.
  • use_faiss: whether to use faiss. If left as None, the package automatically detects faiss and uses it if supported.
  • faiss_fast: whether to use the fast mode of faiss. Note that this may cause some loss of precision.
  • device: cuda or cpu.
  • batch_size: batch size for encoding sentences.
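As an illustration of the text-file layout described above, here is a minimal sketch that writes and reads a file with one sentence per line. The helper `load_sentences` and the file name `sentences.txt` are hypothetical and not part of the package; skipping blank lines is an assumption about reasonable preprocessing, not documented behavior.

```python
import tempfile
from pathlib import Path


def load_sentences(path):
    """Hypothetical helper: read one sentence per line, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


# Write a small example file in the one-sentence-per-line layout.
example_path = Path(tempfile.mkdtemp()) / "sentences.txt"
example_path.write_text(
    "A woman is reading.\nA man is playing a guitar.\n",
    encoding="utf-8",
)

sentences = load_sentences(example_path)
print(sentences)
```

Either the `sentences` list or the file path itself should then be usable as `sentences_or_file_path` in `model.build_index`.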

After building the index, you can search it with:

model.search(queries, device=None, threshold=0.6, top_k=5)

Inputs

  • queries: a sentence or a list of sentences.
  • device: cuda or cpu.
  • threshold: only return results with cosine similarities higher than the threshold.
  • top_k: return top-k results.
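To make the roles of `threshold` and `top_k` concrete, the sketch below applies both to toy 2-dimensional vectors in plain Python. It is not the package's implementation: `cosine`, `filter_results`, and the toy index are hypothetical, and treating the threshold as a strict "greater than" comparison follows the wording above and is an assumption about boundary behavior.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def filter_results(query_vec, indexed, threshold=0.6, top_k=5):
    """Keep items scoring above `threshold`, best first, at most `top_k`."""
    scored = [(sentence, cosine(query_vec, vec)) for sentence, vec in indexed]
    kept = [(sentence, score) for sentence, score in scored if score > threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


# Toy "index": made-up sentence labels paired with made-up embeddings.
toy_index = [
    ("same direction", [1.0, 0.0]),
    ("orthogonal", [0.0, 1.0]),
    ("in between", [1.0, 1.0]),
]

results = filter_results([1.0, 0.0], toy_index, threshold=0.6, top_k=5)
for sentence, score in results:
    print(f"{sentence}: {score:.3f}")
```

Here the orthogonal vector is dropped by the threshold, and lowering `top_k` to 1 would keep only the best match.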

Outputs

  • A list of results (or a single result if queries is a single sentence). Each result is a list of (sentence, score) tuples.
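Because the return shape depends on whether `queries` is a single sentence or a list, downstream code may want to normalize it. The following is a small sketch under that assumption; `as_result_lists` is a hypothetical helper, and the sentences and scores are made-up placeholders rather than real model output.

```python
def as_result_lists(queries, results):
    """Hypothetical helper: always return one result list per query."""
    if isinstance(queries, str):
        # Single query: `results` is itself one list of (sentence, score) tuples.
        return [results]
    return results


# Made-up placeholder output for a single query string.
single = as_result_lists("A man plays music.", [("A man is playing a guitar.", 0.9)])

# Made-up placeholder output for a list of two queries.
batch = as_result_lists(
    ["A man plays music.", "Someone reads."],
    [[("A man is playing a guitar.", 0.9)], [("A woman is reading.", 0.8)]],
)

for per_query in single + batch:
    for sentence, score in per_query:
        print(sentence, score)
```

With this wrapper, the same loop can handle both call styles.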