Colbert on Astra

Proof of concept of ColBERT search, compared with vanilla DPR (a single dense embedding per chunk).
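For readers new to the comparison, here is a rough sketch of how the two scoring schemes differ. It is not code from this repo: the function names are made up for illustration, and it uses plain Python lists with dot-product similarity, where a real setup would use normalized tensors.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def dense_score(query_vec: Vector, chunk_vec: Vector) -> float:
    """DPR-style score: one vector for the query, one for the chunk."""
    return dot(query_vec, chunk_vec)


def maxsim_score(query_tokens: List[Vector], chunk_tokens: List[Vector]) -> float:
    """ColBERT-style late interaction (MaxSim).

    Each query token vector is matched with its most similar chunk token
    vector, and the per-token maxima are summed.
    """
    return sum(
        max(dot(q, d) for d in chunk_tokens)
        for q in query_tokens
    )


if __name__ == "__main__":
    # Toy 2-d "embeddings", illustrative values only.
    query = [[1.0, 0.0], [0.0, 1.0]]
    chunk = [[1.0, 0.0], [0.6, 0.8]]
    print(maxsim_score(query, chunk))
```

The practical consequence is that ColBERT stores one vector per token rather than one per chunk, so the number of rows per chunk in the database goes up accordingly.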

Requirements

Assumes you have Cassandra (vsearch branch) running locally. Should "just work" with Astra given minor changes to db.py.
A dataset with DPR ada002 embeddings already computed. This code does not compute them, though adding that would only take a few lines.
Download the ColBERT model from https://downloads.cs.stanford.edu/nlp/data/colbert/colbertv2/colbertv2.0.tar.gz and extract it to the checkpoints/ subdirectory
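If you would rather script the model download than do it by hand, something like the following should work. It is a sketch using only the standard library. The helper names are hypothetical, and it makes no assumption about what the tarball contains beyond being a gzip-compressed tar archive.

```python
import tarfile
import urllib.request
from pathlib import Path

MODEL_URL = (
    "https://downloads.cs.stanford.edu/nlp/data/colbert/colbertv2/"
    "colbertv2.0.tar.gz"
)


def extract_checkpoint(archive: Path, dest: Path = Path("checkpoints")) -> Path:
    """Extract a .tar.gz archive into dest, creating dest if needed."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        # Newer Python versions support extraction filters; use the safe
        # "data" filter when it is available.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return dest


def download_checkpoint(url: str = MODEL_URL,
                        dest: Path = Path("checkpoints")) -> Path:
    """Download the archive next to the script (if missing), then extract it."""
    archive = Path(url.rsplit("/", 1)[-1])
    if not archive.exists():
        urllib.request.urlretrieve(url, str(archive))
    return extract_checkpoint(archive, dest)


if __name__ == "__main__":
    print(download_checkpoint())
```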

Usage

1. cqlsh < create.cqlsh

2a. Hack up compute-and-load.py to load your chunks. Currently it expects JSON files that look like this: { 'title': $title, '0': { 'content': $raw_text, 'embedding': $ada002_embedding }, '1': { 'content': $raw_text, 'embedding': $ada002_embedding }, ... }

If you don't have pre-chunked documents, or you don't have (or don't want to save) a single dense embedding for comparison, adjust the script accordingly.
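As an illustration of the input layout described in 2a, here is a small sketch that builds one such document from a title, a list of chunks, and their precomputed dense embeddings. The helper name build_document is hypothetical, and the sketch assumes the loader reads standard JSON (double-quoted keys), which the single-quoted shorthand above does not confirm.

```python
import json
from typing import Dict, List


def build_document(title: str,
                   chunks: List[str],
                   embeddings: List[List[float]]) -> Dict:
    """Build a dict with a 'title' key plus one entry per chunk.

    Chunk entries are keyed by their index as a string ('0', '1', ...),
    each holding the raw text and its dense embedding.
    """
    if len(chunks) != len(embeddings):
        raise ValueError("need exactly one embedding per chunk")
    doc: Dict = {"title": title}
    for i, (text, emb) in enumerate(zip(chunks, embeddings)):
        doc[str(i)] = {"content": text, "embedding": emb}
    return doc


if __name__ == "__main__":
    # Tiny fake embeddings; real ada002 vectors are much longer.
    doc = build_document(
        "Example title",
        ["first chunk of text", "second chunk of text"],
        [[0.1, 0.2], [0.3, 0.4]],
    )
    print(json.dumps(doc, indent=2))
```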

2b. Alternatively, hack up compute.py and load.py instead. compute.py computes the ColBERT embeddings and augments the JSON file with them, and load.py sends those to Cassandra. I did this because I wanted to compute the embeddings on a fast GPU machine.
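The compute half of that split can be pictured roughly as below. This is a sketch of the idea, not the contents of compute.py: the key name colbert_embeddings, the function names, and the encoder callable are all assumptions, and the real script may store things differently.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical encoder type: text in, one vector per token out.
TokenEncoder = Callable[[str], List[List[float]]]


def augment_document(doc: Dict, encode: TokenEncoder,
                     key: str = "colbert_embeddings") -> Dict:
    """Add per-token embeddings to every chunk entry of a document dict."""
    for name, chunk in doc.items():
        if name == "title":
            continue  # 'title' is metadata, not a chunk
        chunk[key] = encode(chunk["content"])
    return doc


def augment_file(path: Path, encode: TokenEncoder) -> None:
    """Read a JSON document, augment it, and write it back in place."""
    doc = json.loads(path.read_text())
    path.write_text(json.dumps(augment_document(doc, encode)))


if __name__ == "__main__":
    # Stand-in encoder: one fake 1-d vector per whitespace token.
    def fake_encode(text: str) -> List[List[float]]:
        return [[float(len(tok))] for tok in text.split()]

    doc = {"title": "t", "0": {"content": "hello there", "embedding": [0.0]}}
    print(augment_document(doc, fake_encode))
```

Because the augmented file is self-contained, it can be copied from the GPU machine to wherever the database is reachable and loaded there.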

3. Run python serve_http.py and navigate to http://localhost:5000

