A minimal semantic search application:
- Query and document text is converted to embeddings by the application using Vespa's embedder functionality.
- Search by embedding similarity or text match, and combine the two rankings with reciprocal rank fusion (see the schema sketch below).
Requires Vespa 8.311.28 or later.
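For context, the heart of such an application is the document schema. Below is a minimal sketch of what it could look like; the field names, the embedder id `e5`, and the rank-profile names are illustrative and may differ from the actual application package:

```
schema doc {

    document doc {
        field text type string {
            indexing: summary | index
            index: enable-bm25
        }
    }

    # Embedding computed at feed time by the embedder component with id 'e5'
    field embedding type tensor<float>(x[384]) {
        indexing: input text | embed e5 | index | attribute
        attribute {
            distance-metric: angular
        }
    }

    fieldset default {
        fields: text
    }

    # Semantic ranking: closeness in embedding space
    rank-profile semantic {
        inputs {
            query(e) tensor<float>(x[384])
        }
        first-phase {
            expression: closeness(field, embedding)
        }
    }

    # Fuse text match (BM25) and semantic rankings with reciprocal rank fusion
    rank-profile fusion inherits semantic {
        global-phase {
            expression: reciprocal_rank_fusion(bm25(text), closeness(field, embedding))
            rerank-count: 1000
        }
    }
}
```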
Follow the Vespa getting started guide through the `vespa deploy` step, cloning `simple-semantic-search` instead of `album-recommendation`.
Feed documents (this triggers embedding inference inside Vespa):
$ vespa feed ext/*.json
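The feed files contain Vespa document operations in JSON. As a sketch, a single document could look like the following; the document id and field values are illustrative:

```json
{
    "put": "id:doc:doc::1",
    "fields": {
        "text": "The sun is a star at the center of the Solar System."
    }
}
```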
Example queries using the E5-small-v2 embedding model, which maps text to a 384-dimensional vector representation:
$ vespa query 'yql=select * from doc where userQuery() or ({targetHits: 100}nearestNeighbor(embedding, e))' \
    'input.query(e)=embed(e5, @query)' \
    'query=space contains many suns'

$ vespa query 'yql=select * from doc where userQuery() or ({targetHits: 100}nearestNeighbor(embedding, e))' \
    'input.query(e)=embed(e5, @query)' \
    'query=shipping stuff over the sea'

$ vespa query 'yql=select * from doc where userQuery() or ({targetHits: 100}nearestNeighbor(embedding, e))' \
    'input.query(e)=embed(e5, @query)' \
    'query=exchanging information by sound'
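To rank with reciprocal rank fusion instead of the schema's default rank profile, add a ranking parameter to the query; this assumes a rank profile named `fusion` as in the schema sketch above:

$ vespa query 'yql=select * from doc where userQuery() or ({targetHits: 100}nearestNeighbor(embedding, e))' \
    'input.query(e)=embed(e5, @query)' \
    'query=space contains many suns' \
    'ranking=fusion'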
Remove the container after use:
$ docker rm -f vespa
The E5-small-v2 embedding model used in this sample application is suitable for production use and will produce good results in many domains without fine-tuning, especially when combined with text match features.
Transformer-based embedding models have named inputs and outputs that must
be compatible with the input and output names used by the Vespa Bert embedder or the Huggingface embedder.
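Embedders are configured as components in `services.xml`. A minimal sketch for the huggingface-embedder, assuming the exported model files are placed under `model/` in the application package and the embedder id `e5` used in the queries above:

```xml
<container id="default" version="1.0">
    <component id="e5" type="hugging-face-embedder">
        <transformer-model path="model/model.onnx"/>
        <tokenizer-model path="model/tokenizer.json"/>
    </component>
    <document-api/>
    <search/>
</container>
```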
See export_hf_model_from_hf.py for exporting a Huggingface sentence-transformer model to ONNX format compatible with default input and output names used by the Vespa huggingface-embedder.
The following exports intfloat/e5-small-v2:
$ ./export_hf_model_from_hf.py --hf_model intfloat/e5-small-v2 --output_dir model
The following exports intfloat/multilingual-e5-small with quantization:
$ ./export_hf_model_from_hf.py --hf_model intfloat/multilingual-e5-small --output_dir model --quantize
The following exports intfloat/multilingual-e5-small with quantization and tokenizer patching, to work around compatibility problems with loading saved tokenizers:
$ ./export_hf_model_from_hf.py --hf_model intfloat/multilingual-e5-small --output_dir model --quantize --patch_tokenizer
The following applies to the older bert-embedder; prefer using the Vespa huggingface-embedder instead.
See export_model_from_hf.py for exporting a Huggingface sentence-transformer model to ONNX format compatible with default input and output names used by the bert-embedder.
The following exports intfloat/e5-small-v2, saving the model parameters as an ONNX file plus the vocab.txt vocabulary file in the format expected by the Vespa bert-embedder:
$ ./export_model_from_hf.py --hf_model intfloat/e5-small-v2 --output_dir model
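The corresponding bert-embedder configuration in `services.xml` could then look like this sketch; the paths are illustrative:

```xml
<component id="e5" type="bert-embedder">
    <transformer-model path="model/model.onnx"/>
    <tokenizer-vocab path="model/vocab.txt"/>
</component>
```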