pip install marqo-haystack
This is a document store integration for Marqo with Haystack.
Marqo is an end-to-end vector search engine that includes preprocessing and inference to generate vectors from your data. You can use pre-trained models or bring your own fine-tuned ones.
Haystack is an end-to-end NLP framework for building applications powered by LLMs and other state-of-the-art models.
from marqo_haystack import MarqoDocumentStore
document_store = MarqoDocumentStore()
You can find a code example showing how to use the Document Store and the Retriever in the example/ folder of this repo.
For documentation on Marqo itself, please refer to the documentation.
You can use the MarqoDocumentStore in your Haystack pipelines for single queries like so:
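# Pipeline comes from Haystack itself; early 2.0 preview releases exposed it as haystack.preview.Pipeline
from haystack import Pipeline
from marqo_haystack import MarqoDocumentStore
from marqo_haystack.retriever import MarqoSingleRetriever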
document_store = MarqoDocumentStore()
querying = Pipeline()
querying.add_component("retriever", MarqoSingleRetriever(document_store))
results = querying.run({"retriever": {"query": "Is black and white text boring?", "top_k": 3}})
Or for a list of queries:
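# Pipeline comes from Haystack itself; early 2.0 preview releases exposed it as haystack.preview.Pipeline
from haystack import Pipeline
from marqo_haystack import MarqoDocumentStore
from marqo_haystack.retriever import MarqoRetriever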
document_store = MarqoDocumentStore()
querying = Pipeline()
querying.add_component("retriever", MarqoRetriever(document_store))
results = querying.run({"retriever": {"queries": ["Is black and white text boring?"], "top_k": 3}})
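The two retrievers above take differently shaped inputs: "query" for MarqoSingleRetriever and "queries" for MarqoRetriever. As a minimal sketch, the input dictionary can be built by a small helper. Note that `retriever_input` is a hypothetical name introduced here for illustration and is not part of marqo-haystack; it only reproduces the two input shapes shown in the examples above.

```python
from typing import Any, Dict, List, Union


def retriever_input(
    query: Union[str, List[str]], top_k: int = 3, component: str = "retriever"
) -> Dict[str, Dict[str, Any]]:
    """Build the dictionary passed to Pipeline.run for a retriever component.

    Hypothetical helper for illustration; not part of marqo-haystack.
    """
    if isinstance(query, str):
        # Shape used with MarqoSingleRetriever in the example above
        payload = {"query": query, "top_k": top_k}
    else:
        # Shape used with MarqoRetriever in the example above
        payload = {"queries": list(query), "top_k": top_k}
    return {component: payload}


single = retriever_input("Is black and white text boring?")
many = retriever_input(["Is black and white text boring?"], top_k=3)
# results = querying.run(single)  # requires a pipeline built as shown above
```

The shape must match the retriever registered in the pipeline: pass a string when the component is a MarqoSingleRetriever and a list when it is a MarqoRetriever.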
If you specify a collection_name that doesn't exist as a Marqo index, one will be created for you.
from marqo_haystack import MarqoDocumentStore
# Use an existing index (if my-index already exists)
document_store = MarqoDocumentStore(collection_name="my-index")
# Create a new index (if my-new-index doesn't exist)
document_store = MarqoDocumentStore(collection_name="my-new-index")
# Use the default index name, 'documents'. One will be created if it doesn't exist.
document_store = MarqoDocumentStore()
You can also configure the index that is created for you by passing a dictionary to the settings_dict parameter. For details on the settings object, please refer to the Marqo docs.
In this example we specify that the index should use the e5-large-v2 model and increase the ef_construction parameter to 512 for the HNSW graph construction.
from marqo_haystack import MarqoDocumentStore
index_settings = {
    "index_defaults": {
        "model": "hf/e5-large-v2",
        "ann_parameters": {
            "parameters": {
                "ef_construction": 512
            }
        }
    }
}
document_store = MarqoDocumentStore(settings_dict=index_settings)
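If you create several indexes that differ only in model or HNSW settings, the nested dictionary above can be generated rather than written out by hand. The following is a minimal sketch: `build_index_settings` is a hypothetical helper name, not part of marqo-haystack, and it only produces the keys that appear in the example above.

```python
from typing import Any, Dict


def build_index_settings(model: str, ef_construction: int) -> Dict[str, Any]:
    """Return a settings dictionary shaped like the example above.

    Hypothetical helper for illustration; not part of marqo-haystack.
    """
    if ef_construction <= 0:
        raise ValueError("ef_construction must be a positive integer")
    return {
        "index_defaults": {
            "model": model,
            "ann_parameters": {
                "parameters": {"ef_construction": ef_construction},
            },
        }
    }


index_settings = build_index_settings("hf/e5-large-v2", 512)
# Pass the result to the store as before:
# document_store = MarqoDocumentStore(settings_dict=index_settings)
```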
This integration can also be used with Marqo Cloud. You can sign up or access your Marqo Cloud account here.
To use Marqo Cloud with this integration you will need to pass the collection_name (index name), url (https://api.marqo.ai), and api_key into the constructor.
Note that when using this integration with Marqo Cloud you will need to have already created an index in your Marqo Cloud account.
from marqo_haystack import MarqoDocumentStore
document_store = MarqoDocumentStore(
    url="https://api.marqo.ai",
    api_key="XXXXXXXXXXXXX",
    collection_name="my-cloud-index"
)
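To keep the API key out of source control, the constructor arguments above can be assembled from an environment variable. This is a sketch under stated assumptions: `marqo_cloud_kwargs` is a hypothetical helper and `MARQO_API_KEY` is an assumed variable name; neither is defined by marqo-haystack.

```python
import os
from typing import Dict


def marqo_cloud_kwargs(
    collection_name: str, env_var: str = "MARQO_API_KEY"
) -> Dict[str, str]:
    """Build constructor arguments for an existing Marqo Cloud index.

    Hypothetical helper; the environment variable name is an assumption.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Marqo Cloud API key"
        )
    return {
        "url": "https://api.marqo.ai",
        "api_key": api_key,
        "collection_name": collection_name,
    }


# Usage (requires marqo-haystack and an index already created in Marqo Cloud):
# document_store = MarqoDocumentStore(**marqo_cloud_kwargs("my-cloud-index"))
```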
marqo-haystack is distributed under the terms of the Apache-2.0 license.