Merged
11 changes: 5 additions & 6 deletions docs-website/docs/concepts/device-management.mdx
Original file line number Diff line number Diff line change
@@ -24,8 +24,7 @@ Haystack’s device management is built on the following abstractions:

With the above abstractions, Haystack can fully address any supported device on your local machine and can use multiple devices at the same time. Every component that supports local inference will internally handle the conversion of these generic representations to their backend-specific representations.

:::info
Source Code
:::info Source Code

Find the full code for the abstractions above in the Haystack GitHub [repo](https://github.com/deepset-ai/haystack/blob/6a776e672fb69cc4ee42df9039066200f1baf24e/haystack/utils/device.py).
:::
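For intuition, the layering of those abstractions can be sketched with a minimal stand-in — the class and method names below mirror the idea only, not the actual Haystack implementation in `device.py`:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class DeviceType(Enum):
    CPU = "cpu"
    GPU = "cuda"
    MPS = "mps"

@dataclass(frozen=True)
class Device:
    type: DeviceType
    id: Optional[int] = None

    def to_hf(self) -> str:
        # Convert the generic representation into a HuggingFace-style string.
        if self.id is None:
            return self.type.value
        return f"{self.type.value}:{self.id}"

class ComponentDevice:
    """Component-facing wrapper that resolves 'no preference' to a default."""

    def __init__(self, device: Optional[Device] = None):
        self._device = device

    def resolve(self) -> Device:
        # Fall back to CPU when the user expressed no preference.
        return self._device or Device(DeviceType.CPU)

print(ComponentDevice(Device(DeviceType.GPU, 0)).resolve().to_hf())  # cuda:0
```

The real classes add device maps and backend conversions beyond `to_hf()`; see the linked repo for the full implementation.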
@@ -82,14 +81,14 @@ class MyComponent(Component):
self.model = AutoModel.from_pretrained("deepset/bert-base-cased-squad2", device=self.device.to_hf())

def to_dict(self):
# Serialize the policy like any other (custom) data.
# Serialize the policy like any other (custom) data.
return default_to_dict(self,
device=self.device.to_dict() if self.device else None,
...)

@classmethod
def from_dict(cls, data):
# Deserialize the device data inplace before passing
# Deserialize the device data inplace before passing
# it to the generic from_dict function.
init_params = data["init_parameters"]
init_params["device"] = ComponentDevice.from_dict(init_params["device"])
@@ -120,4 +119,4 @@ generator = HuggingFaceLocalGenerator(model="llama2", huggingface_pipeline_kwarg
})
```

In such cases, ensure that the parameter precedence and selection behavior is clearly documented. In the case of `HuggingFaceLocalGenerator`, the device map passed through the `huggingface_pipeline_kwargs` parameter overrides the explicit `device` parameter and is documented as such.
In such cases, ensure that the parameter precedence and selection behavior is clearly documented. In the case of `HuggingFaceLocalGenerator`, the device map passed through the `huggingface_pipeline_kwargs` parameter overrides the explicit `device` parameter and is documented as such.
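That precedence rule can be expressed as a tiny helper — a sketch with hypothetical names, not part of the Haystack API:

```python
from typing import Optional

def resolve_device_setting(device: Optional[str], pipeline_kwargs: Optional[dict]) -> str:
    """Illustrative only: a device_map in the pipeline kwargs wins over the
    explicit device parameter, mirroring the documented precedence."""
    kwargs = pipeline_kwargs or {}
    if "device_map" in kwargs:
        return kwargs["device_map"]
    return device if device is not None else "cpu"

print(resolve_device_setting("cuda:0", {"device_map": "auto"}))  # auto
print(resolve_device_setting("cuda:0", None))                    # cuda:0
```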
19 changes: 8 additions & 11 deletions docs-website/docs/concepts/document-store.mdx
@@ -9,16 +9,14 @@ description: "You can think of the Document Store as a database that stores your

You can think of the Document Store as a database that stores your data and provides them to the Retriever at query time. Learn how to use Document Store in a pipeline or how to create your own.

Document Store is an object that stores your documents. In Haystack, a Document Store is different from a component, as it doesnt have the `run()` method. You can think of it as an interface to your database – you put the information there, or you can look through it. This means that a Document Store is not a piece of a pipeline but rather a tool that the components of a pipeline have access to and can interact with.
Document Store is an object that stores your documents. In Haystack, a Document Store is different from a component, as it doesn't have the `run()` method. You can think of it as an interface to your database – you put the information there, or you can look through it. This means that a Document Store is not a piece of a pipeline but rather a tool that the components of a pipeline have access to and can interact with.

:::tip
Work with Retrievers
:::tip Work with Retrievers

The most common way to use a Document Store in Haystack is to fetch documents using a Retriever. A Document Store will often have a corresponding Retriever to get the most out of specific technologies. See more information in our [Retriever](../pipeline-components/retrievers.mdx) documentation.
:::

:::info
How to choose a Document Store?
:::note How to choose a Document Store?

To learn about different types of Document Stores and their strengths and disadvantages, head to the [Choosing a Document Store](document-store/choosing-a-document-store.mdx) page.
:::
@@ -40,7 +38,7 @@ See the installation and initialization details for each Document Store in the "

### Work with Documents

Convert your data into `Document` objects before writing them into a Document Store along with its metadata and document ID.
Convert your data into `Document` objects, each with its metadata and document ID, before writing them into a Document Store.

The ID field is mandatory, so if you don’t choose a specific ID yourself, Haystack will do its best to come up with a unique ID based on the document’s information and assign it automatically. However, since Haystack uses the document’s contents to create an ID, two identical documents might have identical IDs. Keep this in mind as you update your documents, as the ID will not be updated automatically.
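To see why identical documents can end up with identical IDs, here is a sketch of content-based ID generation — a stand-in using a SHA-256 hash, not Haystack's actual hashing scheme:

```python
import hashlib
from typing import Optional

def content_based_id(content: str, meta: Optional[dict] = None) -> str:
    # Derive the ID from the document's content (and metadata), so two
    # identical documents deterministically receive the same ID.
    payload = repr((content, sorted((meta or {}).items())))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

a = content_based_id("My name is Jean and I live in Paris.")
b = content_based_id("My name is Jean and I live in Paris.")
print(a == b)  # True -- the caveat above: duplicate documents collide on ID
```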

@@ -61,14 +59,13 @@ To write documents into the `InMemoryDocumentStore`, simply call the `.write_doc

```python
document_store.write_documents([
Document(content="My name is Jean and I live in Paris."),
Document(content="My name is Mark and I live in Berlin."),
Document(content="My name is Jean and I live in Paris."),
Document(content="My name is Mark and I live in Berlin."),
Document(content="My name is Giorgio and I live in Rome.")
])
```

:::info
`DocumentWriter`
:::note `DocumentWriter`

See `DocumentWriter` component [docs](../pipeline-components/writers/documentwriter.mdx) to write your documents into a Document Store in a pipeline.
:::
@@ -100,4 +97,4 @@ The `init` function should indicate all the specifics for the chosen database or

We also recommend having a custom corresponding Retriever to get the most out of a specific Document Store.

See [Creating Custom Document Stores](document-store/creating-custom-document-stores.mdx) page for more details.
See [Creating Custom Document Stores](document-store/creating-custom-document-stores.mdx) page for more details.
@@ -56,9 +56,10 @@ Continue further down the article for a more complex explanation of the strength

Vector libraries are often improperly included in the “vector database” category, as they are limited to handling only vectors, are designed to work in-memory, and normally don’t have a clean way to store data on disk. Still, they are the way to go whenever performance and speed are the top requirements for your AI application, as these libraries can use hardware resources very effectively.

> 🚧 In progress
>
> We are currently developing the support for vector libraries in Haystack.
:::warning In progress

We are currently developing the support for vector libraries in Haystack.
:::

#### Pure vector databases

3 changes: 1 addition & 2 deletions docs-website/docs/concepts/pipelines.mdx
@@ -100,8 +100,7 @@ Thanks to serialization, you can save and then load your pipelines. Serializatio

Haystack pipelines delegate the serialization to its components, so serializing a pipeline simply means serializing each component in the pipeline one after the other, along with their connections. The pipeline is serialized into a dictionary format, which acts as an intermediate format that you can then convert into the final format you want.

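That delegation can be sketched in a few lines — hypothetical component objects, not Haystack's real classes:

```python
# Hypothetical component with a to_dict() method -- illustrative only.
class Lowercaser:
    def to_dict(self) -> dict:
        return {"type": "Lowercaser", "init_parameters": {}}

def pipeline_to_dict(components: dict, connections: list) -> dict:
    # The pipeline serializes itself by delegating to each component,
    # then recording the connections between them.
    return {
        "components": {name: comp.to_dict() for name, comp in components.items()},
        "connections": [{"sender": s, "receiver": r} for s, r in connections],
    }

result = pipeline_to_dict({"lower": Lowercaser()}, [("fetcher.output", "lower.input")])
print(result["components"]["lower"]["type"])  # Lowercaser
```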
:::info
Serialization formats
:::info Serialization formats
> **Contributor:** I noticed in most other cases in the document store mdx that you converted `:::info` into `:::note`. Although I can't see any difference between those two in the preview. Is there one we should be using or does it not matter?

> **Contributor Author:** That's a great question @sjrl, essentially the idea is that these three categories have distinct meanings:
>
> - `:::note` - Key details that are important for users to understand (essential information)
> - `:::tip` - Optional advice or suggestions that can enhance experience but aren't mandatory
> - `:::info` - Useful information that adds value but isn't critical
>
> And your comment made me start looking at concrete examples and finding out discrepancies :) I'll add another commit to standardize the admonition type usage!

Haystack only supports YAML format at this time. We'll be rolling out more formats gradually.
:::
13 changes: 6 additions & 7 deletions docs-website/docs/concepts/pipelines/serialization.mdx
@@ -9,10 +9,9 @@ description: "Save your pipelines into a custom format and explore the serializa

Save your pipelines into a custom format and explore the serialization options.

Serialization means converting a pipeline to a format that you can save on your disk and load later.
Serialization means converting a pipeline to a format that you can save on your disk and load later.

:::info
Serialization formats
:::info Serialization formats

Haystack 2.0 only supports YAML format at this time. We will be rolling out more formats gradually.
:::
@@ -28,7 +27,7 @@ pipe = Pipeline()
print(pipe.dumps())

## Prints:
##
##
## components: {}
## connections: []
## max_loops_allowed: 100
@@ -131,7 +130,7 @@ from haystack import component, default_from_dict, default_to_dict
class SetIntersector:
def __init__(self, intersect_with: set):
self.intersect_with = intersect_with

@component.output_types(result=set)
def run(self, data: set):
return data.intersection(self.intersect_with)
@@ -151,7 +150,7 @@ class SetIntersector:

Once a pipeline is available in its dictionary format, the last step of serialization is to convert that dictionary into a format you can store or send over the wire. Haystack supports YAML out of the box, but if you need a different format, you can write a custom Marshaller.

A `Marshaller` is a Python class responsible for converting text to a dictionary and a dictionary to text according to a certain format. Marshallers must respect the `Marshaller` [protocol](https://github.com/deepset-ai/haystack/blob/main/haystack/marshal/protocol.py), providing the methods `marshal` and `unmarshal`.
A `Marshaller` is a Python class responsible for converting text to a dictionary and a dictionary to text according to a certain format. Marshallers must respect the `Marshaller` [protocol](https://github.com/deepset-ai/haystack/blob/main/haystack/marshal/protocol.py), providing the methods `marshal` and `unmarshal`.

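For example, a marshaller for JSON needs nothing beyond the standard library — a sketch that assumes only the `marshal`/`unmarshal` method names required by the protocol:

```python
import json

# A sketch of a Marshaller-protocol implementation for JSON; the method
# names follow the protocol, everything else is illustrative.
class JsonMarshaller:
    def marshal(self, dict_: dict) -> str:
        # dictionary -> text
        return json.dumps(dict_, indent=2)

    def unmarshal(self, data_: str) -> dict:
        # text -> dictionary
        return json.loads(data_)

marshaller = JsonMarshaller()
text = marshaller.marshal({"components": {}, "connections": []})
print(marshaller.unmarshal(text))  # round-trips back to the original dict
```

You would then pass it the same way as any marshaller, analogous to `pipe.dumps(TomlMarshaller())` shown for the TOML case.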
This is the code for a custom TOML marshaller that relies on the `rtoml` library:

@@ -182,4 +181,4 @@ pipe.dumps(TomlMarshaller())

## Additional References

:notebook: Tutorial: [Serializing LLM Pipelines](https://haystack.deepset.ai/tutorials/29_serializing_pipelines)
:notebook: Tutorial: [Serializing LLM Pipelines](https://haystack.deepset.ai/tutorials/29_serializing_pipelines)
@@ -13,8 +13,7 @@ You can visualize your pipelines as graphs to better understand how the componen

Haystack pipelines have `draw()` and `show()` methods that enable you to visualize the pipeline as a graph using Mermaid graphs.

:::info
Data Privacy Notice
:::note Data Privacy Notice

Exercise caution with sensitive data when using pipeline visualization.

7 changes: 3 additions & 4 deletions docs-website/docs/development/deployment.mdx
@@ -9,7 +9,7 @@ description: "Deploy your Haystack pipelines through various services such as Do

Deploy your Haystack pipelines through various services such as Docker, Kubernetes, Ray, or a variety of Serverless options.

As a framework, Haystack is typically integrated into a variety of applications and environments, and there is no single, specific deployment strategy to follow. However, it is very common to make Haystack pipelines accessible through a service that can be easily called from other software systems.
As a framework, Haystack is typically integrated into a variety of applications and environments, and there is no single, specific deployment strategy to follow. However, it is very common to make Haystack pipelines accessible through a service that can be easily called from other software systems.

These guides focus on tools and techniques that can be used to run Haystack pipelines in common scenarios. These suggestions are not the only way to deploy Haystack; treat them as inspiration and a starting point you can customize to your needs.

@@ -25,12 +25,11 @@ Here are the currently available guides on Haystack pipeline deployment:

Haystack can be easily integrated into any HTTP application, but if you don’t have one, you can use Hayhooks, a ready-made application that serves Haystack pipelines as REST endpoints. We’ll be using Hayhooks throughout this guide to streamline the code examples. Refer to the Hayhooks [documentation](hayhooks.mdx) to get details about how to run the server and deploy your pipelines.

:::note
Looking to scale with confidence?
:::note Looking to scale with confidence?

If your team needs **enterprise-grade support, best practices, and deployment guidance** to run Haystack in production, check out **Haystack Enterprise**.

📜 [Learn more about Haystack Enterprise](https://haystack.deepset.ai/blog/announcing-haystack-enterprise)

👉 [Get in touch with our team](https://www.deepset.ai/products-and-services/haystack-enterprise)
:::
:::
9 changes: 4 additions & 5 deletions docs-website/docs/development/hayhooks.mdx
@@ -9,8 +9,7 @@ description: "Hayhooks is a web application you can use to serve Haystack pipeli

Hayhooks is a web application you can use to serve Haystack pipelines through HTTP endpoints. This page provides an overview of the main features of Hayhooks.

:::info
Hayhooks GitHub
:::info Hayhooks GitHub

You can find the code and an in-depth explanation of the features in the [Hayhooks GitHub repository](https://github.com/deepset-ai/hayhooks).
:::
@@ -238,10 +237,10 @@ To deploy a pipeline without listing it as an MCP Tool, set `skip_mcp = True` in
class PipelineWrapper(BasePipelineWrapper):
# This will skip the MCP Tool listing
skip_mcp = True

def setup(self) -> None:
...

def run_api(self, urls: List[str], question: str) -> str:
...
```
@@ -298,4 +297,4 @@ async def custom_middleware(request: Request, call_next):

if __name__ == "__main__":
uvicorn.run("app:hayhooks", host=settings.host, port=settings.port)
```
```
3 changes: 1 addition & 2 deletions docs-website/docs/development/logging.mdx
@@ -66,8 +66,7 @@ If Haystack detects a [structlog installation](https://www.structlog.org/en/stab
To make development a more pleasurable experience, Haystack uses [structlog’s `ConsoleRenderer`](https://www.structlog.org/en/stable/console-output.html) by default to render structured logs as a nicely aligned and colorful output:
<ClickableImage src="/img/e49a1f2-Screenshot_2024-02-27_at_16.13.51.png" alt="Python code snippet demonstrating basic logging setup with getLogger and a warning level log message output" />

:::tip
Rich Formatting
:::tip Rich Formatting

Install [_rich_](https://rich.readthedocs.io/en/stable/index.html) to beautify your logs even more!
:::
@@ -16,7 +16,7 @@ A Document Store for storing and retrieval from Azure AI Search Index.

[Azure AI Search](https://learn.microsoft.com/en-us/azure/search/search-what-is-azure-search) is an enterprise-ready search and retrieval system to build RAG-based applications on Azure, with native LLM integrations.

`AzureAISearchDocumentStore` supports semantic reranking and metadata/content filtering. The Document Store is useful for various tasks such as generating knowledge base insights (catalog or document search), information discovery (data exploration), RAG, and automation.
`AzureAISearchDocumentStore` supports semantic reranking and metadata/content filtering. The Document Store is useful for various tasks such as generating knowledge base insights (catalog or document search), information discovery (data exploration), RAG, and automation.

### Initialization

@@ -46,8 +46,7 @@ document_store.write_documents([
print(document_store.count_documents())
```

:::info
Latency Notice
:::info Latency Notice

Due to Azure search index latency, the document count returned in the example might be zero if executed immediately. To ensure accurate results, be mindful of this latency when retrieving documents from the search index.
:::
@@ -60,4 +59,4 @@ The Haystack Azure AI Search integration includes three Retriever components. Ea

- [`AzureAISearchEmbeddingRetriever`](../pipeline-components/retrievers/azureaisearchembeddingretriever.mdx): This Retriever accepts the embeddings of a single query as input and returns a list of matching documents. The query must be embedded beforehand, which can be done using an [Embedder](../pipeline-components/embedders.mdx) component.
- [`AzureAISearchBM25Retriever`](../pipeline-components/retrievers/azureaisearchbm25retriever.mdx): A keyword-based Retriever that retrieves documents matching a query from the Azure AI Search index.
- [`AzureAISearchHybridRetriever`](../pipeline-components/retrievers/azureaisearchhybridretriever.mdx): This Retriever combines embedding-based retrieval and keyword search to find matching documents in the search index to get more relevant results.
- [`AzureAISearchHybridRetriever`](../pipeline-components/retrievers/azureaisearchhybridretriever.mdx): This Retriever combines embedding-based retrieval and keyword search to find matching documents in the search index, yielding more relevant results.
21 changes: 10 additions & 11 deletions docs-website/docs/document-stores/qdrant-document-store.mdx
@@ -14,7 +14,7 @@ Use the Qdrant vector database with Haystack.
| API reference | [Qdrant](/reference/integrations-qdrant) |
| GitHub link | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/qdrant |

Qdrant is a powerful high-performance, massive-scale vector database. The `QdrantDocumentStore` can be used with any Qdrant instance, in-memory, locally persisted, hosted, and the official Qdrant Cloud.
Qdrant is a powerful, high-performance, massive-scale vector database. The `QdrantDocumentStore` can be used with any Qdrant instance: in-memory, locally persisted, hosted, or the official Qdrant Cloud.

### Installation

@@ -39,15 +39,16 @@ document_store = QdrantDocumentStore(
wait_result_from_api=True,
)
document_store.write_documents([
Document(content="This is first", embedding=[0.0]*5),
Document(content="This is first", embedding=[0.0]*5),
Document(content="This is second", embedding=[0.1, 0.2, 0.3, 0.4, 0.5])
])
print(document_store.count_documents())
```

> 🚧 Collections Created Outside Haystack
>
> When you create a `QdrantDocumentStore` instance, Haystack takes care of setting up the collection. In general, you cannot use a Qdrant collection created without Haystack with Haystack. If you want to migrate your existing collection, see the sample script at https://github.com/deepset-ai/haystack-core-integrations/blob/main/integrations/qdrant/src/haystack_integrations/document_stores/qdrant/migrate_to_sparse.py.
:::warning Collections Created Outside Haystack

When you create a `QdrantDocumentStore` instance, Haystack takes care of setting up the collection. In general, you cannot use a Qdrant collection created without Haystack with Haystack. If you want to migrate your existing collection, see the sample script at https://github.com/deepset-ai/haystack-core-integrations/blob/main/integrations/qdrant/src/haystack_integrations/document_stores/qdrant/migrate_to_sparse.py.
:::

You can also connect directly to [Qdrant Cloud](https://cloud.qdrant.io/login). Once you have your API key and your cluster URL from the Qdrant dashboard, you can connect like this:

@@ -65,14 +66,13 @@ document_store = QdrantDocumentStore(
)

document_store.write_documents([
Document(content="This is first", embedding=[0.0]*5),
Document(content="This is first", embedding=[0.0]*5),
Document(content="This is second", embedding=[0.1, 0.2, 0.3, 0.4, 0.5])
])
print(document_store.count_documents())
```

:::tip
More information
:::tip More information

You can find more ways to initialize and use `QdrantDocumentStore` on our [integration page](https://haystack.deepset.ai/integrations/qdrant-document-store).
:::
@@ -83,8 +83,7 @@ You can find more ways to initialize and use QdrantDocumentStore on our [integra
- [`QdrantSparseEmbeddingRetriever`](../pipeline-components/retrievers/qdrantsparseembeddingretriever.mdx): Retrieves documents from the `QdrantDocumentStore` based on their sparse embeddings.
- [`QdrantHybridRetriever`](../pipeline-components/retrievers/qdranthybridretriever.mdx): Retrieves documents from the `QdrantDocumentStore` based on both dense and sparse embeddings.

:::info
Sparse Embedding Support
:::note Sparse Embedding Support

To use Sparse Embedding support, you need to initialize the `QdrantDocumentStore` with `use_sparse_embeddings=True`, which is `False` by default.

@@ -93,4 +92,4 @@ If you want to use Document Store or collection previously created with this fea

## Additional References

:cook: Cookbook: [Sparse Embedding Retrieval with Qdrant and FastEmbed](https://haystack.deepset.ai/cookbook/sparse_embedding_retrieval)
:cook: Cookbook: [Sparse Embedding Retrieval with Qdrant and FastEmbed](https://haystack.deepset.ai/cookbook/sparse_embedding_retrieval)