clean up docstrings: HuggingFaceAPIDocumentEmbedder & HuggingFaceAPITextEmbedder #8184

Merged 3 commits on Aug 13, 2024
haystack/components/embedders/hugging_face_api_document_embedder.py
@@ -23,15 +23,18 @@
@component
class HuggingFaceAPIDocumentEmbedder:
"""
A component that embeds documents using Hugging Face APIs.
Embeds documents using Hugging Face APIs.

This component can be used to compute Document embeddings using different Hugging Face APIs:
Use it with the following Hugging Face APIs:
- [Free Serverless Inference API](https://huggingface.co/inference-api)
- [Paid Inference Endpoints](https://huggingface.co/inference-endpoints)
- [Self-hosted Text Embeddings Inference](https://github.com/huggingface/text-embeddings-inference)


Example usage with the free Serverless Inference API:
### Usage examples

#### With free serverless inference API

```python
from haystack.components.embedders import HuggingFaceAPIDocumentEmbedder
from haystack.utils import Secret
@@ -49,7 +52,8 @@ class HuggingFaceAPIDocumentEmbedder:
# [0.017020374536514282, -0.023255806416273117, ...]
```

Example usage with paid Inference Endpoints:
#### With paid inference endpoints

```python
from haystack.components.embedders import HuggingFaceAPIDocumentEmbedder
from haystack.utils import Secret
@@ -67,7 +71,8 @@ class HuggingFaceAPIDocumentEmbedder:
# [0.017020374536514282, -0.023255806416273117, ...]
```

Example usage with self-hosted Text Embeddings Inference:
#### With self-hosted text embeddings inference

```python
from haystack.components.embedders import HuggingFaceAPIDocumentEmbedder
from haystack.dataclasses import Document
@@ -99,41 +104,41 @@ def __init__(
embedding_separator: str = "\n",
):
"""
Create an HuggingFaceAPITextEmbedder component.
Creates a HuggingFaceAPIDocumentEmbedder component.

:param api_type:
The type of Hugging Face API to use.
:param api_params:
A dictionary containing the following keys:
- `model`: model ID on the Hugging Face Hub. Required when `api_type` is `SERVERLESS_INFERENCE_API`.
A dictionary with the following keys:
- `model`: Hugging Face model ID. Required when `api_type` is `SERVERLESS_INFERENCE_API`.
- `url`: URL of the inference endpoint. Required when `api_type` is `INFERENCE_ENDPOINTS` or
`TEXT_EMBEDDINGS_INFERENCE`.
:param token: The HuggingFace token to use as HTTP bearer authorization.
You can find your HF token in your [account settings](https://huggingface.co/settings/tokens).
`TEXT_EMBEDDINGS_INFERENCE`.
:param token: The Hugging Face token to use as HTTP bearer authorization.
Check your HF token in your [account settings](https://huggingface.co/settings/tokens).
:param prefix:
A string to add at the beginning of each text.
:param suffix:
A string to add at the end of each text.
:param truncate:
Truncate input text from the end to the maximum length supported by the model.
This parameter takes effect when the `api_type` is `TEXT_EMBEDDINGS_INFERENCE`.
It also takes effect when the `api_type` is `INFERENCE_ENDPOINTS` and the backend is based on Text
Embeddings Inference. This parameter is ignored when the `api_type` is `SERVERLESS_INFERENCE_API`
(it is always set to `True` and cannot be changed).
Truncates the input text to the maximum length supported by the model.
Applicable when `api_type` is `TEXT_EMBEDDINGS_INFERENCE`, or `INFERENCE_ENDPOINTS`
if the backend uses Text Embeddings Inference.
If `api_type` is `SERVERLESS_INFERENCE_API`, this parameter is ignored.
It is always set to `True` and cannot be changed.
:param normalize:
Normalize the embeddings to unit length.
This parameter takes effect when the `api_type` is `TEXT_EMBEDDINGS_INFERENCE`.
It also takes effect when the `api_type` is `INFERENCE_ENDPOINTS` and the backend is based on Text
Embeddings Inference. This parameter is ignored when the `api_type` is `SERVERLESS_INFERENCE_API`
(it is always set to `False` and cannot be changed).
Normalizes the embeddings to unit length.
Applicable when `api_type` is `TEXT_EMBEDDINGS_INFERENCE`, or `INFERENCE_ENDPOINTS`
if the backend uses Text Embeddings Inference.
If `api_type` is `SERVERLESS_INFERENCE_API`, this parameter is ignored.
It is always set to `False` and cannot be changed.
:param batch_size:
Number of Documents to process at once.
Number of documents to process at once.
:param progress_bar:
If `True` shows a progress bar when running.
If `True`, shows a progress bar when running.
:param meta_fields_to_embed:
List of meta fields that will be embedded along with the Document text.
List of meta fields that will be embedded along with the document text.
:param embedding_separator:
Separator used to concatenate the meta fields to the Document text.
Separator used to concatenate the meta fields to the document text.
"""
huggingface_hub_import.check()

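To make the parameter interactions above concrete, here is a minimal construction sketch (not part of this PR): it assumes the lowercase string form of the API type is accepted, and the TEI URL and meta field name are placeholders.

```python
from haystack.components.embedders import HuggingFaceAPIDocumentEmbedder

# Minimal sketch: self-hosted Text Embeddings Inference backend.
# `url` is required for TEXT_EMBEDDINGS_INFERENCE; `model` is not needed here.
doc_embedder = HuggingFaceAPIDocumentEmbedder(
    api_type="text_embeddings_inference",         # assumed string form of the API type
    api_params={"url": "http://localhost:8080"},  # placeholder URL of a local TEI container
    truncate=True,    # honored because the backend is Text Embeddings Inference
    normalize=False,  # likewise only honored for TEI-based backends
    batch_size=32,
    meta_fields_to_embed=["title"],               # hypothetical meta field to prepend
    embedding_separator="\n",
)
```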
@@ -252,14 +257,14 @@ def _embed_batch(self, texts_to_embed: List[str], batch_size: int) -> List[List[
@component.output_types(documents=List[Document])
def run(self, documents: List[Document]):
"""
Embed a list of Documents.
Embeds a list of documents.

:param documents:
Documents to embed.

:returns:
A dictionary with the following keys:
- `documents`: Documents with embeddings
- `documents`: Documents with embeddings.
"""
if not isinstance(documents, list) or documents and not isinstance(documents[0], Document):
raise TypeError(
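A hedged sketch of the `run` call documented above; the model ID is only an example and a valid HF token is assumed to be configured.

```python
from haystack.components.embedders import HuggingFaceAPIDocumentEmbedder
from haystack.dataclasses import Document

embedder = HuggingFaceAPIDocumentEmbedder(
    api_type="serverless_inference_api",             # assumed string form of the API type
    api_params={"model": "BAAI/bge-small-en-v1.5"},  # example model ID
)

result = embedder.run(documents=[Document(content="I love pizza!")])
# `result` is a dictionary with a "documents" key;
# each returned Document carries its embedding:
print(result["documents"][0].embedding)
# [0.017020374536514282, -0.023255806416273117, ...]
```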
48 changes: 26 additions & 22 deletions haystack/components/embedders/hugging_face_api_text_embedder.py
@@ -20,15 +20,17 @@
@component
class HuggingFaceAPITextEmbedder:
"""
A component that embeds text using Hugging Face APIs.
Embeds strings using Hugging Face APIs.

This component can be used to embed strings using different Hugging Face APIs:
Use it with the following Hugging Face APIs:
- [Free Serverless Inference API](https://huggingface.co/inference-api)
- [Paid Inference Endpoints](https://huggingface.co/inference-endpoints)
- [Self-hosted Text Embeddings Inference](https://github.com/huggingface/text-embeddings-inference)

### Usage examples

#### With free serverless inference API

Example usage with the free Serverless Inference API:
```python
from haystack.components.embedders import HuggingFaceAPITextEmbedder
from haystack.utils import Secret
@@ -42,7 +44,8 @@ class HuggingFaceAPITextEmbedder:
# {'embedding': [0.017020374536514282, -0.023255806416273117, ...],
```

Example usage with paid Inference Endpoints:
#### With paid inference endpoints

```python
from haystack.components.embedders import HuggingFaceAPITextEmbedder
from haystack.utils import Secret
@@ -55,7 +58,8 @@ class HuggingFaceAPITextEmbedder:
# {'embedding': [0.017020374536514282, -0.023255806416273117, ...],
```

Example usage with self-hosted Text Embeddings Inference:
#### With self-hosted text embeddings inference

```python
from haystack.components.embedders import HuggingFaceAPITextEmbedder
from haystack.utils import Secret
@@ -80,33 +84,33 @@ def __init__(
normalize: bool = False,
):
"""
Create an HuggingFaceAPITextEmbedder component.
Creates a HuggingFaceAPITextEmbedder component.

:param api_type:
The type of Hugging Face API to use.
:param api_params:
A dictionary containing the following keys:
- `model`: model ID on the Hugging Face Hub. Required when `api_type` is `SERVERLESS_INFERENCE_API`.
A dictionary with the following keys:
- `model`: Hugging Face model ID. Required when `api_type` is `SERVERLESS_INFERENCE_API`.
- `url`: URL of the inference endpoint. Required when `api_type` is `INFERENCE_ENDPOINTS` or
`TEXT_EMBEDDINGS_INFERENCE`.
:param token: The HuggingFace token to use as HTTP bearer authorization
You can find your HF token in your [account settings](https://huggingface.co/settings/tokens)
`TEXT_EMBEDDINGS_INFERENCE`.
:param token: The Hugging Face token to use as HTTP bearer authorization.
Check your HF token in your [account settings](https://huggingface.co/settings/tokens).
:param prefix:
A string to add at the beginning of each text.
:param suffix:
A string to add at the end of each text.
:param truncate:
Truncate input text from the end to the maximum length supported by the model.
This parameter takes effect when the `api_type` is `TEXT_EMBEDDINGS_INFERENCE`.
It also takes effect when the `api_type` is `INFERENCE_ENDPOINTS` and the backend is based on Text
Embeddings Inference. This parameter is ignored when the `api_type` is `SERVERLESS_INFERENCE_API`
(it is always set to `True` and cannot be changed).
Truncates the input text to the maximum length supported by the model.
Applicable when `api_type` is `TEXT_EMBEDDINGS_INFERENCE`, or `INFERENCE_ENDPOINTS`
if the backend uses Text Embeddings Inference.
If `api_type` is `SERVERLESS_INFERENCE_API`, this parameter is ignored.
It is always set to `True` and cannot be changed.
:param normalize:
Normalize the embeddings to unit length.
This parameter takes effect when the `api_type` is `TEXT_EMBEDDINGS_INFERENCE`.
It also takes effect when the `api_type` is `INFERENCE_ENDPOINTS` and the backend is based on Text
Embeddings Inference. This parameter is ignored when the `api_type` is `SERVERLESS_INFERENCE_API`
(it is always set to `False` and cannot be changed).
Normalizes the embeddings to unit length.
Applicable when `api_type` is `TEXT_EMBEDDINGS_INFERENCE`, or `INFERENCE_ENDPOINTS`
if the backend uses Text Embeddings Inference.
If `api_type` is `SERVERLESS_INFERENCE_API`, this parameter is ignored.
It is always set to `False` and cannot be changed.
"""
huggingface_hub_import.check()

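As with the document embedder, a short construction sketch may help; the endpoint URL below is a placeholder and the string form of the API type is an assumption, not something this PR prescribes.

```python
from haystack.components.embedders import HuggingFaceAPITextEmbedder
from haystack.utils import Secret

text_embedder = HuggingFaceAPITextEmbedder(
    api_type="inference_endpoints",  # assumed string form of the API type
    api_params={"url": "https://<your-endpoint>.endpoints.huggingface.cloud"},  # placeholder URL
    token=Secret.from_token("<your-api-key>"),
    prefix="",
    suffix="",
    # truncate/normalize are ignored here unless the endpoint runs Text Embeddings Inference
)
```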
@@ -179,7 +183,7 @@ def from_dict(cls, data: Dict[str, Any]) -> "HuggingFaceAPITextEmbedder":
@component.output_types(embedding=List[float])
def run(self, text: str):
"""
Embed a single string.
Embeds a single string.

:param text:
Text to embed.
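The rest of the `run` docstring is collapsed in this diff; based on the declared output type (`embedding=List[float]`), a hedged usage sketch using the embedder from the previous sketch:

```python
result = text_embedder.run(text="I love pizza!")
# Returns a dictionary with an "embedding" key holding the vector:
print(result["embedding"])
# [0.017020374536514282, -0.023255806416273117, ...]
```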