Commit

Merge branch 'main' into issue-7848b
CarlosFerLo committed Jul 23, 2024
2 parents 9458882 + 0c9dc00 commit 0a8184b
Showing 119 changed files with 2,656 additions and 1,019 deletions.
2 changes: 1 addition & 1 deletion .github/config/pypi-release-slack-notification.yml
@@ -2,7 +2,7 @@ pretext: Triggered via {{eventName}} of {{env.VERSION}} by {{actor}}
title: "Haystack PyPi release"

text: |
<https://pypi.org/project/farm-haystack/{{env.VERSION}}/|PyPi release {{env.VERSION}}>
<https://pypi.org/project/haystack-ai/{{env.VERSION}}/|PyPi release {{env.VERSION}}>
{{#if (eq jobStatus "SUCCESS")}}
Haystack {{env.VERSION}} has been released on PyPi :rocket:
6 changes: 5 additions & 1 deletion .github/workflows/workflows_linting.yml
@@ -3,7 +3,7 @@ name: Github workflows linter
on:
pull_request:
paths:
- ".github/workflows"
- ".github/workflows/**"

jobs:
lint-workflows:
@@ -12,8 +12,12 @@ jobs:
- name: Checkout
uses: actions/checkout@v4

- uses: actions/setup-go@v5

- name: Install actionlint
run: go install github.com/rhysd/actionlint/cmd/actionlint@latest

- name: Run actionlint
env:
SHELLCHECK_OPTS: --exclude=SC2102
run: actionlint
60 changes: 30 additions & 30 deletions .pre-commit-config.yaml
@@ -1,36 +1,36 @@
fail_fast: true

repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.4.0
hooks:
- id: check-ast # checks Python syntax
- id: check-json # checks JSON syntax
- id: check-merge-conflict # checks for no merge conflict strings
- id: check-shebang-scripts-are-executable # checks all shell scripts have executable permissions
- id: check-toml # checks TOML syntax
- id: check-yaml # checks YAML syntax
- id: end-of-file-fixer # checks there is a newline at the end of the file
- id: mixed-line-ending # normalizes line endings
- id: no-commit-to-branch # prevents committing to main
- id: trailing-whitespace # trims trailing whitespace
args: [--markdown-linebreak-ext=md]
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.6.0
hooks:
- id: check-ast # checks Python syntax
- id: check-json # checks JSON syntax
- id: check-merge-conflict # checks for no merge conflict strings
- id: check-shebang-scripts-are-executable # checks all shell scripts have executable permissions
- id: check-toml # checks TOML syntax
- id: check-yaml # checks YAML syntax
- id: end-of-file-fixer # checks there is a newline at the end of the file
- id: mixed-line-ending # normalizes line endings
- id: no-commit-to-branch # prevents committing to main
- id: trailing-whitespace # trims trailing whitespace
args: [--markdown-linebreak-ext=md]

- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.3.0
hooks:
- id: ruff
- id: ruff-format
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.5.0
hooks:
- id: ruff
- id: ruff-format

- repo: https://github.com/codespell-project/codespell
rev: v2.2.5
hooks:
- id: codespell
additional_dependencies:
- tomli
- repo: https://github.com/codespell-project/codespell
rev: v2.3.0
hooks:
- id: codespell
additional_dependencies:
- tomli

- repo: https://github.com/rhysd/actionlint
rev: v1.6.25
hooks:
- id: actionlint-docker
args: ["-ignore", "SC2102"]
- repo: https://github.com/rhysd/actionlint
rev: v1.7.1
hooks:
- id: actionlint-docker
args: ["-ignore", "SC2102"]
33 changes: 28 additions & 5 deletions CONTRIBUTING.md
@@ -222,20 +222,43 @@ options you would normally pass to `pytest`, for example:
hatch run test:unit test/test_logging.py::TestSkipLoggingConfiguration::test_skip_logging_configuration
```

### Run code quality checks locally

We also use tools to ensure consistent code style, quality, and static type checking. The quality of your code will be
tested by the CI, but once again, running the checks locally will speed up the review cycle. To check your code you
can run:
tested by the CI, but once again, running the checks locally will speed up the review cycle.


To check your code type checking, run:
```sh
hatch run test:lint
hatch run test:type
```

If the linters spot any error, you can fix it before checking in your code:

To check your code format run:
```sh
hatch run test:lint-fix
hatch run format-check
```


To format your code, you can run:
```sh
hatch run format
````


To check your code style according to linting rules run:
```sh
hatch run check
hatch run test:lint
````
If the linters spot any error, you can fix it before checking in your code:
```sh
hatch run fix
```


## Requirements for Pull Requests

To ease the review process, please follow the instructions in this paragraph when creating a Pull Request:
4 changes: 2 additions & 2 deletions README.md
@@ -1,12 +1,12 @@
<div align="center">
<a href="https://haystack.deepset.ai/"><img src="https://github.com/deepset-ai/haystack/blob/main/docs/img/banner_20.png" alt="Green logo of a stylized white 'H' with the text 'Haystack, by deepset. Haystack 2.0 is live 🎉' Abstract green and yellow diagrams in the background."></a>
<a href="https://haystack.deepset.ai/"><img src="https://raw.githubusercontent.com/deepset-ai/haystack/main/docs/img/banner_20.png" alt="Green logo of a stylized white 'H' with the text 'Haystack, by deepset. Haystack 2.0 is live 🎉' Abstract green and yellow diagrams in the background."></a>

| | |
| ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| CI/CD | [![Tests](https://github.com/deepset-ai/haystack/actions/workflows/tests.yml/badge.svg)](https://github.com/deepset-ai/haystack/actions/workflows/tests.yml) [![types - Mypy](https://img.shields.io/badge/types-Mypy-blue.svg)](https://github.com/python/mypy) [![Coverage Status](https://coveralls.io/repos/github/deepset-ai/haystack/badge.svg?branch=main)](https://coveralls.io/github/deepset-ai/haystack?branch=main) [![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff) |
| Docs | [![Website](https://img.shields.io/website?label=documentation&up_message=online&url=https%3A%2F%2Fdocs.haystack.deepset.ai)](https://docs.haystack.deepset.ai) |
| Package | [![PyPI](https://img.shields.io/pypi/v/haystack-ai)](https://pypi.org/project/haystack-ai/) ![PyPI - Downloads](https://img.shields.io/pypi/dm/haystack-ai?color=blue&logo=pypi&logoColor=gold) ![PyPI - Python Version](https://img.shields.io/pypi/pyversions/haystack-ai?logo=python&logoColor=gold) [![Conda Version](https://img.shields.io/conda/vn/conda-forge/haystack-ai.svg)](https://anaconda.org/conda-forge/haystack-ai) [![GitHub](https://img.shields.io/github/license/deepset-ai/haystack?color=blue)](LICENSE) [![License Compliance](https://github.com/deepset-ai/haystack/actions/workflows/license_compliance.yml/badge.svg)](https://github.com/deepset-ai/haystack/actions/workflows/license_compliance.yml) |
| Meta | [![Discord](https://img.shields.io/discord/993534733298450452?logo=discord)](https://discord.gg/haystack) [![Twitter Follow](https://img.shields.io/twitter/follow/haystack_ai)](https://twitter.com/haystack_ai) |
| Meta | [![Discord](https://img.shields.io/discord/993534733298450452?logo=discord)](https://discord.com/invite/VBpFzsgRVF) [![Twitter Follow](https://img.shields.io/twitter/follow/haystack_ai)](https://twitter.com/haystack_ai) |
</div>

[Haystack](https://haystack.deepset.ai/) is an end-to-end LLM framework that allows you to build applications powered by
2 changes: 1 addition & 1 deletion VERSION.txt
@@ -1 +1 @@
2.3.0-rc0
2.4.0-rc0
27 changes: 0 additions & 27 deletions docs/pydoc/config/others_api.yml

This file was deleted.

1 change: 1 addition & 0 deletions docs/pydoc/config/retrievers_api.yml
@@ -6,6 +6,7 @@ loaders:
"in_memory/bm25_retriever",
"in_memory/embedding_retriever",
"filter_retriever",
"sentence_window_retrieval",
]
ignore_when_discovered: ["__init__"]
processors:
37 changes: 23 additions & 14 deletions haystack/components/audio/whisper_local.py
@@ -16,7 +16,20 @@


logger = logging.getLogger(__name__)
WhisperLocalModel = Literal["tiny", "small", "medium", "large", "large-v2"]
WhisperLocalModel = Literal[
"base",
"base.en",
"large",
"large-v1",
"large-v2",
"large-v3",
"medium",
"medium.en",
"small",
"small.en",
"tiny",
"tiny.en",
]


@component
@@ -91,7 +104,7 @@ def from_dict(cls, data: Dict[str, Any]) -> "LocalWhisperTranscriber":
The deserialized component.
"""
init_params = data["init_parameters"]
if init_params["device"] is not None:
if init_params.get("device") is not None:
init_params["device"] = ComponentDevice.from_dict(init_params["device"])
return default_from_dict(cls, data)
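
Review note: the switch from `init_params["device"]` to `init_params.get("device")` matters when the serialized data omits the `device` key altogether. The following is a minimal sketch of that difference using a plain dict; `restore_device` is a hypothetical helper written for illustration and is not part of Haystack.

```python
from typing import Any, Dict, Optional


def restore_device(init_params: Dict[str, Any]) -> Optional[str]:
    """Hypothetical stand-in for the device handling in from_dict()."""
    # .get() returns None when the key is absent instead of raising KeyError.
    device = init_params.get("device")
    if device is not None:
        # The real component calls ComponentDevice.from_dict here;
        # this sketch only tags the value to show the branch was taken.
        return f"restored:{device}"
    return None


# Serialized data with no "device" entry is handled like an explicit None.
missing_key = restore_device({"model": "large"})
explicit_none = restore_device({"model": "large", "device": None})
with_device = restore_device({"model": "large", "device": "cpu"})
```

With direct indexing, the first call would raise `KeyError` before any deserialization took place.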

@@ -161,25 +174,21 @@ def _raw_transcribe(self, sources: List[Union[str, Path, ByteStream]], **kwargs)
raise RuntimeError("Model is not loaded, please run 'warm_up()' before calling 'run()'")

return_segments = kwargs.pop("return_segments", False)
transcriptions: Dict[Path, Any] = {}
transcriptions = {}

for source in sources:
if not isinstance(source, ByteStream):
path = Path(source)
source = ByteStream.from_file_path(path)
source.meta["file_path"] = path
else:
# If we received a ByteStream instance that doesn't have the "file_path" metadata set,
# we dump the bytes into a temporary file.
path = source.meta.get("file_path")
if path is None:
fp = tempfile.NamedTemporaryFile(delete=False)
path = Path(source) if not isinstance(source, ByteStream) else source.meta.get("file_path")

if isinstance(source, ByteStream) and path is None:
with tempfile.NamedTemporaryFile(delete=False) as fp:
path = Path(fp.name)
source.to_file(path)
source.meta["file_path"] = path

transcription = self._model.transcribe(str(path), **kwargs)

if not return_segments:
transcription.pop("segments", None)

transcriptions[path] = transcription

return transcriptions
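
Review note: the refactored loop resolves a path for every source and only writes a temporary file when a byte stream has no `file_path` metadata. The sketch below illustrates that flow under stated assumptions: `FakeByteStream` and `resolve_path` are hypothetical stand-ins rather than Haystack classes, and the sketch writes the bytes after the temporary file handle is closed, which may differ from the committed code.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, Union


@dataclass
class FakeByteStream:
    """Hypothetical stand-in for a byte stream with metadata."""

    data: bytes
    meta: Dict[str, Any] = field(default_factory=dict)

    def to_file(self, destination: Path) -> None:
        destination.write_bytes(self.data)


def resolve_path(source: Union[str, Path, FakeByteStream]) -> Path:
    """Return a filesystem path for the source, writing a temp file if needed."""
    if not isinstance(source, FakeByteStream):
        return Path(source)

    path = source.meta.get("file_path")
    if path is None:
        # delete=False keeps the file on disk after the handle is closed,
        # so a downstream consumer can still open it by name.
        with tempfile.NamedTemporaryFile(delete=False) as fp:
            path = Path(fp.name)
        source.to_file(path)
        source.meta["file_path"] = path
    return Path(path)


stream = FakeByteStream(data=b"audio bytes")
stream_path = resolve_path(stream)
plain_path = resolve_path("recording.wav")
```

Because the temporary file is created with `delete=False`, nothing in this sketch removes it; cleanup is left to the caller.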
20 changes: 12 additions & 8 deletions haystack/components/builders/answer_builder.py
@@ -3,9 +3,10 @@
# SPDX-License-Identifier: Apache-2.0

import re
from typing import Any, Dict, List, Optional
from typing import Any, Dict, List, Optional, Union

from haystack import Document, GeneratedAnswer, component, logging
from haystack.dataclasses.chat_message import ChatMessage

logger = logging.getLogger(__name__)

@@ -56,7 +57,7 @@ def __init__(self, pattern: Optional[str] = None, reference_pattern: Optional[st
def run(
self,
query: str,
replies: List[str],
replies: Union[List[str], List[ChatMessage]],
meta: Optional[List[Dict[str, Any]]] = None,
documents: Optional[List[Document]] = None,
pattern: Optional[str] = None,
@@ -68,7 +69,7 @@ def run(
:param query:
The query used in the prompts for the Generator.
:param replies:
The output of the Generator.
The output of the Generator. Can be a list of strings or a list of ChatMessage objects.
:param meta:
The metadata returned by the Generator. If not specified, the generated answer will contain no metadata.
:param documents:
@@ -103,14 +104,15 @@ def run(

pattern = pattern or self.pattern
reference_pattern = reference_pattern or self.reference_pattern

all_answers = []
for reply, metadata in zip(replies, meta):
# Extract content from ChatMessage objects if reply is a ChatMessages, else use the string as is
extracted_reply: str = reply.content if isinstance(reply, ChatMessage) else reply # type: ignore
extracted_metadata = reply.meta if isinstance(reply, ChatMessage) else metadata
referenced_docs = []
if documents:
reference_idxs = []
if reference_pattern:
reference_idxs = AnswerBuilder._extract_reference_idxs(reply, reference_pattern)
reference_idxs = AnswerBuilder._extract_reference_idxs(extracted_reply, reference_pattern)
else:
reference_idxs = [doc_idx for doc_idx, _ in enumerate(documents)]

@@ -122,8 +124,10 @@ def run(
"Document index '{index}' referenced in Generator output is out of range. ", index=idx + 1
)

answer_string = AnswerBuilder._extract_answer_string(reply, pattern)
answer = GeneratedAnswer(data=answer_string, query=query, documents=referenced_docs, meta=metadata)
answer_string = AnswerBuilder._extract_answer_string(extracted_reply, pattern)
answer = GeneratedAnswer(
data=answer_string, query=query, documents=referenced_docs, meta=extracted_metadata
)
all_answers.append(answer)

return {"answers": all_answers}
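
Review note: with this change each reply can be either a plain string or a chat message, and a chat message's own metadata takes the place of the separately supplied `meta` entry. The sketch below shows that branching in isolation; `FakeChatMessage` and `unpack_reply` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple, Union


@dataclass
class FakeChatMessage:
    """Hypothetical stand-in for a chat message with content and metadata."""

    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


def unpack_reply(
    reply: Union[str, FakeChatMessage], metadata: Dict[str, Any]
) -> Tuple[str, Dict[str, Any]]:
    """Return (text, metadata) for either reply type."""
    if isinstance(reply, FakeChatMessage):
        # A chat message carries its own metadata, which replaces the
        # separately supplied metadata for that reply.
        return reply.content, reply.meta
    return reply, metadata


text_a, meta_a = unpack_reply("Paris", {"model": "x"})
text_b, meta_b = unpack_reply(
    FakeChatMessage("Paris", {"finish_reason": "stop"}), {"model": "x"}
)
```

One consequence worth noting for reviewers: when a chat message is passed, the caller-supplied metadata for that reply is not merged in, it is ignored.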
20 changes: 7 additions & 13 deletions haystack/components/caching/cache_checker.py
@@ -2,10 +2,10 @@
#
# SPDX-License-Identifier: Apache-2.0

import importlib
from typing import Any, Dict, List

from haystack import DeserializationError, Document, component, default_from_dict, default_to_dict, logging
from haystack.core.serialization import import_class_by_name
from haystack.document_stores.types import DocumentStore

logger = logging.getLogger(__name__)
@@ -77,19 +77,13 @@ def from_dict(cls, data: Dict[str, Any]) -> "CacheChecker":
if "type" not in init_params["document_store"]:
raise DeserializationError("Missing 'type' in document store's serialization data")

doc_store_data = data["init_parameters"]["document_store"]
try:
module_name, type_ = init_params["document_store"]["type"].rsplit(".", 1)
logger.debug("Trying to import module '{module_name}'", module_name=module_name)
module = importlib.import_module(module_name)
except (ImportError, DeserializationError) as e:
raise DeserializationError(
f"DocumentStore of type '{init_params['document_store']['type']}' not correctly imported"
) from e

docstore_class = getattr(module, type_)
docstore = docstore_class.from_dict(init_params["document_store"])

data["init_parameters"]["document_store"] = docstore
doc_store_class = import_class_by_name(doc_store_data["type"])
except ImportError as e:
raise DeserializationError(f"Class '{doc_store_data['type']}' not correctly imported") from e
data["init_parameters"]["document_store"] = default_from_dict(doc_store_class, doc_store_data)

return default_from_dict(cls, data)

@component.output_types(hits=List[Document], misses=List)
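
Review note: the hunk above replaces hand-rolled `importlib` logic with a shared helper that resolves a class from its dotted name. The sketch below shows the general technique with a hypothetical `load_class` function; it is not the implementation of `import_class_by_name`, whose internals are not shown in this diff.

```python
import importlib
from typing import Any


def load_class(fully_qualified_name: str) -> Any:
    """Hypothetical helper: resolve 'package.module.ClassName' to the class object."""
    module_name, class_name = fully_qualified_name.rsplit(".", 1)
    try:
        module = importlib.import_module(module_name)
        return getattr(module, class_name)
    except (ImportError, AttributeError) as error:
        # Normalize both failure modes to ImportError so callers catch one type.
        raise ImportError(f"Could not import class '{fully_qualified_name}'") from error


# Resolve a standard-library class purely from its dotted name.
fraction_class = load_class("fractions.Fraction")
half = fraction_class(1, 2)
```

Raising a single exception type from the helper is what lets the calling `from_dict` keep one narrow `except ImportError` clause.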
6 changes: 3 additions & 3 deletions haystack/components/converters/__init__.py
@@ -3,7 +3,7 @@
# SPDX-License-Identifier: Apache-2.0

from haystack.components.converters.azure import AzureOCRDocumentConverter
from haystack.components.converters.docx import DocxMetadata, DocxToDocument
from haystack.components.converters.docx import DOCXMetadata, DOCXToDocument
from haystack.components.converters.html import HTMLToDocument
from haystack.components.converters.markdown import MarkdownToDocument
from haystack.components.converters.openapi_functions import OpenAPIServiceToFunctions
@@ -23,6 +23,6 @@
"MarkdownToDocument",
"OpenAPIServiceToFunctions",
"OutputAdapter",
"DocxToDocument",
"DocxMetadata",
"DOCXToDocument",
"DOCXMetadata",
]
3 changes: 2 additions & 1 deletion haystack/components/converters/azure.py
@@ -137,7 +137,8 @@ def run(self, sources: List[Union[str, Path, ByteStream]], meta: Optional[List[D
result = poller.result()
azure_output.append(result.to_dict())

docs = self._convert_tables_and_text(result=result, meta=metadata)
merged_metadata = {**bytestream.meta, **metadata}
docs = self._convert_tables_and_text(result=result, meta=merged_metadata)
documents.extend(docs)

return {"documents": documents, "raw_azure_response": azure_output}
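
Review note: `{**bytestream.meta, **metadata}` merges two mappings, with the right-hand one winning on key collisions, so user-supplied metadata overrides what the byte stream carries. A small sketch with made-up values (both dicts are hypothetical):

```python
from typing import Any, Dict

# Hypothetical values standing in for bytestream.meta and the user-supplied meta.
bytestream_meta: Dict[str, Any] = {"file_path": "report.pdf", "language": "en"}
user_meta: Dict[str, Any] = {"language": "de", "author": "unknown"}

# Later unpacked mappings override earlier ones on key collisions.
merged = {**bytestream_meta, **user_meta}
```

Here `language` ends up as `"de"`, while `file_path` from the byte stream is preserved because the user metadata does not define it.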