feat(datasets): Add limited langchain support for Anthropic, Cohere, and OpenAI models #434

Merged
merged 23 commits into from
Jun 3, 2024
Commits
8b6c34c
Add openai datasets.
ianwhale Nov 16, 2023
8a3bdfb
Add anthropic and cohere
ianwhale Nov 16, 2023
a3de75d
Add python API examples to docstrings.
ianwhale Nov 27, 2023
39dd5ae
Clean up python example.
ianwhale Apr 30, 2024
b38786f
Merge branch 'main' into feat/langchain-dataset
merelcht May 21, 2024
72cf548
Remove setup.py and move lanchain reqs to pyproject.toml
merelcht May 21, 2024
de2596b
Move lanchain datasets to experimental
merelcht May 21, 2024
b67c43f
Try get antrophic dataset running. Looks like API URL is not necessary?
merelcht May 21, 2024
0fab1f6
Update cohere package and imports
merelcht May 21, 2024
5e059f3
Merge branch 'main' into feat/langchain-dataset
merelcht May 21, 2024
6d9ba95
Update openai dependency + allow for url in antrophic
merelcht May 21, 2024
4d60267
Merge branch 'feat/langchain-dataset' of https://github.com/ianwhale/…
merelcht May 21, 2024
05d8573
Improve Cohere dataset
merelcht May 29, 2024
82de16a
Make credentials consistent + fix openai examples
merelcht May 29, 2024
f865aa6
Turn cohere dataset into chatcohere dataset
merelcht May 29, 2024
2d805d9
Clean up cohere dataset
merelcht May 30, 2024
56ce7ff
Merge branch 'main' into feat/langchain-dataset
merelcht May 30, 2024
4d726b1
Update release notes + init
merelcht May 30, 2024
89c49a1
Apply suggestions from code review
merelcht Jun 3, 2024
0d88147
Add version pins for langchain dependencies
merelcht Jun 3, 2024
8b2d578
Update kedro-datasets/kedro_datasets_experimental/langchain/_anthropi…
merelcht Jun 3, 2024
acec1d5
Try loosen pin on langchain-cohere
merelcht Jun 3, 2024
dac5066
Only pin dependencies of dataset def in pyproject.toml
merelcht Jun 3, 2024
10 changes: 10 additions & 0 deletions kedro-datasets/RELEASE.md
@@ -1,6 +1,16 @@
# Upcoming Release
## Major features and improvements

* Added the following new **experimental** datasets:

| Type | Description | Location |
|-------------------------------------|-----------------------------------------------------------|-----------------------------------------|
| `langchain.ChatAnthropicDataset` | A dataset for loading a ChatAnthropic langchain model. | `kedro_datasets_experimental.langchain` |
| `langchain.ChatCohereDataset` | A dataset for loading a ChatCohere langchain model. | `kedro_datasets_experimental.langchain` |
| `langchain.OpenAIEmbeddingsDataset` | A dataset for loading an OpenAIEmbeddings langchain model. | `kedro_datasets_experimental.langchain` |
| `langchain.ChatOpenAIDataset` | A dataset for loading a ChatOpenAI langchain model. | `kedro_datasets_experimental.langchain` |


# Release 3.0.1

## Bug fixes and other changes
19 changes: 19 additions & 0 deletions kedro-datasets/kedro_datasets_experimental/langchain/__init__.py
@@ -0,0 +1,19 @@
"""Provides interface to langchain model API objects."""
from typing import Any

import lazy_loader as lazy

# https://github.com/pylint-dev/pylint/issues/4300#issuecomment-1043601901
ChatOpenAIDataset: Any
OpenAIEmbeddingsDataset: Any
ChatAnthropicDataset: Any
ChatCohereDataset: Any

__getattr__, __dir__, __all__ = lazy.attach(
    __name__,
    submod_attrs={
        "_openai": ["ChatOpenAIDataset", "OpenAIEmbeddingsDataset"],
        "_anthropic": ["ChatAnthropicDataset"],
        "_cohere": ["ChatCohereDataset"],
    },
)
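The `lazy.attach` call above defers importing `_openai`, `_anthropic`, and `_cohere` until one of their classes is first accessed, so a user who installs only one langchain extra is not forced to import the others. The following is a minimal sketch of that idea using a module-level `__getattr__` (PEP 562). It is a simplified stand-in, not the actual `lazy_loader` implementation; `attach_lazily` and `demo_pkg` are hypothetical names, and standard-library modules play the role of the submodules.

```python
import importlib
import types


def attach_lazily(module: types.ModuleType, attr_to_module: dict[str, str]) -> None:
    """Install a module-level ``__getattr__`` that imports on first access."""

    def __getattr__(name: str):
        if name not in attr_to_module:
            raise AttributeError(
                f"module {module.__name__!r} has no attribute {name!r}"
            )
        # Import the providing module only now, when the attribute is requested.
        value = getattr(importlib.import_module(attr_to_module[name]), name)
        # Cache on the module so later lookups bypass __getattr__ entirely.
        setattr(module, name, value)
        return value

    module.__getattr__ = __getattr__


# Hypothetical package whose public names live in other modules.
demo = types.ModuleType("demo_pkg")
attach_lazily(demo, {"OrderedDict": "collections", "sqrt": "math"})

# First access triggers the import and caches the attribute.
root = demo.sqrt(9.0)
```

After the first access, `sqrt` is an ordinary attribute of `demo`, while `OrderedDict` stays unresolved until someone asks for it.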
75 changes: 75 additions & 0 deletions kedro-datasets/kedro_datasets_experimental/langchain/_anthropic.py
@@ -0,0 +1,75 @@
"""Defines an interface to common Anthropic models."""

from typing import Any, NoReturn

from kedro.io import AbstractDataset, DatasetError
from langchain_anthropic import ChatAnthropic


class ChatAnthropicDataset(AbstractDataset[None, ChatAnthropic]):
    """``ChatAnthropicDataset`` loads a ChatAnthropic `langchain <https://python.langchain.com/>`_ model.

    Example usage for the :doc:`YAML API <kedro:data/data_catalog_yaml_examples>`:

    catalog.yml:

    .. code-block:: yaml

        claude_instant_1:
          type: langchain.ChatAnthropicDataset
          kwargs:
            model: "claude-instant-1"
            temperature: 0.0
          credentials: anthropic


    credentials.yml:

    .. code-block:: yaml

        anthropic:
          anthropic_api_url: <anthropic-api-base>
          anthropic_api_key: <anthropic-api-key>

    Example usage for the
    `Python API <https://kedro.readthedocs.io/en/stable/data/\
    advanced_data_catalog_usage.html>`_:

    .. code-block:: python

        >>> from kedro_datasets_experimental.langchain import ChatAnthropicDataset
        >>> llm = ChatAnthropicDataset(
        ...     credentials={
        ...         "anthropic_api_url": "xxx",
        ...         "anthropic_api_key": "xxx",
        ...     },
        ...     kwargs={
        ...         "model": "claude-instant-1",
        ...         "temperature": 0.0,
        ...     }
        ... ).load()
        >>>
        >>> # See: https://python.langchain.com/docs/integrations/chat/anthropic
        >>> llm.invoke("Hello world!")
    """

    def __init__(self, credentials: dict[str, str], kwargs: dict[str, Any] = None):
        """Constructor.

        Args:
            credentials: must contain `anthropic_api_url` and `anthropic_api_key`.
            kwargs: keyword arguments passed to the ChatAnthropic constructor.
        """
        self.anthropic_api_url = credentials["anthropic_api_url"]
        self.anthropic_api_key = credentials["anthropic_api_key"]
        self.kwargs = kwargs or {}

    def _describe(self) -> dict[str, Any]:
        return {**self.kwargs}

    def _save(self, data: None) -> NoReturn:
        raise DatasetError(f"{self.__class__.__name__} is a read only data set type")

    def _load(self) -> ChatAnthropic:
        return ChatAnthropic(
            anthropic_api_url=self.anthropic_api_url,
            anthropic_api_key=self.anthropic_api_key,
            **self.kwargs,
        )
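The dataset above follows a load-only pattern: credentials are captured at construction, `_describe` reports only the non-secret `kwargs`, `_load` builds the model, and `_save` always raises. The sketch below mirrors that shape without requiring kedro or langchain. `FakeChatModel` and `ReadOnlyModelDataset` are hypothetical stand-ins, and `RuntimeError` takes the place of kedro's `DatasetError`.

```python
from typing import Any, Optional


class FakeChatModel:
    """Hypothetical stand-in for a langchain chat model class."""

    def __init__(self, api_url: str, api_key: str, **kwargs: Any):
        self.api_url = api_url
        self.api_key = api_key
        self.kwargs = kwargs


class ReadOnlyModelDataset:
    """Load-only dataset shape, sketched without kedro's AbstractDataset."""

    def __init__(
        self, credentials: dict[str, str], kwargs: Optional[dict[str, Any]] = None
    ):
        # A missing key fails early with KeyError, as in the PR's constructor.
        self._api_url = credentials["api_url"]
        self._api_key = credentials["api_key"]
        self._kwargs = kwargs or {}

    def describe(self) -> dict[str, Any]:
        # Only model settings are reported; credentials are left out.
        return {**self._kwargs}

    def load(self) -> FakeChatModel:
        return FakeChatModel(
            api_url=self._api_url, api_key=self._api_key, **self._kwargs
        )

    def save(self, data: None) -> None:
        raise RuntimeError(f"{self.__class__.__name__} is a read only dataset type")


dataset = ReadOnlyModelDataset(
    credentials={"api_url": "<api-base>", "api_key": "<api-key>"},
    kwargs={"model": "claude-instant-1", "temperature": 0.0},
)
model = dataset.load()
```

Keeping credentials out of `describe` means the description of the dataset can be logged or printed without exposing the API key.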
73 changes: 73 additions & 0 deletions kedro-datasets/kedro_datasets_experimental/langchain/_cohere.py
@@ -0,0 +1,73 @@
"""
Cohere dataset definition.
"""

from typing import Any, NoReturn

from kedro.io import AbstractDataset, DatasetError
from langchain_cohere import ChatCohere


class ChatCohereDataset(AbstractDataset[None, ChatCohere]):
    """``ChatCohereDataset`` loads a ChatCohere `langchain <https://python.langchain.com/>`_ model.

    Example usage for the :doc:`YAML API <kedro:data/data_catalog_yaml_examples>`:

    catalog.yml:

    .. code-block:: yaml

        command:
          type: langchain.ChatCohereDataset
          kwargs:
            model: "command"
            temperature: 0.0
          credentials: cohere


    credentials.yml:

    .. code-block:: yaml

        cohere:
          cohere_api_url: <cohere-api-base>
          cohere_api_key: <cohere-api-key>

    Example usage for the
    `Python API <https://kedro.readthedocs.io/en/stable/data/\
    advanced_data_catalog_usage.html>`_:

    .. code-block:: python

        >>> from kedro_datasets_experimental.langchain import ChatCohereDataset
        >>> llm = ChatCohereDataset(
        ...     credentials={
        ...         "cohere_api_key": "xxx",
        ...         "cohere_api_url": "xxx",
        ...     },
        ...     kwargs={
        ...         "model": "command",
        ...         "temperature": 0,
        ...     }
        ... ).load()
        >>>
        >>> # See: https://python.langchain.com/v0.1/docs/integrations/chat/cohere/
        >>> llm.invoke("Hello world!")
    """

    def __init__(self, credentials: dict[str, str], kwargs: dict[str, Any] = None):
        """Constructor.

        Args:
            credentials: must contain `cohere_api_url` and `cohere_api_key`.
            kwargs: keyword arguments passed to the underlying constructor.
        """
        self.cohere_api_url = credentials["cohere_api_url"]
        self.cohere_api_key = credentials["cohere_api_key"]
        self.kwargs = kwargs or {}

    def _describe(self) -> dict[str, Any]:
        return {**self.kwargs}

    def _save(self, data: None) -> NoReturn:
        raise DatasetError(f"{self.__class__.__name__} is a read only data set type")

    def _load(self) -> ChatCohere:
        return ChatCohere(
            cohere_api_key=self.cohere_api_key,
            base_url=self.cohere_api_url,
            **self.kwargs,
        )
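In the YAML examples above, the catalog entry names its credentials (`credentials: cohere`) and the matching block in `credentials.yml` supplies the values, so the dataset constructor receives a dict rather than a string. The sketch below illustrates that substitution step on plain dicts. It is an assumption-laden simplification of what the catalog does when it is built from config, not kedro's code; `resolve_credentials` is a hypothetical helper.

```python
import copy
from typing import Any


def resolve_credentials(
    catalog: dict[str, dict[str, Any]], credentials: dict[str, dict[str, str]]
) -> dict[str, dict[str, Any]]:
    """Replace each string ``credentials`` reference with the referenced dict."""
    resolved = copy.deepcopy(catalog)  # leave the caller's config untouched
    for dataset_name, config in resolved.items():
        reference = config.get("credentials")
        if isinstance(reference, str):
            if reference not in credentials:
                raise KeyError(
                    f"Credentials {reference!r} for dataset {dataset_name!r} not found"
                )
            config["credentials"] = credentials[reference]
    return resolved


# Parsed equivalents of the catalog.yml and credentials.yml shown in the docstring.
catalog = {
    "command": {
        "type": "langchain.ChatCohereDataset",
        "kwargs": {"model": "command", "temperature": 0.0},
        "credentials": "cohere",
    }
}
credentials = {
    "cohere": {
        "cohere_api_url": "<cohere-api-base>",
        "cohere_api_key": "<cohere-api-key>",
    }
}

resolved = resolve_credentials(catalog, credentials)
```

The resolved entry then carries exactly the two keys that `ChatCohereDataset.__init__` reads, `cohere_api_url` and `cohere_api_key`.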
139 changes: 139 additions & 0 deletions kedro-datasets/kedro_datasets_experimental/langchain/_openai.py
@@ -0,0 +1,139 @@
"""Defines an interface to common OpenAI models."""

from abc import abstractmethod
from typing import Any, Generic, NoReturn, TypeVar

from kedro.io import AbstractDataset, DatasetError
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

OPENAI_TYPE = TypeVar("OPENAI_TYPE")


class OpenAIDataset(AbstractDataset[None, OPENAI_TYPE], Generic[OPENAI_TYPE]):
    """OpenAI dataset used to access credentials at runtime."""

    @property
    @abstractmethod
    def constructor(self) -> type[OPENAI_TYPE]:
        """Return the OpenAI class to construct in the _load method."""

    def __init__(self, credentials: dict[str, str], kwargs: dict[str, Any] = None):
        """Constructor.

        Args:
            credentials: must contain `openai_api_base` and `openai_api_key`.
            kwargs: keyword arguments passed to the underlying constructor.
        """
        self.openai_api_base = credentials["openai_api_base"]
        self.openai_api_key = credentials["openai_api_key"]
        self.kwargs = kwargs or {}

    def _describe(self) -> dict[str, Any]:
        return {**self.kwargs}

    def _save(self, data: None) -> NoReturn:
        raise DatasetError(f"{self.__class__.__name__} is a read only data set type")

    def _load(self) -> OPENAI_TYPE:
        return self.constructor(
            openai_api_base=self.openai_api_base,
            openai_api_key=self.openai_api_key,
            **self.kwargs,
        )


class OpenAIEmbeddingsDataset(OpenAIDataset[OpenAIEmbeddings]):
    """``OpenAIEmbeddingsDataset`` loads an OpenAIEmbeddings `langchain <https://python.langchain.com/>`_ model.

    Example usage for the :doc:`YAML API <kedro:data/data_catalog_yaml_examples>`:

    catalog.yml:

    .. code-block:: yaml

        text_embedding_ada_002:
          type: langchain.OpenAIEmbeddingsDataset
          kwargs:
            model: "text-embedding-ada-002"
          credentials: openai

    credentials.yml:

    .. code-block:: yaml

        openai:
          openai_api_base: <openai-api-base>
          openai_api_key: <openai-api-key>

    Example usage for the
    `Python API <https://kedro.readthedocs.io/en/stable/data/\
    advanced_data_catalog_usage.html>`_:

    .. code-block:: python

        >>> from kedro_datasets_experimental.langchain import OpenAIEmbeddingsDataset
        >>>
        >>> embeddings = OpenAIEmbeddingsDataset(
        ...     credentials={
        ...         "openai_api_base": "<openai-api-base>",
        ...         "openai_api_key": "<openai-api-key>",
        ...     },
        ...     kwargs={
        ...         "model": "text-embedding-ada-002",
        ...     },
        ... ).load()
        >>>
        >>> # See: https://python.langchain.com/docs/integrations/text_embedding/openai
        >>> embeddings.embed_query("Hello world!")

    """

    @property
    def constructor(self) -> type[OpenAIEmbeddings]:
        return OpenAIEmbeddings


class ChatOpenAIDataset(OpenAIDataset[ChatOpenAI]):
    """``ChatOpenAIDataset`` loads a ChatOpenAI `langchain <https://python.langchain.com/>`_ model.

    Example usage for the :doc:`YAML API <kedro:data/data_catalog_yaml_examples>`:

    catalog.yml:

    .. code-block:: yaml

        gpt_3_5_turbo:
          type: langchain.ChatOpenAIDataset
          kwargs:
            model: "gpt-3.5-turbo"
            temperature: 0.0
          credentials: openai

    credentials.yml:

    .. code-block:: yaml

        openai:
          openai_api_base: <openai-api-base>
          openai_api_key: <openai-api-key>

    Example usage for the
    `Python API <https://kedro.readthedocs.io/en/stable/data/\
    advanced_data_catalog_usage.html>`_:

    .. code-block:: python

        >>> from kedro_datasets_experimental.langchain import ChatOpenAIDataset
        >>>
        >>> llm = ChatOpenAIDataset(
        ...     credentials={
        ...         "openai_api_base": "<openai-api-base>",
        ...         "openai_api_key": "<openai-api-key>",
        ...     },
        ...     kwargs={
        ...         "model": "gpt-3.5-turbo",
        ...         "temperature": 0,
        ...     },
        ... ).load()
        >>>
        >>> # See: https://python.langchain.com/docs/integrations/chat/openai
        >>> llm.invoke("Hello world!")
    """

    @property
    def constructor(self) -> type[ChatOpenAI]:
        return ChatOpenAI
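The OpenAI module shares one base class between the chat and embeddings datasets: the base handles credentials and loading, and each subclass only overrides the `constructor` property to name the class to instantiate. A minimal sketch of that abstract-property pattern follows, with hypothetical stand-ins (`ModelDataset`, `FakeChat`, `FakeEmbeddings`) replacing the kedro and langchain classes.

```python
from abc import ABC, abstractmethod
from typing import Any, Generic, Optional, TypeVar

MODEL_TYPE = TypeVar("MODEL_TYPE")


class ModelDataset(ABC, Generic[MODEL_TYPE]):
    """Shared loading logic; subclasses only say which class to build."""

    @property
    @abstractmethod
    def constructor(self) -> type[MODEL_TYPE]:
        """Return the class that ``load`` should instantiate."""

    def __init__(
        self, credentials: dict[str, str], kwargs: Optional[dict[str, Any]] = None
    ):
        self.api_base = credentials["api_base"]
        self.api_key = credentials["api_key"]
        self.kwargs = kwargs or {}

    def load(self) -> MODEL_TYPE:
        # self.constructor is a class, so calling it creates an instance.
        return self.constructor(
            api_base=self.api_base, api_key=self.api_key, **self.kwargs
        )


class FakeChat:
    """Hypothetical stand-in for a chat model."""

    def __init__(self, api_base: str, api_key: str, **kwargs: Any):
        self.settings = kwargs


class FakeEmbeddings:
    """Hypothetical stand-in for an embeddings model."""

    def __init__(self, api_base: str, api_key: str, **kwargs: Any):
        self.settings = kwargs


class FakeChatDataset(ModelDataset[FakeChat]):
    @property
    def constructor(self) -> type[FakeChat]:
        return FakeChat


class FakeEmbeddingsDataset(ModelDataset[FakeEmbeddings]):
    @property
    def constructor(self) -> type[FakeEmbeddings]:
        return FakeEmbeddings


creds = {"api_base": "<api-base>", "api_key": "<api-key>"}
chat = FakeChatDataset(creds, {"model": "gpt-3.5-turbo", "temperature": 0}).load()
embeddings = FakeEmbeddingsDataset(creds, {"model": "text-embedding-ada-002"}).load()
```

Because `constructor` is abstract, the base class cannot be instantiated on its own, which keeps it an implementation detail rather than a usable dataset.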
13 changes: 11 additions & 2 deletions kedro-datasets/pyproject.toml
@@ -165,7 +165,11 @@ yaml-yamldataset = ["kedro-datasets[pandas-base]", "PyYAML>=4.2, <7.0"]
yaml = ["kedro-datasets[yaml-yamldataset]"]

# Experimental Datasets

langchain-chatopenaidataset = ["langchain-openai~=0.1.7"]
langchain-openaiembeddingsdataset = ["langchain-openai~=0.1.7"]
langchain-chatanthropicdataset = ["langchain-anthropic~=0.1.13", "langchain-community~=0.2.0"]
langchain-chatcoheredataset = ["langchain-cohere~=0.1.5", "langchain-community~=0.2.0"]
langchain = ["kedro-datasets[langchain-chatopenaidataset,langchain-openaiembeddingsdataset,langchain-chatanthropicdataset,langchain-chatcoheredataset]"]

# Docs requirements
docs = [
@@ -261,7 +265,12 @@ test = [
]

# Experimental dataset requirements
-experimental = []
+experimental = [
+    "langchain-openai",
+    "langchain-cohere",
+    "langchain-anthropic",
+    "langchain-community",
+]

# All requirements
all = ["kedro-datasets[test,docs]"]
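
The dataset extras in `pyproject.toml` above use compatible-release pins such as `langchain-openai~=0.1.7`. Under PEP 440, `~=0.1.7` is equivalent to `>=0.1.7, ==0.1.*`, so patch releases are accepted but `0.2.0` is not. The sketch below checks that rule for plain numeric versions only. `satisfies_compatible_release` is a hypothetical helper written for illustration; it ignores pre-releases, post-releases, and epochs, which real resolvers handle.

```python
def parse(version: str) -> tuple[int, ...]:
    """Split a plain numeric version like '0.1.7' into integer parts."""
    return tuple(int(part) for part in version.split("."))


def satisfies_compatible_release(candidate: str, pin: str) -> bool:
    """Return True if ``candidate`` satisfies ``~=pin`` (numeric versions only)."""
    base = parse(pin)
    if len(base) < 2:
        raise ValueError("'~=' needs at least two version segments")
    cand = parse(candidate)
    # Pad with zeros so that '0.2' compares equal to '0.2.0'.
    cand += (0,) * (len(base) - len(cand))
    prefix = base[:-1]  # every segment except the last must match exactly
    return cand >= base and cand[: len(prefix)] == prefix


# Pins taken from the extras in the diff above.
checks = {
    ("0.1.7", "0.1.7"): satisfies_compatible_release("0.1.7", "0.1.7"),
    ("0.1.20", "0.1.7"): satisfies_compatible_release("0.1.20", "0.1.7"),
    ("0.2.0", "0.1.7"): satisfies_compatible_release("0.2.0", "0.1.7"),
    ("0.2.3", "0.2.0"): satisfies_compatible_release("0.2.3", "0.2.0"),
}
```

The unpinned entries in the `experimental` list, by contrast, accept any version, which matches the final commit's intent of pinning only in the per-dataset extras.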