MVP LLM Integration #406

Closed
wants to merge 129 commits into from
129 commits
f15be68
Started working on llm_responses
NotBioWaste905 Jul 19, 2024
56b7789
Created class, created 1st tutorial
NotBioWaste Jul 22, 2024
af60115
Added dependecies for langchain
NotBioWaste Jul 22, 2024
b3b79a5
Fixed adding custom prompt for each node
NotBioWaste Jul 22, 2024
6eb910d
Added image processing, updated tutorial
NotBioWaste Jul 22, 2024
1f8cddc
Added typehint
NotBioWaste Jul 22, 2024
74cd954
Added llm_response, LLM_API, history management
NotBioWaste Jul 22, 2024
1fd31a2
Fixed image reading
NotBioWaste Jul 22, 2024
2c48490
Started llm condition
NotBioWaste Jul 24, 2024
a1884e5
Added message_to_langchain
NotBioWaste Jul 24, 2024
61f302e
Implementing deepeval integration
NotBioWaste Jul 29, 2024
38a8f8f
Figured out how to implement DeepEval functions
NotBioWaste905 Jul 30, 2024
592267f
Adding conditions
NotBioWaste Jul 31, 2024
baccc47
Implemented simple conditions call, added BaseMethod class, renaming,…
NotBioWaste Aug 1, 2024
8e84ba1
Fixed history extraction
NotBioWaste Aug 2, 2024
2b2847b
Delete test_bot.py
NotBioWaste905 Aug 2, 2024
7e336ac
Fixed prompt handling, switched to AIMessage in LLM response
NotBioWaste Aug 5, 2024
71babbf
Merge branch 'feat/llm_responses' of https://github.com/deeppavlov/di…
NotBioWaste Aug 5, 2024
351ae06
Fixed conditions call
NotBioWaste Aug 5, 2024
e3d0d15
Working on autotesting
NotBioWaste Aug 5, 2024
0405998
Added tests
NotBioWaste Aug 7, 2024
3dbfd0c
Removed unused method
NotBioWaste Aug 7, 2024
5c876ba
Added annotations
NotBioWaste Aug 7, 2024
8f1932c
Added structured output support, tweaked tests
NotBioWaste Aug 7, 2024
aedf47e
Reworking tutorials
NotBioWaste Aug 7, 2024
adadb05
Reworked prompt usage and hierarchy, reworked filters and methods
NotBioWaste Aug 12, 2024
0288896
No idea how to make script smaller in tutorials
NotBioWaste Aug 12, 2024
67e2758
Small fixes in tutorials and structured generation
NotBioWaste Aug 13, 2024
428a9f0
Working on user guide
NotBioWaste Aug 14, 2024
5e26b4b
Fixed some tutorials, finished user guide
NotBioWaste Aug 14, 2024
5dbb6cd
Bugfixes in docs
NotBioWaste Aug 14, 2024
db63d1a
Lint
NotBioWaste Aug 14, 2024
2b9080f
Removed type annotation that broke docs building
NotBioWaste Aug 14, 2024
2bcda71
Tests and bugfixes
NotBioWaste Aug 15, 2024
d2f28ed
Deleted DeepEval references
NotBioWaste Aug 15, 2024
7318c91
Numpy versions trouble
NotBioWaste Aug 15, 2024
27eae27
Fixed dependecies
NotBioWaste Aug 16, 2024
3fed1fc
Made everything asynchronous
NotBioWaste Aug 16, 2024
30862ca
Added and unified docstring
NotBioWaste Aug 16, 2024
06ab5bc
Added 4th tutorial, fixed message_schema parameter passing
NotBioWaste Aug 16, 2024
798a77b
Bugfix, added max_size to the message_to_langchain function
NotBioWaste Aug 20, 2024
3343159
Made even more everything asynchronous
NotBioWaste Aug 21, 2024
014ff7e
Remade condition, added logprob check
NotBioWaste Aug 21, 2024
761bd81
Async bugfix, added model_result_to_text, working on message_schema f…
NotBioWaste Aug 22, 2024
90a811e
Minor fixes, tinkering tests
NotBioWaste Aug 23, 2024
5bff191
Merge branch 'refs/heads/dev' into feat/llm_responses
RLKRo Aug 23, 2024
8b88ba6
update lock file
RLKRo Aug 23, 2024
20c4afd
Merge remote-tracking branch 'origin/feat/llm_responses' into feat/ll…
RLKRo Aug 23, 2024
0139421
Merge remote-tracking branch 'origin/master' into feat/llm_responses
NotBioWaste905 Sep 18, 2024
9bb0cba
Updating to v1.0
NotBioWaste905 Sep 23, 2024
f2d6b68
Finished tests, finished update
NotBioWaste905 Sep 26, 2024
6fddaea
lint
NotBioWaste905 Sep 26, 2024
e06bc2b
Started working on llm slots
NotBioWaste905 Sep 26, 2024
22d8efc
Resolving pydantic errors
NotBioWaste905 Sep 27, 2024
aa735b5
Delete llmslot_test.py
NotBioWaste905 Sep 27, 2024
cc91133
Finished LLMSlot, working on LLMGroupSlot
NotBioWaste905 Sep 27, 2024
8756838
Merge remote-tracking branch 'origin/feat/llm_responses' into feat/ll…
NotBioWaste905 Sep 27, 2024
f1857f6
Added flag to
NotBioWaste905 Oct 1, 2024
c334ff5
First test attempts
NotBioWaste905 Oct 1, 2024
8306bbb
linting
NotBioWaste905 Oct 1, 2024
f842776
Merge branch 'feat/slots_extraction_update' into feat/llm_responses
NotBioWaste905 Oct 1, 2024
ada17ca
Merge remote-tracking branch 'origin/feat/llm_responses' into feat/ll…
NotBioWaste905 Oct 1, 2024
a45f653
File structure fixed
NotBioWaste905 Oct 3, 2024
3838d30
Fixed naming
NotBioWaste905 Oct 3, 2024
0e650f8
Create LLMCondition and LLMResponse classes
NotBioWaste905 Oct 3, 2024
015cb4f
Debugging flattening
NotBioWaste905 Oct 23, 2024
b6e5eeb
Bugfix
NotBioWaste905 Oct 23, 2024
b20137e
Added return_type property for LLMSlot
NotBioWaste905 Oct 23, 2024
25f5b04
Changed return_type from Any to type
NotBioWaste905 Oct 23, 2024
b651087
lint
NotBioWaste905 Oct 23, 2024
1b5a77b
removed deprecated from_script from tutorials
NotBioWaste905 Nov 2, 2024
c18d375
Fixed LLMCondition class
NotBioWaste905 Nov 2, 2024
459f7fc
Fixed missing 'models' field in Pipeline, updated tutorials
NotBioWaste905 Nov 6, 2024
24300e8
create __get_llm_response method in LLM_API, refactoring LLM Conditio…
NotBioWaste905 Nov 7, 2024
03b02be
Merge branch 'refs/heads/dev' into feat/llm_responses
RLKRo Nov 7, 2024
e6663b3
update lock file
RLKRo Nov 7, 2024
2e1c190
remove outdated entries from conf.py
RLKRo Nov 7, 2024
859c57a
small fixes to user guide
RLKRo Nov 7, 2024
fb3142b
minor tutorial changes
RLKRo Nov 7, 2024
ff81267
Moved docstring, removed pipeline parameter
NotBioWaste905 Nov 13, 2024
7518259
Fixed type annotation for models field in Pipeline
NotBioWaste905 Nov 13, 2024
ac28d78
removed unused imports from llm/__init__.py
NotBioWaste905 Nov 13, 2024
2d4998c
Fix redundancy in chatsky/slots/llm.py
NotBioWaste905 Nov 13, 2024
23d6a31
Fixed circular LLM_API<=>Pipeline import
NotBioWaste905 Nov 13, 2024
ef9baa3
Merge remote-tracking branch 'origin/feat/llm_responses' into feat/ll…
NotBioWaste905 Nov 13, 2024
4bf5bba
Update import order chatsky/llm/filters.py
NotBioWaste905 Nov 13, 2024
9188b89
Fixes in filters
NotBioWaste905 Nov 14, 2024
02894f0
Fixes of LLM_API annotations and docs
NotBioWaste905 Nov 14, 2024
8e839a1
Removed __get_llm_response, lint
NotBioWaste905 Nov 14, 2024
210b10a
Added context_to_history util, some tweaks in responses
NotBioWaste905 Nov 14, 2024
784f323
remove llm_response object initialization from tutorials
RLKRo Nov 14, 2024
042d256
fix imports in __init__ files:
RLKRo Nov 14, 2024
10533ed
fix: rename llm_response to LLMResponse, rename llm_condition to LLMC…
RLKRo Nov 14, 2024
8f21069
fix codeblocks in user guide
RLKRo Nov 14, 2024
95e2418
fix: message_to_langchain accepts context instead of pipeline
RLKRo Nov 15, 2024
934a0b8
remove defaults from filter definitions
RLKRo Nov 15, 2024
1be58a0
check field not none in filters
RLKRo Nov 15, 2024
4d68a29
remove model_name from LLM_API.respond
RLKRo Nov 15, 2024
fa0ae70
make LLMResponse prompt AnyResponse, remove __prompt_to_message
RLKRo Nov 15, 2024
8778637
fix return style in LLM_API.respond
RLKRo Nov 15, 2024
d4b67a1
fix LLM_API.condition signature
RLKRo Nov 15, 2024
4a29687
some doc fixes
RLKRo Nov 15, 2024
37aafb3
fix message schema json dumping
RLKRo Nov 15, 2024
54a7376
remove unused imports
RLKRo Nov 15, 2024
86da03e
fix circular import
RLKRo Nov 15, 2024
eac43e0
fix tests
RLKRo Nov 15, 2024
51c66a8
remove cnd.true()
RLKRo Nov 15, 2024
33242ca
Fixed empty prompt popping up
NotBioWaste905 Nov 15, 2024
65f7c8f
Format
NotBioWaste905 Nov 15, 2024
dc92132
Switched model from 3.5-turbo to 4o-mini
NotBioWaste905 Nov 15, 2024
020a7ef
Updated all of the models
NotBioWaste905 Nov 15, 2024
c9891f6
Fixes and logging
NotBioWaste905 Nov 15, 2024
c678f89
Codestyle
NotBioWaste905 Nov 15, 2024
f2df441
update lock file
RLKRo Nov 15, 2024
f20d463
simplify history text
RLKRo Nov 15, 2024
44e5571
fix codestyle
RLKRo Nov 15, 2024
9f97ce2
fix doc building
RLKRo Nov 15, 2024
b9e738a
Merge branch 'refs/heads/dev' into feat/llm_responses
RLKRo Nov 15, 2024
39750ba
update lock file
RLKRo Nov 15, 2024
5f4b07b
mvp: cut attachment processing
RLKRo Nov 15, 2024
fb53d63
mvp: cut last two tutorials
RLKRo Nov 15, 2024
6603f7d
remove unnecessary langchain extras
RLKRo Nov 15, 2024
17f4f8e
Merge branch 'refs/heads/feat/llm_responses' into mvp_llm
RLKRo Nov 15, 2024
3827462
update lock file
RLKRo Nov 15, 2024
79c0c9e
Merge branch 'refs/heads/feat/llm_responses' into mvp_llm
RLKRo Nov 15, 2024
f7e7684
protect langchain imports & sort imports in modules
RLKRo Nov 15, 2024
fc9b900
Merge branch 'refs/heads/feat/llm_responses' into mvp_llm
RLKRo Nov 15, 2024
a4e0462
skip llm tests on missing langchain
RLKRo Nov 15, 2024
cb09c56
Merge branch 'refs/heads/feat/llm_responses' into mvp_llm
RLKRo Nov 15, 2024
1 change: 1 addition & 0 deletions chatsky/conditions/__init__.py
@@ -11,3 +11,4 @@
)
from chatsky.conditions.slots import SlotsExtracted
from chatsky.conditions.service import ServiceFinished
from chatsky.conditions.llm import LLMCondition
32 changes: 32 additions & 0 deletions chatsky/conditions/llm.py
@@ -0,0 +1,32 @@
"""
LLM Conditions
--------------
This module provides LLM-based conditions.
"""

from chatsky.llm.methods import BaseMethod
from chatsky.core import BaseCondition, Context


class LLMCondition(BaseCondition):
    """
    LLM-based condition.
    Uses the prompt to produce a result from the model and evaluates that result using the given method.
    """

    model_name: str
    """
    Key of the model in the :py:attr:`~chatsky.core.pipeline.Pipeline.models` dictionary.
    """
    prompt: str
    """
    Condition prompt.
    """
    method: BaseMethod
    """
    Method that takes the model's output and returns a boolean.
    """

    async def call(self, ctx: Context) -> bool:
        model = ctx.pipeline.models[self.model_name]
        return await model.condition(ctx, self.prompt, self.method)
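
Reviewer sketch (not part of the diff): a minimal, self-contained illustration of the lookup-then-delegate flow in `LLMCondition.call`. `StubModel`, `StubContext`, `StubCondition`, `contains_true` and `evaluate` are hypothetical stand-ins so the example runs without chatsky or langchain. The stub keeps the model registry directly on the context, whereas the diff reads it from `ctx.pipeline.models`.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class StubModel:
    """Hypothetical stand-in for LLM_API that returns a canned reply."""

    reply: str

    async def condition(self, ctx: "StubContext", prompt: str, method: Callable[[str], bool]) -> bool:
        # A real model would be queried with the prompt; the stub evaluates its canned reply.
        return method(self.reply)


@dataclass
class StubContext:
    """Hypothetical stand-in for Context, holding only the model registry."""

    models: Dict[str, StubModel] = field(default_factory=dict)


@dataclass
class StubCondition:
    """Hypothetical stand-in mirroring the three fields of LLMCondition."""

    model_name: str
    prompt: str
    method: Callable[[str], bool]

    async def call(self, ctx: StubContext) -> bool:
        # Same shape as LLMCondition.call: look the model up by key, then delegate.
        model = ctx.models[self.model_name]
        return await model.condition(ctx, self.prompt, self.method)


def contains_true(text: str) -> bool:
    """Case-insensitive substring check, similar in spirit to the Contains method."""
    return "true" in text.lower()


def evaluate(reply: str) -> bool:
    """Run the stub condition against a model that always answers with `reply`."""
    ctx = StubContext(models={"assistant": StubModel(reply=reply)})
    condition = StubCondition("assistant", "Answer TRUE or FALSE: is this a greeting?", contains_true)
    return asyncio.run(condition.call(ctx))
```

An unknown `model_name` raises `KeyError` in this sketch, which appears to match the plain dictionary lookup in the diff.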
7 changes: 7 additions & 0 deletions chatsky/core/pipeline.py
@@ -29,6 +29,7 @@
from chatsky.core.service.actor import Actor
from chatsky.core.node_label import AbsoluteNodeLabel, AbsoluteNodeLabelInitTypes
from chatsky.core.script_parsing import JSONImporter, Path
from chatsky.llm.llm_api import LLM_API

logger = logging.getLogger(__name__)

@@ -78,6 +79,10 @@ class Pipeline(BaseModel, extra="forbid", arbitrary_types_allowed=True):
    """
    Slots configuration.
    """
    models: Dict[str, LLM_API] = Field(default_factory=dict)
    """
    LLM models.
    """
    messenger_interface: MessengerInterface = Field(default_factory=CLIMessengerInterface)
    """
    A `MessengerInterface` instance for this pipeline.
@@ -116,6 +121,7 @@ def __init__(
        *,
        default_priority: float = None,
        slots: GroupSlot = None,
        models: dict = None,
        messenger_interface: MessengerInterface = None,
        context_storage: Union[DBContextStorage, dict] = None,
        pre_services: ServiceGroupInitTypes = None,
@@ -133,6 +139,7 @@ def __init__(
            "fallback_label": fallback_label,
            "default_priority": default_priority,
            "slots": slots,
            "models": models,
            "messenger_interface": messenger_interface,
            "context_storage": context_storage,
            "pre_services": pre_services,
3 changes: 3 additions & 0 deletions chatsky/llm/__init__.py
@@ -0,0 +1,3 @@
from chatsky.llm.filters import BaseFilter, FromTheModel, IsImportant
from chatsky.llm.methods import BaseMethod, LogProb, Contains
from chatsky.llm.llm_api import LLM_API
25 changes: 25 additions & 0 deletions chatsky/llm/_langchain_imports.py
@@ -0,0 +1,25 @@
from typing import Any

try:
    from langchain_core.output_parsers import StrOutputParser
    from langchain_core.language_models.chat_models import BaseChatModel
    from langchain_core.messages.base import BaseMessage
    from langchain_core.messages import HumanMessage, SystemMessage, AIMessage
    from langchain_core.outputs.llm_result import LLMResult

    langchain_available = True
except ImportError:
    StrOutputParser = Any
    BaseChatModel = Any
    BaseMessage = Any
    HumanMessage = Any
    SystemMessage = Any
    AIMessage = Any
    LLMResult = Any

    langchain_available = False


def check_langchain_available():
    if not langchain_available:
        raise ImportError("Langchain is not available. Please install it with `pip install chatsky[llm]`.")
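
Reviewer sketch (not part of the diff): the optional-dependency guard above, reduced to a standalone example. The module name `some_optional_llm_backend` and the names `ChatModel`, `backend_available`, `check_backend_available` and `GuardedWrapper` are hypothetical. The import is expected to fail so that the fallback path is exercised.

```python
from typing import Any

try:
    # Hypothetical optional dependency; it is assumed NOT to be installed.
    from some_optional_llm_backend import ChatModel

    backend_available = True
except ImportError:
    # Fall back to Any so that annotations referencing ChatModel still resolve.
    ChatModel = Any

    backend_available = False


def check_backend_available() -> None:
    """Fail at use time, not at import time, when the optional dependency is missing."""
    if not backend_available:
        raise ImportError("Optional backend is not available.")


class GuardedWrapper:
    """Hypothetical wrapper that needs the optional dependency only when instantiated."""

    def __init__(self, model: ChatModel) -> None:
        check_backend_available()
        self.model = model
```

With this pattern, importing the module succeeds without the extra installed, and the error is raised only when a guarded object is constructed.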
62 changes: 62 additions & 0 deletions chatsky/llm/filters.py
@@ -0,0 +1,62 @@
"""
Filters
---------
This module contains a collection of basic filters for the message history that help avoid cluttering the LLM's context window.
"""

import abc

from pydantic import BaseModel

from chatsky.core.message import Message
from chatsky.core.context import Context


class BaseFilter(BaseModel, abc.ABC):
    """
    Base class for all message history filters.
    """

    @abc.abstractmethod
    def __call__(self, ctx: Context, request: Message, response: Message, model_name: str) -> bool:
        """
        :param ctx: Context object.
        :param request: Request message.
        :param response: Response message.
        :param model_name: Name of the model in the Pipeline.models.
        """
        raise NotImplementedError


class IsImportant(BaseFilter):
    """
    Filter that checks if the "important" field in a Message.misc is True.
    """

    def __call__(self, ctx: Context, request: Message, response: Message, model_name: str) -> bool:
        if request is not None and request.misc is not None and request.misc.get("important", None):
            return True
        if response is not None and response.misc is not None and response.misc.get("important", None):
            return True
        return False


class FromTheModel(BaseFilter):
    """
    Filter that checks if the message was sent by the model.
    """

    def __call__(self, ctx: Context, request: Message, response: Message, model_name: str) -> bool:
        if (
            request is not None
            and request.annotations is not None
            and request.annotations.get("__generated_by_model__") == model_name
        ):
            return True
        elif (
            response is not None
            and response.annotations is not None
            and response.annotations.get("__generated_by_model__") == model_name
        ):
            return True
        return False
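
Reviewer sketch (not part of the diff): how the two filters select dialogue turns, shown on plain data. `Msg`, `is_important`, `from_the_model` and the sample `turns` are hypothetical stand-ins for illustration. Only the `misc["important"]` and `annotations["__generated_by_model__"]` keys are taken from the diff.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Msg:
    """Hypothetical stand-in for Message with only the fields the filters read."""

    text: str = ""
    misc: Optional[dict] = None
    annotations: Optional[dict] = None


def is_important(request: Optional[Msg], response: Optional[Msg]) -> bool:
    # A turn is kept when either side carries a truthy misc["important"].
    return any(m is not None and m.misc is not None and m.misc.get("important") for m in (request, response))


def from_the_model(request: Optional[Msg], response: Optional[Msg], model_name: str) -> bool:
    # A turn is kept when either side is annotated as generated by `model_name`.
    return any(
        m is not None and m.annotations is not None and m.annotations.get("__generated_by_model__") == model_name
        for m in (request, response)
    )


turns: List[Tuple[Msg, Msg]] = [
    (Msg("hi"), Msg("hello", annotations={"__generated_by_model__": "assistant"})),
    (Msg("my order id is 42", misc={"important": True}), Msg("noted")),
    (Msg("bye"), Msg("goodbye")),
]

important_turns = [turn for turn in turns if is_important(*turn)]
model_turns = [turn for turn in turns if from_the_model(turn[0], turn[1], "assistant")]
```

Both checks operate on a whole request/response pair, so a single flagged message keeps both sides of its turn in the history.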
66 changes: 66 additions & 0 deletions chatsky/llm/llm_api.py
@@ -0,0 +1,66 @@
"""
LLM responses.
--------------
Wrapper around langchain.
"""

from typing import Union, Type, Optional
from pydantic import BaseModel

from chatsky.core.message import Message
from chatsky.core.context import Context
from chatsky.llm.methods import BaseMethod
from chatsky.llm.utils import message_to_langchain
from chatsky.llm._langchain_imports import StrOutputParser, BaseChatModel, BaseMessage, check_langchain_available


class LLM_API:
    """
    This class acts as a wrapper for all LLMs from langchain
    and handles message exchange between remote model and chatsky classes.
    """

    def __init__(
        self,
        model: BaseChatModel,
        system_prompt: Optional[str] = "",
    ) -> None:
        """
        :param model: Model object.
        :param system_prompt: System prompt for the model.
        """
        check_langchain_available()
        self.model: BaseChatModel = model
        self.parser = StrOutputParser()
        self.system_prompt = system_prompt

    async def respond(
        self,
        history: list[BaseMessage],
        message_schema: Union[None, Type[Message], Type[BaseModel]] = None,
    ) -> Message:
        """
        Generate a response to the given history.

        :param history: Langchain messages to send to the model.
        :param message_schema: Optional schema for structured output.
        """
        if message_schema is None:
            result = await self.parser.ainvoke(await self.model.ainvoke(history))
            return Message(text=result)
        elif issubclass(message_schema, Message):
            # Case if the message_schema describes Message structure
            structured_model = self.model.with_structured_output(message_schema)
            return Message.model_validate(await structured_model.ainvoke(history))
        elif issubclass(message_schema, BaseModel):
            # Case if the message_schema describes Message.text structure
            structured_model = self.model.with_structured_output(message_schema)
            model_result = await structured_model.ainvoke(history)
            return Message(text=message_schema.model_validate(model_result).model_dump_json())
        else:
            raise ValueError("message_schema must be None or a subclass of Message or BaseModel.")

    async def condition(
        self, ctx: Context, prompt: str, method: BaseMethod, return_schema: Optional[BaseModel] = None
    ) -> bool:
        """
        Evaluate a condition prompt against the last request.

        :param ctx: Context object.
        :param prompt: Condition prompt, sent as the system message.
        :param method: Method that converts the model result to a boolean.
        :param return_schema: Not used by the current implementation.
        """
        condition_history = [
            await message_to_langchain(Message(prompt), ctx=ctx, source="system"),
            await message_to_langchain(ctx.last_request, ctx=ctx, source="human"),
        ]
        result = await method(ctx, await self.model.agenerate([condition_history], logprobs=True, top_logprobs=10))
        return result
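
Reviewer sketch (not part of the diff): the three-way dispatch on `message_schema` in `respond`, isolated from langchain and pydantic. `Message`, `Schema`, `RichMessage`, `Weather` and `build_message` are hypothetical dataclass stand-ins. The `raw` dictionary plays the role of the model's structured output.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional, Type


@dataclass
class Message:
    """Hypothetical stand-in for chatsky's Message."""

    text: str = ""


@dataclass
class RichMessage(Message):
    """Hypothetical Message subclass describing the whole message structure."""

    mood: str = "neutral"


@dataclass
class Schema:
    """Hypothetical stand-in for a pydantic BaseModel describing only the text payload."""


@dataclass
class Weather(Schema):
    city: str = ""
    degrees: int = 0


def build_message(raw: dict, message_schema: Optional[Type] = None) -> Message:
    if message_schema is None:
        # No schema: the output is plain text.
        return Message(text=raw["text"])
    if issubclass(message_schema, Message):
        # The schema describes the whole message.
        return message_schema(**raw)
    if issubclass(message_schema, Schema):
        # The schema describes the text payload, which is serialized to JSON.
        return Message(text=json.dumps(asdict(message_schema(**raw))))
    raise ValueError(f"Unsupported message_schema: {message_schema!r}")
```

The order of the checks matters only if a schema could satisfy both branches. In this sketch `Message` and `Schema` are unrelated classes, so it does not arise here.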
72 changes: 72 additions & 0 deletions chatsky/llm/methods.py
@@ -0,0 +1,72 @@
"""
LLM methods
-----------
This module contains unified methods for basic condition cases,
such as substring matching and token log probability checks.
"""

import abc

from pydantic import BaseModel

from chatsky.core.context import Context
from chatsky.llm._langchain_imports import LLMResult


class BaseMethod(BaseModel, abc.ABC):
    """
    Base class for evaluating a model's response as a condition.
    """

    @abc.abstractmethod
    async def __call__(self, ctx: Context, model_result: LLMResult) -> bool:
        raise NotImplementedError

    async def model_result_to_text(self, model_result: LLMResult) -> str:
        """
        Converts raw model generation to a string.
        """
        return model_result.generations[0][0].text


class Contains(BaseMethod):
    """
    Simple method to check if a string contains a pattern (case-insensitive).

    :param str pattern: pattern to check

    :return: True if pattern is contained in model result
    :rtype: bool
    """

    pattern: str

    async def __call__(self, ctx: Context, model_result: LLMResult) -> bool:
        text = await self.model_result_to_text(model_result)
        return self.pattern.lower() in text.lower()


class LogProb(BaseMethod):
    """
    Method to check whether a target token's log probability is higher than a threshold.

    :param str target_token: token to check (e.g. `"TRUE"`)
    :param float threshold: threshold to exceed. Defaults to `-0.5`.

    :return: True if logprob is higher than threshold
    :rtype: bool
    """

    target_token: str
    threshold: float = -0.5

    async def __call__(self, ctx: Context, model_result: LLMResult) -> bool:
        try:
            result = model_result.generations[0][0].generation_info["logprobs"]["content"][0]["top_logprobs"]
        except (KeyError, IndexError, TypeError) as exc:
            # Missing or empty logprobs surface as lookup errors, not as ValueError.
            raise ValueError("LogProb method can only be applied to OpenAI models.") from exc
        for tok in result:
            if tok["token"] == self.target_token and tok["logprob"] > self.threshold:
                return True

        return False
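
Reviewer sketch (not part of the diff): the log probability check on a hand-built payload. The nested key layout (`logprobs` → `content` → `top_logprobs`) is copied from the lookup in `LogProb.__call__`. The function name `target_logprob_above` and the `sample` values are hypothetical.

```python
import math
from typing import Optional


def target_logprob_above(generation_info: Optional[dict], target_token: str, threshold: float = -0.5) -> bool:
    try:
        top = generation_info["logprobs"]["content"][0]["top_logprobs"]
    except (KeyError, IndexError, TypeError) as exc:
        # None, a missing key or an empty content list all mean "no logprobs available".
        raise ValueError("No log probabilities in generation info.") from exc
    # Only the first generated token's candidates are inspected.
    return any(tok["token"] == target_token and tok["logprob"] > threshold for tok in top)


sample = {
    "logprobs": {
        "content": [
            {
                "top_logprobs": [
                    {"token": "TRUE", "logprob": math.log(0.9)},
                    {"token": "FALSE", "logprob": math.log(0.1)},
                ]
            }
        ]
    }
}
```

Since the values are natural logarithms of probabilities, the default threshold of `-0.5` corresponds to a probability of about `exp(-0.5) ≈ 0.61` for the target token.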
59 changes: 59 additions & 0 deletions chatsky/llm/utils.py
@@ -0,0 +1,59 @@
import logging

from chatsky.core.context import Context
from chatsky.core.message import Message
from chatsky.llm._langchain_imports import HumanMessage, SystemMessage, AIMessage, check_langchain_available


async def message_to_langchain(message: Message, ctx: Context, source: str = "human", max_size: int = 1000):
    """
    Creates a langchain message from a :py:class:`~chatsky.core.message.Message` object.

    :param message: Chatsky Message to convert to Langchain Message.
    :param ctx: Context the message belongs to.
    :param source: Source of a message [`human`, `ai`, `system`]. Defaults to "human".
    :param max_size: Maximum size of the message in symbols.
        Raises ValueError if the limit is exceeded. Is not affected by system prompt size.

    :return: Langchain message object.
    :rtype: HumanMessage|AIMessage|SystemMessage
    """
    check_langchain_available()
    # Normalize a missing text before measuring it: len(None) would raise TypeError.
    if message.text is None:
        message.text = ""
    if len(message.text) > max_size:
        raise ValueError("Message is too long.")

    content = [{"type": "text", "text": message.text}]

    if source == "human":
        return HumanMessage(content=content)
    elif source == "ai":
        return AIMessage(content=content)
    elif source == "system":
        return SystemMessage(content=content)
    else:
        raise ValueError("Invalid source name. Only `human`, `ai` and `system` are supported.")


async def context_to_history(ctx: Context, length: int, filter_func, model_name: str, max_size: int):

    history = []

    # Materialize the pairs so they can be logged and sliced.
    pairs = list(
        zip(
            [ctx.requests[x] for x in range(1, len(ctx.requests) + 1)],
            [ctx.responses[x] for x in range(1, len(ctx.responses) + 1)],
        )
    )
    logging.debug(f"Dialogue turns: {pairs}")
    # A length of -1 means the whole history; otherwise only the last `length` turns are considered.
    if length != -1:
        pairs = pairs[-length:]
    for req, resp in filter(lambda x: filter_func(ctx, x[0], x[1], model_name), pairs):
        logging.debug(f"This pair is valid: {req, resp}")
        history.append(await message_to_langchain(req, ctx=ctx, max_size=max_size))
        history.append(await message_to_langchain(resp, ctx=ctx, source="ai", max_size=max_size))
    return history
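
Reviewer sketch (not part of the diff): the windowing and filtering order used by `context_to_history`, on plain string pairs. `select_turns`, `keep_all`, `mentions_order` and `turns` are hypothetical names. The `-1` sentinel and the `[-length:]` slice are taken from the diff.

```python
from typing import Callable, List, Tuple

Turn = Tuple[str, str]


def select_turns(pairs: List[Turn], length: int, keep: Callable[[str, str], bool]) -> List[Turn]:
    # -1 is the "whole history" sentinel; any other value slices from the end.
    window = pairs if length == -1 else pairs[-length:]
    # The filter runs after the window is cut, so fewer than `length` turns may remain.
    return [(req, resp) for req, resp in window if keep(req, resp)]


def keep_all(req: str, resp: str) -> bool:
    return True


def mentions_order(req: str, resp: str) -> bool:
    return "order" in req


turns: List[Turn] = [("hi", "hello"), ("order 42", "noted"), ("bye", "goodbye")]
```

One edge case worth noting: a `length` of `0` does not yield an empty history, because `pairs[-0:]` is the same slice as `pairs[0:]` and returns every turn.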
1 change: 1 addition & 0 deletions chatsky/responses/__init__.py
@@ -1,2 +1,3 @@
from .standard import RandomChoice
from .slots import FilledTemplate
from chatsky.responses.llm import LLMResponse