Ability to execute prompts against an asyncio (non-blocking) API #507
Two options:
I'm going to build 1. I will treat 2. as an optional nice-to-have in the future. I expect that any of the API-driven plugins which do not use a client library (like |
Two options for the API design:
I'm torn on these two. Not sure how to decide. |
I used Claude as an electric bicycle for the mind and I've decided to go with the separate async class. I think having a separate class will make things like type hints more obvious. https://gist.github.com/simonw/f69e6a1fe21df5007ce038dfe91c62f4 |
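As a rough illustration of the type-hint argument, here is a minimal sketch of two separate classes. The names `Model` and `AsyncModel` and the echo behaviour are assumptions made for this example, not the library's actual implementation. With separate classes, each `prompt()` has exactly one return type, so a type checker can tell immediately whether the result needs to be awaited.

```python
import asyncio


class Model:
    """Blocking variant (hypothetical): prompt() returns the text directly."""

    def prompt(self, text: str) -> str:
        return f"echo: {text}"


class AsyncModel:
    """Async variant (hypothetical): prompt() is a coroutine and must be awaited."""

    async def prompt(self, text: str) -> str:
        await asyncio.sleep(0)  # stand-in for a non-blocking API call
        return f"echo: {text}"


async def main() -> str:
    return await AsyncModel().prompt("hi")


if __name__ == "__main__":
    print(Model().prompt("hi"))
    print(asyncio.run(main()))
```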
I'm going to add a |
I'm going to have to do this for embedding models, too. |
This may be tricky: Lines 361 to 363 in fe1e097
That's a reminder that sometimes we do One option: instead of Reminder though: the reason we have |
From https://community.openai.com/t/asynchronous-use-of-the-library/479414/4 it looks like OpenAI used to have a |
OpenAI async docs are here: https://github.com/openai/openai-python?tab=readme-ov-file#async-usage
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()

async def main() -> None:
    # Awaiting the call yields control to the event loop while the request is in flight
    chat_completion = await client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": "Say this is a test",
            }
        ],
        model="gpt-3.5-turbo",
    )

asyncio.run(main())
|
I got the first prototype of this working (minus logging to the database):
The
|
Claude helped with the refactoring of |
Before I commit further to this API, let's see how this feels from Python library code. |
This is interesting:
>>> import asyncio
>>> import llm
>>> model = llm.get_async_model("gpt-4o-mini")
>>> model.prompt("say hi in spanish")
/Users/simon/Dropbox/Development/llm/llm/models.py:390: RuntimeWarning: coroutine 'AsyncResponse.text' was never awaited
return "<Response prompt='{}' text='{}'>".format(
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
<Response prompt='say hi in spanish' text='<coroutine object AsyncResponse.text at 0x3133d84a0>'>
That's because the default |
Using
>>> import asyncio
>>> import llm
>>> model = llm.get_async_model("gpt-4o-mini")
>>> response = model.prompt("say hi in spanish")
>>> response
<Response prompt='say hi in spanish' text='... not yet awaited ...'>
>>> await response.text()
'¡Hola!'
>>> response
<Response prompt='say hi in spanish' text='¡Hola!'> |
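One way to get the behaviour shown in that session is a `__repr__` that never calls the coroutine and only reports text that has already been collected. The sketch below is an assumption about how this could be done, not the code in `llm/models.py`; the `_text` attribute and the hard-coded reply are hypothetical.

```python
import asyncio
from typing import Optional, Tuple


class AsyncResponse:
    """Hypothetical response whose text is only available after awaiting."""

    def __init__(self, prompt: str) -> None:
        self.prompt = prompt
        self._text: Optional[str] = None  # filled in once text() has been awaited

    async def text(self) -> str:
        if self._text is None:
            await asyncio.sleep(0)  # stand-in for the real API call
            self._text = "¡Hola!"
        return self._text

    def __repr__(self) -> str:
        # Never call self.text() here: that would create a coroutine, not a string.
        shown = self._text if self._text is not None else "... not yet awaited ..."
        return "<Response prompt='{}' text='{}'>".format(self.prompt, shown)


async def demo() -> Tuple[str, str]:
    response = AsyncResponse("say hi in spanish")
    before = repr(response)
    await response.text()
    return before, repr(response)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```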
This outputs tokens one at a time:
>>> import asyncio
>>> import llm
>>> model = llm.get_async_model("gpt-4o-mini")
>>> async for token in model.prompt("describe a good dog in french"):
...     print(token, end="", flush=True)
...
Un bon chien est fidèle, affectueux et protecteur. Il aime passer du temps avec sa famille et est toujours prêt à jouer. Sa loyauté est sans égale, et il fait tout pour garder ses maîtres heureux. Un bon chien est aussi intelligent et facile à dresser, ce qui lui permet d'apprendre rapidement des commandes et des tours. Sa présence apporte réconfort et joie, et il sait être un excellent compagnon, toujours là dans les moments de bonheur comme dans les moments difficiles. En somme, un bon chien est un véritable ami pour la vie. |
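For readers wondering how one object can support both `async for` and `await`, here is a minimal sketch using `__aiter__` and `__await__`. The class name `StreamingResponse` and its list-backed token source are hypothetical stand-ins for a real streaming API call. It shows the general Python technique, not necessarily what the prototype does.

```python
import asyncio
from typing import Any, AsyncIterator, Generator, List


class StreamingResponse:
    """Hypothetical response usable with both `async for` and `await`."""

    def __init__(self, source: List[str]) -> None:
        self._source = source  # stand-in for a streaming API call
        self._chunks: List[str] = []  # tokens collected so far
        self._done = False

    async def __aiter__(self) -> AsyncIterator[str]:
        if self._done:
            # Already fully consumed: replay the cached tokens.
            for chunk in self._chunks:
                yield chunk
            return
        self._chunks = []
        for chunk in self._source:
            await asyncio.sleep(0)  # yield control, as a network read would
            self._chunks.append(chunk)
            yield chunk
        self._done = True

    async def text(self) -> str:
        # Drain the stream (or the cache) and join the tokens.
        async for _ in self:
            pass
        return "".join(self._chunks)

    def __await__(self) -> Generator[Any, None, str]:
        # `await response` resolves to the full text.
        return self.text().__await__()


async def demo():
    streamed = [t async for t in StreamingResponse(["Un ", "bon ", "chien"])]
    whole = await StreamingResponse(["Un ", "bon ", "chien"])
    return streamed, whole


if __name__ == "__main__":
    print(asyncio.run(demo()))
```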
This async API feels pretty good to me. I think this may be the right design. Still to figure out:
|
Reminder: model registration currently looks like this: llm/llm/default_plugins/openai_models.py Lines 24 to 45 in fe1e097
If I add a new separate Given that, I think I want the existing So what could that look like? One option:
def register_models(register):
    register(Chat("gpt-3.5-turbo"), AsyncChat("gpt-3.5-turbo"), aliases=("3.5", "chatgpt"))
So Problem there is that I may eventually have models which are available only in their async variant. So I could do this:
def register_models(register):
    register(model=Chat("gpt-3.5-turbo"), async_model=AsyncChat("gpt-3.5-turbo"), aliases=("3.5", "chatgpt"))
That would free me up to support this:
def register_models(register):
    register(async_model=AsyncChat("gpt-3.5-turbo"), aliases=("3.5", "chatgpt"))
That's not terrible, it may be the way to go. |
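To make the keyword-argument option concrete, here is a sketch of what the receiving side of `register()` might look like. The `Registry` class, its two dictionaries, and the stub `Chat` and `AsyncChat` classes are all hypothetical; the real plugin machinery may be organised quite differently.

```python
from typing import Dict, Optional, Sequence


class Chat:
    """Stub sync model (hypothetical)."""

    def __init__(self, model_id: str) -> None:
        self.model_id = model_id


class AsyncChat:
    """Stub async model (hypothetical)."""

    def __init__(self, model_id: str) -> None:
        self.model_id = model_id


class Registry:
    """Hypothetical registry keeping sync and async variants in separate tables."""

    def __init__(self) -> None:
        self.models: Dict[str, Chat] = {}
        self.async_models: Dict[str, AsyncChat] = {}

    def register(
        self,
        model: Optional[Chat] = None,
        async_model: Optional[AsyncChat] = None,
        aliases: Sequence[str] = (),
    ) -> None:
        if model is None and async_model is None:
            raise ValueError("register() needs model, async_model, or both")
        for variant, table in ((model, self.models), (async_model, self.async_models)):
            if variant is None:
                continue  # this variant was not provided
            table[variant.model_id] = variant
            for alias in aliases:
                table[alias] = variant


def register_models(register) -> None:
    register(
        model=Chat("gpt-3.5-turbo"),
        async_model=AsyncChat("gpt-3.5-turbo"),
        aliases=("3.5", "chatgpt"),
    )


if __name__ == "__main__":
    registry = Registry()
    register_models(registry.register)
    print(sorted(registry.models), sorted(registry.async_models))
```

Because both parameters default to `None`, an async-only model can be registered with `register(async_model=...)` and simply never appears in the sync table.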
I think the |
The only place that calls that plugin hook (and hence the only place that defines the Lines 66 to 85 in fe1e097
|
Got this working:
|
I have a hunch the saving-to and loading-from the database bit may end up producing new plugin hooks. I'll need those for Datasette at any rate since it has its own mechanism for executing database writes. Although I may not need plugin hooks for that if I stick to keeping the database persistence code in the CLI layer. But... I already have a need for plugins like So it may be that it's not so much a plugin as a documented interface for blocking database logging (for use in other blocking plugins) and a documented "roll your own" policy for async database logging. Which may grow into plugin hooks for database persistence that can be both sync and async friendly at a later date. |
I'm doing this in the prototype right now and I really don't like it: llm/llm/default_plugins/openai_models.py Line 444 in 44e6be1
That's multiple inheritance AND I'm inheriting from the non-async model class! I'm going to refactor those to a common base class instead. |
Got Claude to refactor that for me: https://gist.github.com/simonw/3872e758c129917980605eed49cbce7f |
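The shape of that refactor can be sketched roughly as follows. The `_Shared` name matches the one mentioned in the later commit log, but the `build_kwargs()` helper and the fake responses are assumptions for illustration only. The shared base holds the request-building logic, and each subclass adds only its own sync or async `execute()`.

```python
import asyncio
from typing import Any, Dict


class _Shared:
    """Logic common to both variants: no I/O happens here."""

    def __init__(self, model_id: str) -> None:
        self.model_id = model_id

    def build_kwargs(self, prompt: str) -> Dict[str, Any]:
        # Hypothetical helper: assemble the request payload once, for both variants.
        return {
            "model": self.model_id,
            "messages": [{"role": "user", "content": prompt}],
        }


class Chat(_Shared):
    def execute(self, prompt: str) -> str:
        kwargs = self.build_kwargs(prompt)
        return "sync reply from {}".format(kwargs["model"])  # stand-in for a blocking call


class AsyncChat(_Shared):
    async def execute(self, prompt: str) -> str:
        kwargs = self.build_kwargs(prompt)
        await asyncio.sleep(0)  # stand-in for a non-blocking call
        return "async reply from {}".format(kwargs["model"])


if __name__ == "__main__":
    print(Chat("gpt-4o-mini").execute("hi"))
    print(asyncio.run(AsyncChat("gpt-4o-mini").execute("hi")))
```

Neither variant inherits from the other, so `isinstance(model, Chat)` is never accidentally true for an async model.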
I should do an async Ollama spike: https://github.com/ollama/ollama-python/blob/main/examples/async-chat-stream/main.py Then I can try this against local models like these: |
Refs #25 Refs simonw/llm#507 Refs simonw/llm#613
…els (#613)
- #507 (comment)
* register_model is now async aware
  Refs #507 (comment)
* Refactor Chat and AsyncChat to use _Shared base class
  Refs #507 (comment)
* fixed function name
* Fix for infinite loop
* Applied Black
* Ran cog
* Applied Black
* Add Response.from_row() classmethod back again
  It does not matter that this is a blocking call, since it is a classmethod
* Made mypy happy with llm/models.py
* mypy fixes for openai_models.py
  I am unhappy with this, had to duplicate some code.
* First test for AsyncModel
* Still have not quite got this working
* Fix for not loading plugins during tests, refs #626
* audio/wav not audio/wave, refs #603
* Black and mypy and ruff all happy
* Refactor to avoid generics
* Removed obsolete response() method
* Support text = await async_mock_model.prompt("hello")
* Initial docs for llm.get_async_model() and await model.prompt()
  Refs #507
* Initial async model plugin creation docs
* duration_ms ANY to pass test
* llm models --async option
  Refs #613 (comment)
* Removed obsolete TypeVars
* Expanded register_models() docs for async
* await model.prompt() now returns AsyncResponse
  Refs #613 (comment)
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Tip about pytest --record-mode once
  Plus mechanism for setting API key during tests with PYTEST_ANTHROPIC_API_KEY
* Async support for Claude models
  Closes #25 Refs simonw/llm#507 Refs simonw/llm#613
* Depend on llm>=0.18a0, refs #25
Refs #25, #20, #24 Refs simonw/llm#507 Refs simonw/llm#613
Datasette in particular needs a neat way to run different models via an `await ...` async method. Having a way for model plugins to provide this, especially the API-based plugins like OpenAI and Claude, would be really helpful and would unblock a bunch of Datasette plugins that could benefit from an LLM abstraction layer.