Use new LLM async support to get models other than just Anthropic #13

Open

simonw opened this issue Nov 14, 2024 · 3 comments
Labels: enhancement (New feature or request)

Comments

simonw commented Nov 14, 2024

This plugin currently hard-codes Claude 3 Haiku and the Anthropic client library:

```python
model="claude-3-haiku-20240307",
```

```python
client = AsyncAnthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
sql = await generate_sql_with_retries(client, db, question, schema)
```

Now that LLM has async support, I want to switch to it so the plugin can support multiple models at once.
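A rough sketch of what that could look like using LLM's async Python API - the `generate_sql` helper name, the `gpt-4o-mini` default, and the prompt wiring are illustrative assumptions, not the final implementation:

```python
import llm


async def generate_sql(question: str, schema: str, model_id: str = "gpt-4o-mini") -> str:
    # Any model with an async implementation registered with LLM works here,
    # not just Anthropic's - the default model_id is an assumption for illustration.
    model = llm.get_async_model(model_id)
    response = model.prompt(
        f"The table schema is:\n{schema}\n\nQuestion: {question}",
        system="Answer with a SQLite SQL query inside a ```sql ... ``` fenced block",
    )
    return await response.text()
```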

simonw added the enhancement label Nov 14, 2024
simonw commented Nov 14, 2024

In working on this I found and fixed this issue:

simonw commented Nov 14, 2024

The plugin currently uses Anthropic's prefix= mechanism and builds up a set of examples in messages JSON like this:

```python
messages = [
    {"role": "user", "content": "The table schema is:\n" + schema},
    {"role": "assistant", "content": "Ask questions to generate SQL"},
    {"role": "user", "content": "How many rows in the sqlite_master table?"},
    {
        "role": "assistant",
        "content": "select count(*) from sqlite_master\n-- Count rows in the sqlite_master table",
    },
    {"role": "user", "content": question},
]
```

LLM doesn't yet have a neat mechanism for building up a fake conversation like that, and other models don't support prefix= (which is also missing from LLM at the moment), so I'll need to switch to a different approach.

I'm going to tell the model to output this and then extract the SQL with a regex:

```sql
select ...
```
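A minimal sketch of that extraction step - the function name and exact pattern here are illustrative, not the plugin's final code:

```python
import re

SQL_BLOCK_RE = re.compile(r"```sql\s*(.*?)\s*```", re.DOTALL | re.IGNORECASE)


def extract_sql(response_text: str) -> str:
    # Prefer an explicit ```sql fenced block; fall back to the raw response
    # if the model skipped the fence entirely.
    match = SQL_BLOCK_RE.search(response_text)
    if match:
        return match.group(1).strip()
    return response_text.strip()
```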

simonw commented Nov 14, 2024

The new default model is gpt-4o-mini, which benefits enormously from automatic prompt caching - subsequent calls are much cheaper because the schema portion of the prompt has already been cached.

e.g. just saw this in the debug logs:

```
{'completion_tokens': 49, 'prompt_tokens': 3558, 'total_tokens': 3607, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 3328}}
```

And honestly, even without applying the 50% discount on those cached tokens, that's wildly inexpensive:

Total cost: 0.0528 cents
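A back-of-the-envelope check of that figure, assuming gpt-4o-mini's list prices of $0.150 per million input tokens and $0.600 per million output tokens with cached input billed at half price (the pricing table is an assumption here, and a slightly different one presumably produced the 0.0528 cents above):

```python
prompt_tokens = 3558
cached_tokens = 3328
completion_tokens = 49

input_price = 0.150 / 1_000_000   # dollars per input token (assumed list price)
output_price = 0.600 / 1_000_000  # dollars per output token (assumed list price)

# Without the cached-token discount:
full_cost = prompt_tokens * input_price + completion_tokens * output_price

# With cached input tokens billed at half price:
discounted_cost = (
    (prompt_tokens - cached_tokens) * input_price
    + cached_tokens * input_price * 0.5
    + completion_tokens * output_price
)

print(f"{full_cost * 100:.4f} cents without caching")    # ~0.0563 cents
print(f"{discounted_cost * 100:.4f} cents with caching") # ~0.0314 cents
```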
