Use new LLM async support to get models other than just Anthropic #13

Open

simonw opened this issue Nov 14, 2024 · 3 comments
Labels: enhancement (New feature or request)

Comments

simonw commented Nov 14, 2024

This plugin currently hard-codes Claude 3 Haiku and the Anthropic client library:

```python
model="claude-3-haiku-20240307",
```

```python
client = AsyncAnthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
sql = await generate_sql_with_retries(client, db, question, schema)
```

Now that LLM has async support, I want to switch to it so the plugin can support multiple models at once.
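A rough sketch of what that could look like using LLM's async Python API - the `generate_sql` helper name, the `gpt-4o-mini` default, and the prompt wiring are illustrative assumptions, not the final implementation:

```python
import llm


async def generate_sql(question: str, schema: str, model_id: str = "gpt-4o-mini") -> str:
    # Any model with an async implementation registered with LLM works here,
    # not just Anthropic's - the default model_id is an assumption for illustration.
    model = llm.get_async_model(model_id)
    response = model.prompt(
        f"The table schema is:\n{schema}\n\nQuestion: {question}",
        system="Answer with a SQLite SQL query inside a ```sql ... ``` fenced block",
    )
    return await response.text()
```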

simonw added the enhancement label Nov 14, 2024
simonw commented Nov 14, 2024

In working on this I found and fixed this issue:

simonw commented Nov 14, 2024

The plugin currently uses Anthropic's prefix= mechanism and builds up a set of examples in messages JSON like this:

```python
messages = [
    {"role": "user", "content": "The table schema is:\n" + schema},
    {"role": "assistant", "content": "Ask questions to generate SQL"},
    {"role": "user", "content": "How many rows in the sqlite_master table?"},
    {
        "role": "assistant",
        "content": "select count(*) from sqlite_master\n-- Count rows in the sqlite_master table",
    },
    {"role": "user", "content": question},
]
```

LLM doesn't yet have a neat mechanism for building up a fake conversation like that, and other models don't support prefix= (which is also missing from LLM at the moment), so I'll need to switch to a different approach.

I'm going to tell the model to output this and then extract the SQL with a regex:

```sql
select ...
```
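A minimal sketch of that extraction step - the function name and exact pattern here are illustrative, not the plugin's final code:

```python
import re

SQL_BLOCK_RE = re.compile(r"```sql\s*(.*?)\s*```", re.DOTALL | re.IGNORECASE)


def extract_sql(response_text: str) -> str:
    # Prefer an explicit ```sql fenced block; fall back to the raw response
    # if the model skipped the fence entirely.
    match = SQL_BLOCK_RE.search(response_text)
    if match:
        return match.group(1).strip()
    return response_text.strip()
```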

simonw commented Nov 14, 2024

The new default model is gpt-4o-mini, which benefits enormously from automatic prompt caching - subsequent calls are much cheaper because the schema portion of the prompt has already been cached.

e.g. just saw this in the debug logs:

```
{'completion_tokens': 49, 'prompt_tokens': 3558, 'total_tokens': 3607, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 3328}}
```

And honestly, even without applying the 50% discount on those cached tokens, that's wildly inexpensive:

Total cost: 0.0528 cents
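A back-of-the-envelope check of that figure, assuming gpt-4o-mini's list prices of $0.150 per million input tokens and $0.600 per million output tokens with cached input billed at half price (the pricing table is an assumption here, and a slightly different one presumably produced the 0.0528 cents above):

```python
prompt_tokens = 3558
cached_tokens = 3328
completion_tokens = 49

input_price = 0.150 / 1_000_000   # dollars per input token (assumed list price)
output_price = 0.600 / 1_000_000  # dollars per output token (assumed list price)

# Without the cached-token discount:
full_cost = prompt_tokens * input_price + completion_tokens * output_price

# With cached input tokens billed at half price:
discounted_cost = (
    (prompt_tokens - cached_tokens) * input_price
    + cached_tokens * input_price * 0.5
    + completion_tokens * output_price
)

print(f"{full_cost * 100:.4f} cents without caching")    # ~0.0563 cents
print(f"{discounted_cost * 100:.4f} cents with caching") # ~0.0314 cents
```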
