Option for passing in longer context fragments, stored in a deduped table #617

simonw · 2024-11-06T21:31:59Z

The attachments feature makes it easy to attach video and audio from a URL or file while ensuring that even if that prompt component is used many times over it only ends up stored once in the database - in the attachments table using a sha256 hash as the ID:

llm/llm/models.py

Lines 28 to 39 in febbc04

    
           def id(self): 
        
               # Hash of the binary content, or of '{"url": "https://..."}' for URL attachments 
        
               if self._id is None: 
        
                   if self.content: 
        
                       self._id = hashlib.sha256(self.content).hexdigest() 
        
                   elif self.path: 
        
                       self._id = hashlib.sha256(open(self.path, "rb").read()).hexdigest() 
        
                   else: 
        
                       self._id = hashlib.sha256( 
        
                           json.dumps({"url": self.url}).encode("utf-8") 
        
                       ).hexdigest() 
        
               return self._id

I find myself wanting the same thing for blocks of text: I'd like to construct e.g. a context that contains the full docs for LLM and run multiple prompts against that without storing many copies of the same exact text in the database.

One possible solution would be to re-use the attachments mechanism, with a new default plain text attachment type that all models support:

llm -m gpt-4o-mini \
  -a https://gist.githubusercontent.com/simonw/f7b251a05b834a9b60edff7e06d31572/raw/3ad2fb03d57aad7b616ec40455c355ec51d2e3db/llm-docs.md \
  'how do I embed a file?'

The text was updated successfully, but these errors were encountered:

simonw · 2024-11-06T21:33:04Z

Pros of this approach:

Reuses existing mechanism, no new tables or columns needed
Neatly solves the need for reused long text prompts

Cons:

A bit confusing: attachments were for multi-modal models, now we have a special type that does something else?
Potentially confusing UI: llm -a URL could point to all kinds of text files, how do we know which ones get pasted into the prompt?
The attachments table uses a BLOB for the content, but we want to store strings. We can encode those as utf-8 in a blob but that might make querying the database directly confusing.
It sucks that you now have to join against attachments just to see what the text version of the prompt was!

simonw · 2024-11-06T21:35:09Z

Alternative approaches:

Do nothing. Store long context prompts many many times in the same table.
Introduce a new non-attachments mechanism for this, so e.g. the responses table can optionally use a foreign key to a new reused_text table which works a bit like attachments do right now.
A mechanism where a response can point to some other response and say I used the same prompt as this one.
ALL prompt strings are moved to a new table with a SHA256 ID column, and all responses have foreign keys to that for their prompt_id and system_id.

That last option is actually reasonably elegant. It feels a bit nasty to me to have a whole bunch of very short values that are stored in a separate table, but it may be that most prompts and system prompts are long enough that this might be worthwhile anyway.

simonw · 2024-11-06T21:35:34Z

Ideally whatever mechanism I use here should work for both system and regular prompts.

If using existing attachments we could invent artificial content types of llm/prompt and llm/system for this.

simonw · 2024-11-06T21:37:09Z

Question: in my current database how many responses have duplicates in prompts or system prompts?

simonw · 2024-11-06T21:42:16Z

Simple dupe detection query:

sqlite-utils "$(llm logs path)" "
select substr(prompt, 0, 20), length(prompt), count(*), length(prompt) * count(*) as size
from responses
group by prompt
having count(*) > 1 order by size desc" -t

substr(prompt, 0, 20)      length(prompt)    count(*)    size
-----------------------  ----------------  ----------  ------
<documents>                        332310           2  664620
<docume
src/util.rs                        288313           2  576626
---
use
<documents>                        112753           4  451012
<docume
jasondavies:                        95647           4  382588
shnkr
882542F3884314B:                   152700           2  305400
c
null                               147256           2  294512
<p>Something t
diff --git a/.githu                111221           2  222442
null                               109634           2  219268
<p>Something t
zerojames:                         102411           2  204822
jayniz:
Start Timestamp;End                 92772           2  185544
#!/usr/bin/env pyth                 54755           3  164265
## v2.7.3                           53962           3  161886

*2023-08

simonw · 2024-11-06T21:43:25Z

I think I need to prototype the approach where every prompt and system string is de-duped into a separate foreign key table, then run it against a copy of my database and compare the sizes.

simonw · 2024-11-06T21:50:22Z

Prototype:

cp "$(llm logs path)" /tmp/logs.db                                                                       
sqlite-utils extract logs.db responses system --table prompt --rename system prompt --fk-column system_id
sqlite-utils extract logs.db responses prompt

simonw · 2024-11-06T21:50:42Z

The resulting DB was even bigger for some reason. A lot of the prompts/system prompts are duplicated in the prompt_json already.

simonw · 2024-11-06T21:52:10Z

Oh here's why it's bigger, it put an index on the new table:

CREATE UNIQUE INDEX [idx_prompt_prompt]
    ON [prompt] ([prompt]);

I really need to do the thing where there's a hash column which has an index on it and is used for lookups, but the foreign key remains an integer so we don't bloat the responses table.

simonw · 2024-11-06T21:53:49Z

Dropping that index and running vacuum dropped the DB size from 170MB to 144MB, but the original is 136MB to we still gained 8MB somehow. Probably just because we needed a bunch of extra pages for the new table.

simonw · 2024-11-06T21:55:09Z

Given these problems with the prompt_id / system_id approach I'm inclined to consider that original -a long-file.txt attachment idea instead.

simonw · 2024-11-06T21:56:00Z

Or I could have optional prompt_id and system_id columns which foreign key against either attachments or some new prompts table - but continue to use the existing prompt and system columns for short prompts.

simonw · 2024-11-06T22:03:19Z

If I insert the prompts as utf-8 binary data into that content column they look like this in Datasette:

Inserting them as text DOES work, but it feels really gnarly to have a BLOB column that sometimes has text in it, plus in the code Attachment.content is typed as a bytes:

llm/llm/models.py

Lines 20 to 26 in febbc04

    
           @dataclass 
        
           class Attachment: 
        
               type: Optional[str] = None 
        
               path: Optional[str] = None 
        
               url: Optional[str] = None 
        
               content: Optional[bytes] = None 
        
               _id: Optional[str] = None

simonw · 2024-11-06T22:05:35Z

Cleanest option here would be to have a content_text column which is a string and use that instead of content for non-binary attachment content.

Google Gemini models actually DO accept plain text attachments already: https://ai.google.dev/gemini-api/docs/document-processing?lang=rest

Gemini 1.5 Pro and 1.5 Flash support a maximum of 3,600 document pages. Document pages must be in one of the following text data MIME types:

PDF - application/pdf

JavaScript - application/x-javascript, text/javascript

Python - application/x-python, text/x-python

TXT - text/plain

HTML - text/html

CSS - text/css

Markdown - text/md

CSV - text/csv

XML - text/xml

RTF - text/rtf

Each document page is equivalent to 258 tokens.

simonw · 2024-11-06T22:09:04Z

So maybe this ticket is actually about supporting plain text attachments and having them work across ALL models, in addition to whatever is going on here with Gemini.

I wonder if any other models out there support attachments like text/md and text/xml and suchlike?

simonw · 2024-11-07T04:56:31Z

I started playing with a llm-docs idea that could benefit from this: https://github.com/simonw/llm-docs

I currently run that like this:

curl -s "https://raw.githubusercontent.com/simonw/llm-docs/refs/heads/main/version-docs/$(llm --version | cut -d' ' -f3).txt" | \
  llm -m gpt-4o-mini 'how do I embed a binary file?'

simonw · 2024-11-08T02:29:26Z

Here's a much simpler idea. I introduce two new llm prompt arguments:

llm -f prompt.txt - runs the prompt from that file (alias --file)
llm -u https://... - runs the prompt fetched from that URL (alias --url)

And maybe two more that do the same for system prompts:

llm --sf prompt.txt - alias --system-file
llm --su https://... - alias --system-url

In addition, I treat cat prompt.txt | llm as working the same as llm -f -

In all cases the text is anticipated to be longer than a regular prompt, and hence should be de-duplicated and stored in a separate table.

That table might be called prompts or inputs or maybe even contexts - it will then have two nullable foreign keys from responses for prompt_id and system_id.

This avoids the confusion that would come from reusing the existing attachments mechanism, and the UI for it feels pretty natural to me - especially since -f and -u are not yet used for anything else.

simonw · 2024-11-08T02:30:59Z

Current database tables are:

conversations
responses
attachments
prompt_attachments

I'm going to call the new table contexts - I think prompts is confusing as those are already stored as part of responses and I like that it reflects the noun context which is key to thinking about how to use LLMs.

simonw · 2024-11-08T02:34:37Z

How should these affect the existing templates feature?

One option: templates work exactly as they do right now, so when you do llm -f prompt.txt --save foo a foo.yaml template is created with the contents of that file.

But... this actually brings up the point that templates right now are pretty inefficient. Using a template 100 times will drop 100 copies of that template into the responses table, one for each response!

Fixing this is probably a separate issue. I could start having templates automatically use this mechanism - being treated effectively as piped in / -f prompts and system prompts. But... that won't work for templates that include replacement params.

I could make it so replacement params work against these -u and -f inputs, and store which params were used for a given prompt. I'd need to add a params optional JSON column to responses for this.

I can do that as a separate piece of work.

simonw · 2024-11-08T02:44:58Z

If I'm doing this - supporting longer prompts using --file and --url - I could implement it such that IF one of those things is provided then the prompt is automatically stashed in the contexts table.

But as I started prototyping that out I realized that an alternative rule could simple be that any prompts over the length of X characters are automatically stored in that table instead.

This also isn't a question I can get wrong. If I decide to make all prompts longer than 50 characters stored in contexts instead, and then later decide that was a bad idea, I can change the policy without breaking anything - the code I've written that knows how to read from either responses.prompt or the thing pointed to by responses.prompt_id will continue to work regardless.

simonw · 2024-11-08T02:46:15Z

... in which case the --file and --url options are a distraction. That feature exists already in the form of curl ... | llm and llm < prompt.txt - if the only reason I was adding those is as a signal that longer prompts should be stored independently I don't need the new features at all if I go with a "prompts longer than X" design instead.

simonw · 2024-11-08T02:46:45Z

Here's as far as I got prototyping the --file and --url options:

diff --git a/llm/cli.py b/llm/cli.py
index 6a6fb2c..0f5f112 100644
--- a/llm/cli.py
+++ b/llm/cli.py
@@ -147,6 +147,16 @@ def cli():
 @click.argument("prompt", required=False)
 @click.option("-s", "--system", help="System prompt to use")
 @click.option("model_id", "-m", "--model", help="Model to use")
+@click.option("file", "-f", "--file", type=click.File(), help="Read prompt from file")
+@click.option("url", "-u", "--url", help="Read prompt from URL")
+@click.option(
+    "system_file",
+    "--sf",
+    "--system-file",
+    type=click.File(),
+    help="Read system prompt from file",
+)
+@click.option("system_url", "--su", "--system-url", help="Read system prompt from URL")
 @click.option(
     "attachments",
     "-a",
@@ -203,6 +213,10 @@ def prompt(
     prompt,
     system,
     model_id,
+    file,
+    url,
+    system_file,
+    system_url,
     attachments,
     attachment_types,
     options,
@@ -245,6 +259,16 @@ def prompt(
     def read_prompt():
         nonlocal prompt
 
+        prompt_bits = []
+
+        # Is there a file to read from?
+        if file:
+            prompt_bits.append(file.read())
+        if url:
+            response = httpx.get(url)
+            response.raise_for_status()
+            prompt_bits.append(response.text)
+
         # Is there extra prompt available on stdin?
         stdin_prompt = None
         if not sys.stdin.isatty():
@@ -258,10 +282,12 @@ def prompt(
 
         if (
             prompt is None
-            and not save
+            and file is None
+            and url is None
             and sys.stdin.isatty()
             and not attachments
             and not attachment_types
+            and not save
         ):
             # Hang waiting for input to stdin (unless --save)
             prompt = sys.stdin.read()

simonw · 2024-11-08T02:51:52Z

I think I need to change this code that populates the responses table:

llm/llm/models.py

Lines 221 to 238 in 5d1d723

    
           response = { 
        
               "id": response_id, 
        
               "model": self.model.model_id, 
        
               "prompt": self.prompt.prompt, 
        
               "system": self.prompt.system, 
        
               "prompt_json": self._prompt_json, 
        
               "options_json": { 
        
                   key: value 
        
                   for key, value in dict(self.prompt.options).items() 
        
                   if value is not None 
        
               }, 
        
               "response": self.text(), 
        
               "response_json": self.json(), 
        
               "conversation_id": conversation.id, 
        
               "duration_ms": self.duration_ms(), 
        
               "datetime_utc": self.datetime_utc(), 
        
           } 
        
           db["responses"].insert(response)

And code that reads from that table in two places:

llm/llm/cli.py

Lines 685 to 706 in 5d1d723

    
           LOGS_COLUMNS = """    responses.id, 
        
               responses.model, 
        
               responses.prompt, 
        
               responses.system, 
        
               responses.prompt_json, 
        
               responses.options_json, 
        
               responses.response, 
        
               responses.response_json, 
        
               responses.conversation_id, 
        
               responses.duration_ms, 
        
               responses.datetime_utc, 
        
               conversations.name as conversation_name, 
        
               conversations.model as conversation_model""" 
        
           LOGS_SQL = """ 
        
           select 
        
           {columns} 
        
           from 
        
               responses 
        
           left join conversations on responses.conversation_id = conversations.id{extra_where} 
        
           order by responses.id desc{limit} 
        
           """

llm/llm/models.py

Lines 284 to 319 in 5d1d723

    
               @classmethod 
        
               def from_row(cls, db, row): 
        
                   from llm import get_model 
        
                   model = get_model(row["model"]) 
        
                   response = cls( 
        
                       model=model, 
        
                       prompt=Prompt( 
        
                           prompt=row["prompt"], 
        
                           model=model, 
        
                           attachments=[], 
        
                           system=row["system"], 
        
                           options=model.Options(**json.loads(row["options_json"])), 
        
                       ), 
        
                       stream=False, 
        
                   ) 
        
                   response.id = row["id"] 
        
                   response._prompt_json = json.loads(row["prompt_json"] or "null") 
        
                   response.response_json = json.loads(row["response_json"] or "null") 
        
                   response._done = True 
        
                   response._chunks = [row["response"]] 
        
                   # Attachments 
        
                   response.attachments = [ 
        
                       Attachment.from_row(arow) 
        
                       for arow in db.query( 
        
                           """ 
        
                           select attachments.* from attachments 
        
                           join prompt_attachments on attachments.id = prompt_attachments.attachment_id 
        
                           where prompt_attachments.response_id = ? 
        
                           order by prompt_attachments."order" 
        
                       """, 
        
                           [row["id"]], 
        
                       ) 
        
                   ] 
        
                   return response

simonw · 2024-11-08T02:53:11Z

I could even define a new responses_full SQL view that inflates the foreign key system and regular prompts automatically, for developer convenience. It could extract out the attachments as JSON too.

simonw · 2024-11-08T03:26:53Z

But what do I do about the prompt_json columns storing huge copies of the strings anyway?

I worked up this little system with the help of ChatGPT Code Interpreter: https://chatgpt.com/share/672d84c0-7360-8006-8333-f7e743237942

def apply_replacements(obj, replacements):
    if isinstance(obj, dict):
        return {k: apply_replacements(v, replacements) for k, v in obj.items()}
    elif isinstance(obj, list):
        return [apply_replacements(item, replacements) for item in obj]
    elif isinstance(obj, str):
        replaced_parts = []
        last_index = 0
        found = False

        for key, value in replacements.items():
            index = obj.find(key)
            while index != -1:
                found = True
                if index > last_index:
                    replaced_parts.append(obj[last_index:index])
                replaced_parts.append(value)
                last_index = index + len(key)
                index = obj.find(key, last_index)

        if found:
            if last_index < len(obj):
                replaced_parts.append(obj[last_index:])
            return {"$r": replaced_parts}
        else:
            return obj
    else:
        return obj


def reverse_replacements(obj, replacements):
    return _reverse_replacements(obj, {v: k for k, v in replacements.items()})


def _reverse_replacements(obj, replacements):
    if isinstance(obj, dict):
        if "$r" in obj:
            # Reconstruct the original string from the list
            return "".join(
                (replacements[part] if isinstance(part, int) else part)
                for part in obj["$r"]
            )
        else:
            return {k: _reverse_replacements(v, replacements) for k, v in obj.items()}
    elif isinstance(obj, list):
        return [_reverse_replacements(item, replacements) for item in obj]
    else:
        return obj

So now you can do this:

from llm.utils import apply_replacements, reverse_replacements
replacements = {"This is a pretty long string at this point": 1, "And so is this one": 2}
json_object = {
    "foo": {
        "bar": {
            "baz": "this includes This is a pretty long string at this point",
            "qux": [44, "And so is this one"],
            "quux": "This has This is a pretty long string at this point and And so is this one as well"
        }
    }
}

replaced = apply_replacements(json_object, replacements)
orig = reverse_replacements(replaced, replacements)

And the replaced version looks like this:

{'foo': {'bar': {'baz': {'$r': ['this includes ', 1]},
                 'quux': {'$r': ['This has ', 1, ' and ', 2, ' as well']},
                 'qux': [44, {'$r': [2]}]}}}

simonw · 2024-11-09T05:57:57Z

The last remaining challenge is search. The system currently has a responses_fts table that indexes the prompt and response columns from responses:

llm/docs/logging.md

Lines 164 to 168 in 5d1d723

    
           CREATE VIRTUAL TABLE [responses_fts] USING FTS5 ( 
        
             [prompt], 
        
             [response], 
        
             content=[responses] 
        
           );

That design doesn't work if some of the prompts are in responses.prompt and others are in the new contexts table and related by a foreign key instead.

I could probably denormalize this, but that seems like it could be wasteful since the whole point of this exercise is to avoid storing the same long prompts hundreds of times... but if we store hundreds of copies in the FTS index table have we really improved things?

So I likely need a separate contexts_fts table... but now I'm stuck trying to do relevance scoring across two different FTS tables, which doesn't really work.

I think I'll have to accept a reduction in relevance scoring because of this. That's OK - the search feature isn't really a signature big deal, it's more of a convenience. I don't think it matters too much if the relevance scoring is a bit out of whack.

simonw · 2024-11-09T05:58:36Z

Note that response is always stored separately in the responses table - it's only prompt (of the two columns in the responses_fts table) that is affected by this new table design.

simonw · 2024-11-09T06:30:26Z

I think the migrations I added broke response_fts entirely, maybe by rewriting the rowid values when I ran .transform() to move some table columns around such that I need a new migration to re-index the table entirely.

simonw · 2024-11-13T03:10:06Z

Pushed my WIP to a branch: b2fce50

simonw · 2024-11-13T03:13:21Z

I'm second-guessing the design for this feature again now.

As implemented this kicks in by magic if the prompt is longer than a certain threshold:

llm/llm/models.py

Lines 230 to 236 in b2fce50

    
           for context, column in ( 
        
               (prompt, "prompt"), 
        
               (system, "system"), 
        
           ): 
        
               if context is not None and len(context) > PROMPT_THRESHOLD: 
        
                   hash = hashlib.sha256(context.encode("utf-8")).hexdigest() 
        
                   rows = list(db.query("select id from contexts where hash = ?", [hash]))

This assumes that prompts and system prompts will be entirely duplicated occasionally.

But maybe that's not the smartest way to do this. What if instead these reusable contexts could be specified by the user directly and could even be concatenated together?

Imagine being able to do something like this:

llm -f readme.md -f docs/usage.md 'How do I install this?'

Here we are concatenating two files together. Those files could be stored in two records in contexts, which would allow them to be combined in different ways in the future while still avoiding duplicating storage for them.

simonw · 2024-11-15T19:23:49Z

Relevant: https://twitter.com/alexalbert__/status/1857457290917589509

You can now access all of our docs concatenated as a single plain text file that can be fed in to any LLM.

Here's the url route: http://docs.anthropic.com/llms-full.txt

simonw · 2024-11-15T19:24:07Z

... if I DO go the -f route I think it should accept both file paths and URLs, because that's what the -a option does already:

llm -m claude-3.5-sonnet -f http://docs.anthropic.com/llms-full.txt 'how do I request a stream with curl?'

Note that http://docs.anthropic.com/llms-full.txt is a redirect to the https version, so following redirects is important - I just tried this:

curl 'http://docs.anthropic.com/llms-full.txt' | llm -m claude-3-5-haiku-latest 'a haiku about this document'

And got:

Here's a haiku about this HTTP 301 Moved Permanently response:

Cloudflare whispers soft
Redirected web page falls
Elsewhere, silently

The haiku captures the essence of the 301 status code - a gentle, permanent redirection signaled by Cloudflare, with the page moving to a new location in a quiet, understated manner.

simonw · 2024-11-16T19:41:27Z

I am going to call these fragments. The CLI option will be -f/--fragment and will accept a file Path or a URL or a -.

I am going to expose these in the Python API as well - as a prompt.fragments list of strings.

Open question: how about letting fragments be specified by ID as well? That way users could store them deliberately in the database.

simonw · 2024-11-16T19:45:51Z

There is definitely something interesting about being able to run a prompt that references multiple fragments. Imagine something like this:

llm -f datasette -f llm "ideas for Datasette plugins powered by LLM"

Where datasette is a fragment ID for a copy of the Datasette docs and llm is a fragment ID for the LLM docs.

How to tell the difference between -f datasette meaning the Datasette stored fragment and meaning a file on disk called datasette? Fragment ID wins, if you want the file instead use -f ./datasette instead to clarify.

simonw · 2024-11-16T19:47:08Z

The only reason to expose fragments in the Python prompt.fragments = [str] API is the code for persisting a prompt and response to a database knows to treat them specially.

simonw · 2024-11-16T19:52:23Z

This design implies the need for a family of CLI commands for managing fragments and their aliases. Maybe:

# list (truncated) stored fragments
llm fragments list
# show one (takes alias, ID or hash)
llm fragments show ALIAS
# search them
llm fragments list -q LLM
# add one with an alias
llm fragments set llm ./llm.txt

Can you remove fragments? Doing so could break existing DB records that rely on them - but maybe there's private information in them that you didn't mean to save?

I thank you can remove them, and doing so will cause those affected records to show that the fragments have been removed while still retuning the logged response.

# Remove stored fragment
llm fragments remove llm

simonw · 2024-11-16T19:59:19Z

So the fragments table has:

id - numeric ID used in M2M tables
hash - string hash used for deduping
content string content
alias - null allowed unique indexed alias string

I might have a created date time on there too. And maybe even a source URL or filename.

simonw · 2024-11-16T20:02:26Z

One thing this new design does not take into account is system prompts.

Maybe --sf X/--system-fragment X?

Would also need a prompt.system_fragments list of strings - and that's a bit weird because of the thing where some models don't let you change system prompt half way through a conversation (a problem we have with prompt.system already so at least it's not a new problem).

simonw · 2024-11-17T20:18:29Z

Made a start on this here - I've implemented the schema and the code such that prompts are aware of it, but I'm not yet persisting the fragments to the new database tables.

llm/llm/models.py

Lines 90 to 133 in 6c355c1

    
           @dataclass 
        
           class Prompt: 
        
               _prompt: str 
        
               model: "Model" 
        
               fragments: Optional[List[str]] 
        
               attachments: Optional[List[Attachment]] 
        
               _system: Optional[str] 
        
               system_fragments: Optional[List[str]] 
        
               prompt_json: Optional[str] 
        
               options: "Options" 
        
               def __init__( 
        
                   self, 
        
                   prompt, 
        
                   model, 
        
                   *, 
        
                   fragments=None, 
        
                   attachments=None, 
        
                   system=None, 
        
                   system_fragments=None, 
        
                   prompt_json=None, 
        
                   options=None, 
        
               ): 
        
                   self._prompt = prompt 
        
                   self.model = model 
        
                   self.attachments = list(attachments or []) 
        
                   self.fragments = fragments or [] 
        
                   self._system = system 
        
                   self.system_fragments = system_fragments or [] 
        
                   self.prompt_json = prompt_json 
        
                   self.options = options or {} 
        
               @property 
        
               def prompt(self): 
        
                   return "\n".join(self.fragments + [self._prompt]) 
        
               @property 
        
               def system(self): 
        
                   bits = [ 
        
                       bit.strip() 
        
                       for bit in (self.system_fragments + [self._system or ""]) 
        
                       if bit.strip() 
        
                   ] 
        
                   return "\n\n".join(bits)

simonw · 2024-11-18T04:18:46Z

A slight oddity with this design for aliases: if you set the alias of a piece of content to one thing, then set a different alias for the same byte-for-byte piece of content from elsewhere, the original alias will be forgotten.

That feels confusing, and doesn't match with how aliases work for models where one model can have multiple aliases.

Potential solutions:

Do nothing, accept this - it's a rare edge case
Warn the user if they attempt to use llm fragments set ... on content that has an alias according to its hash already
Change the DB schema - instead of alias being a nullable unique column on the fragments table it should instead be a separate fragment_aliases table which can assign multiple aliases to a single hash

That 3rd option is the least surprising in my opinion. It also removes the slightly weird unique-but-nullable alias column from the fragments table.

simonw · 2024-11-18T04:21:14Z

db["fragment_aliases"].create(
    {
        "alias": str,
        "fragment_id": int,
    },
    foreign_keys=("fragment_id", "fragments", "id"),
    pk=("alias",),
)

The primary key IS the alias, since you can't have multiple aliases with the same name.

simonw · 2024-11-18T04:50:03Z

I think removing an alias should use garbage collection rules: if the aliased fragment is referenced by at least one other alias OR is referenced in the prompt_fragments or system_fragments table it is left alone, if no references to it remain then it is deleted.

simonw · 2024-11-18T05:00:33Z

I'm not going to bother implementing garbage collection for the moment.

Refs #617

simonw · 2024-11-18T05:18:35Z

This is now mostly working. The prompt_json column is a problem though - it's storing a duplicate of the whole prompt when the point of this exercise is to cut down on how much duplicated junk we save in that table.

taygetea · 2024-11-18T06:08:17Z

i see you just touched this but for what its worth a few days ago i tried to add a text file as an attachment assuming it would work and was surprised it didnt. was the mental model i had. so this is good.

sivynava · 2025-01-15T04:16:04Z

Not sure if this will get attention for a while, but another use case I've found for text/plain or text/csv is attaching csv files as lookups for other tasks.

I can attach two files in claude.ai, including a csv file, and use the csv file as a lookup table for the other data in the prompt.
but I can't --at my.csv "text/csv"

CLAUDE="claude-3-5-sonnet-latest"
GEMINI="gemini-1.5-flash-latest"

The command:

llm -m $CLAUDE \
    --at MyImage.png "image/png" \
    --at lookup.csv "text/csv" > output.csv

I get:

Error: This model does not support attachments of type 'text/csv', only image/gif, image/png, application/pdf, image/jpeg, image/webp

Switching to Gemini:

llm -m $(GEMINI) \
    --at MyImage.png "image/png" \
    --at lookup.csv "text/csv" > output.csv

I get:

Error: This model does not support attachments of type 'text/csv', only application/pdf, image/png, image/jpeg, image/webp, image/heic, image/heif, audio/wav, audio/mp3, audio/aiff, audio/aac, audio/ogg, audio/flac, audio/mpeg, video/mp4, video/mpeg, video/mov, video/avi, video/x-flv, video/mpg, video/webm, video/wmv, video/3gpp, video/quicktime

simonw added the enhancement New feature or request label Nov 6, 2024

simonw changed the title ~~Idea: text attachments that are pasted directly into the prompt~~ Reusable contexts that get persisted in a de-duplicated manner Nov 8, 2024

simonw changed the title ~~Reusable contexts that get persisted in a de-duplicated manner~~ Store prompts and system prompts longer than X characters in de-duped table Nov 8, 2024

simonw added a commit that referenced this issue Nov 13, 2024

WIP contexts table feature, refs #617

b2fce50

simonw changed the title ~~Store prompts and system prompts longer than X characters in de-duped table~~ Option for passing in longer context fragments, stored in a deduped table Nov 16, 2024

simonw pinned this issue Nov 17, 2024

simonw added a commit that referenced this issue Nov 17, 2024

WIP fragments: schema plus reading but not yet writing, refs #617

6c355c1

simonw added a commit that referenced this issue Nov 18, 2024

Unique index on fragments.alias, refs #617

c07ba03

simonw added a commit that referenced this issue Nov 18, 2024

Fragments are now persisted, added basic CLI commands

84f8363

Refs #617

simonw mentioned this issue Nov 18, 2024

WIP: Fragments #638

Draft

10 tasks

simonw added a commit that referenced this issue Nov 18, 2024

Fragment aliases work now, refs #617

93e8f44

simonw mentioned this issue Feb 2, 2025

Add prompt caching simonw/llm-claude-3#14

Closed

Option for passing in longer context fragments, stored in a deduped table #617

Option for passing in longer context fragments, stored in a deduped table #617

Comments

simonw commented Nov 6, 2024

simonw commented Nov 6, 2024 • edited Loading

simonw commented Nov 6, 2024 • edited Loading

simonw commented Nov 6, 2024

simonw commented Nov 6, 2024

simonw commented Nov 6, 2024

simonw commented Nov 6, 2024

simonw commented Nov 6, 2024

simonw commented Nov 6, 2024

simonw commented Nov 6, 2024

simonw commented Nov 6, 2024 • edited Loading

simonw commented Nov 6, 2024

simonw commented Nov 6, 2024

simonw commented Nov 6, 2024

simonw commented Nov 6, 2024

simonw commented Nov 6, 2024

simonw commented Nov 7, 2024

simonw commented Nov 8, 2024

simonw commented Nov 8, 2024

simonw commented Nov 8, 2024

simonw commented Nov 8, 2024

simonw commented Nov 8, 2024

simonw commented Nov 8, 2024 • edited Loading

simonw commented Nov 8, 2024 • edited Loading

simonw commented Nov 8, 2024 • edited Loading

simonw commented Nov 8, 2024

simonw commented Nov 9, 2024

simonw commented Nov 9, 2024

simonw commented Nov 9, 2024

simonw commented Nov 13, 2024

simonw commented Nov 13, 2024

simonw commented Nov 15, 2024

simonw commented Nov 15, 2024 • edited Loading

simonw commented Nov 16, 2024 • edited Loading

simonw commented Nov 16, 2024

simonw commented Nov 16, 2024

simonw commented Nov 16, 2024 • edited Loading

simonw commented Nov 16, 2024 • edited Loading

simonw commented Nov 16, 2024

simonw commented Nov 17, 2024

simonw commented Nov 18, 2024 • edited Loading

simonw commented Nov 18, 2024

simonw commented Nov 18, 2024

simonw commented Nov 18, 2024

simonw commented Nov 18, 2024

taygetea commented Nov 18, 2024

sivynava commented Jan 15, 2025

simonw commented Nov 6, 2024 •

edited

Loading

simonw commented Nov 6, 2024 •

edited

Loading

simonw commented Nov 6, 2024 •

edited

Loading

simonw commented Nov 8, 2024 •

edited

Loading

simonw commented Nov 8, 2024 •

edited

Loading

simonw commented Nov 8, 2024 •

edited

Loading

simonw commented Nov 15, 2024 •

edited

Loading

simonw commented Nov 16, 2024 •

edited

Loading

simonw commented Nov 16, 2024 •

edited

Loading

simonw commented Nov 16, 2024 •

edited

Loading

simonw commented Nov 18, 2024 •

edited

Loading