0.17
Support for attachments, allowing multi-modal models to accept images, audio, video and other formats. #587
The default OpenAI gpt-4o
and gpt-4o-mini
models can both now be prompted with JPEG, GIF, PNG and WEBP images.
Attachments in the CLI can be URLs:
llm -m gpt-4o "describe this image" \
-a https://static.simonwillison.net/static/2024/pelicans.jpg
Or file paths:
llm -m gpt-4o-mini "extract text" -a image1.jpg -a image2.jpg
Or binary data, which may need to use --attachment-type
to specify the MIME type:
cat image | llm -m gpt-4o-mini "extract text" --attachment-type - image/jpeg
Attachments are also available in the Python API:
model = llm.get_model("gpt-4o-mini")
response = model.prompt(
"Describe these images",
attachments=[
llm.Attachment(path="pelican.jpg"),
llm.Attachment(url="https://static.simonwillison.net/static/2024/pelicans.jpg"),
]
)
Plugins that provide alternative models can support attachments, see Attachments for multi-modal models for details.
The latest llm-claude-3 plugin now supports attachments for Anthropic's Claude 3 and 3.5 models. The llm-gemini plugin supports attachments for Google's Gemini 1.5 models.
Also in this release: OpenAI models now record their "usage"
data in the database even when the response was streamed. These records can be viewed using llm logs --json
. #591