Version 0.27.0

Add thinking option to control the amount of thinking that happens for reasoning models.
Fix incorrectly low default Claude max tokens
Fix Claude extraction of text and reasoning results when reasoning

Version 0.26.1

Add Claude 4 models
Fix error using Open AI for batch embeddings
Add streaming tool calls for Ollama
Fix Ollama tool-use booleans

Version 0.26.0

Call tools with nil when called with false JSON values.
Fix bug in ollama batch embedding generation.
Add Qwen 3 and Gemma 3 to model list.
Fix broken model error message
Fix reasoning model and streaming incompatibility

Version 0.25.0

Add llm-ollama-authed provider, which is like Ollama but takes a key.
Set Gemini 2.5 Pro to be the default Gemini model
Fix llm-batch-embeddings-async so it returns all embeddings
Add Open AI 4.1, o3, Gemini 2.5 Flash

Version 0.24.2

Fix issue with some Open AI compatible providers needing models to be passed by giving a non-nil default.
Add Gemini 2.5 Pro
Fix issue with JSON return specs which pass booleans

Version 0.24.1

Fix issue with Ollama incorrect requests when passing non-standard params.

Version 0.24.0

Add multi-output as an option, allowing all llm results to return, call, or stream multiple kinds of data via a plist. This allows separating out reasoning, as well as optionally returning text as well as tool uses at the same time.
Added llm-models to get a list of models from a provider.
Fix misnamed llm-capabilities output to refer to tool-use and streaming-tool-use (which is new).
Fixed Claude streaming tool use (via Paul Nelson)
Added Deepseek service
Add Gemini 2.0 pro experimental model, default to 2.0 flash
Add Open AI’s o3 mini model
Add Claude 3.7 sonnet
Fix Claude’s capabilities to reflect that it can use tools
Added ability to set keep_alive option for Ollama correctly.

Version 0.23.0

Add GitHub’s GitHub Models
Accept lists as nonstandard
Add Deepseek R1 model
Show the chat model as the name for Open-AI compatible models (via @whhone)

Version 0.22.0

Change llm-tool-function to llm-tool, change make-llm-tool-function to take any arguments.

Version 0.21.0

Incompatible change to function calling, which is now tool use, affecting arguments and methods.
Support image understanding in Claude
Support streaming tool use in Claude
Add llm-models-add as a convenience method to add a model to the known list.

Version 0.20.0

Add ability to output according to a JSON spec.
Add Gemini 2.0 Flash, Gemini 2.0 Flash Thinking, and Llama 3.3 and QwQ models.

Version 0.19.1

Fix Open AI context length sizes, which are mostly smaller than advertised.

Version 0.19.0

Add JSON mode, for most providers with the exception of Claude.
Add ability for keys to be functions, thanks to Daniel Mendler.

Version 0.18.1

Fix extra argument in llm-batch-embeddings-async.

Version 0.18.0

Add media handling, for images, videos, and audio.
Add batch embeddings capability (currently for just Open AI and Ollama).
Add Microsoft Azure’s Open AI
Remove testing and other development files from ELPA packaging.
Remove vendored plz-event-source and plz-media-type, and add requirements.
Update list of Ollama models for function calling.
Centralize model list so things like Vertex and Open AI compatible libraries can have more accurate context lengths and capabilities.
Update default Gemini chat model to Gemini 1.5 Pro.
Update default Claude chat model to latest Sonnet version.
Fix issue in some Open AI compatible providers with empty function call arguments

Version 0.17.4

Fix problem with Open AI’s llm-chat-token-limit.
Fix Open AI and Gemini’s parallel function calling.
Add variable llm-prompt-default-max-tokens to put a cap on number of tokens regardless of model size.

Version 0.17.3

More fixes with Claude and Ollama function calling conversation, thanks to Paul Nelson.
Make llm-chat-streaming-to-point more efficient, just inserting new text, thanks to Paul Nelson.
Don’t output streaming information when llm-debug is true, since it tended to be overwhelming.

Version 0.17.2

Fix compiled functions not being evaluated in llm-prompt.
Use Ollama’s new embed API instead of the obsolete one.
Fix Claude function calling conversations
Fix issue in Open AI streaming function calling.
Update Open AI and Claude default chat models to the later models.

Version 0.17.1

Support Ollama function calling, for models which support it.
Make sure every model, even unknown models, return some value for llm-chat-token-limit.
Add token count for llama3.1 model.
Make llm-capabilities work model-by-model for embeddings and functions

Version 0.17.0

Introduced llm-prompt for prompt management and creation from generators.
Removed Gemini and Vertex token counting, because llm-prompt uses token counting often and it’s best to have a quick estimate than a more expensive more accurate count.

Version 0.16.2

Fix Open AI’s gpt4-o context length, which is lower for most paying users than the max.

Version 0.16.1

Add support for HTTP / HTTPS proxies.

Version 0.16.0

Add “non-standard params” to set per-provider options.
Add default parameters for chat providers.

Version 0.15.0

Move to plz backend, which uses curl. This helps move this package to a stronger foundation backed by parsing to spec. Thanks to Roman Scherer for contributing the plz extensions that enable this, which are currently bundled in this package but will eventually become their own separate package.
Add model context information for Open AI’s GPT 4-o.
Add model context information for Gemini’s 1.5 models.

Version 0.14.2

Fix mangled copyright line (needed to get ELPA version unstuck).
Fix Vertex response handling bug.

Version 0.14.1

Fix various issues with the 0.14 release

Version 0.14

Introduce new way of creating prompts: llm-make-chat-prompt, deprecating the older ways.
Improve Vertex error handling

Version 0.13

Add Claude’s new support for function calling.
Refactor of providers to centralize embedding and chat logic.
Remove connection buffers after use.
Fixes to provider more specific error messages for most providers.

Verson 0.12.3

Refactor of warn-non-nonfree methods.
Add non-free warnings for Gemini and Claude.

Version 0.12.2

Send connection issues to error callbacks, and fix an error handling issue in Ollama.
Fix issue where, in some cases, streaming does not work the first time attempted.

Version 0.12.1

Fix issue in llm-ollama with not using provider host for sync embeddings.
Fix issue in llm-openai where were incompatible with some Open AI-compatible backends due to assumptions about inconsequential JSON details.

Version 0.12.0

Add provider llm-claude, for Anthropic’s Claude.

Version 0.11.0

Introduce function calling, now available only in Open AI and Gemini.
Introduce llm-capabilities, which returns a list of extra capabilities for each backend.
Fix issue with logging when we weren’t supposed to.

Version 0.10.0

Introduce llm logging (for help with developing against llm), set llm-log to non-nil to enable logging of all interactions with the llm package.
Change the default interaction with ollama to one more suited for converesations (thanks to Thomas Allen).

Version 0.9.1

Default to the new “text-embedding-3-small” model for Open AI. Important: Anyone who has stored embeddings should either regenerate embeddings (recommended) or hard-code the old embedding model (“text-embedding-ada-002”).
Fix response breaking when prompts run afoul of Gemini / Vertex’s safety checks.
Change Gemini streaming to be the correct URL. This doesn’t seem to have an effect on behavior.

Version 0.9

Add llm-chat-token-limit to find the token limit based on the model.
Add request timeout customization.

Version 0.8

Allow users to change the Open AI URL, to allow for proxies and other services that re-use the API.
Add llm-name and llm-cancel-request to the API.
Standardize handling of how context, examples and history are folded into llm-chat-prompt-interactions.

Version 0.7

Upgrade Google Cloud Vertex to Gemini - previous models are no longer available.
Added gemini provider, which is an alternate endpoint with alternate (and easier) authentication and setup compared to Cloud Vertex.
Provide default for llm-chat-async to fall back to streaming if not defined for a provider.

Version 0.6

Add provider llm-llamacpp.
Fix issue with Google Cloud Vertex not responding to messages with a system interaction.
Fix use of (pos-eol) which is not compatible with Emacs 28.1.

Version 0.5.2

Fix incompatibility with older Emacs introduced in Version 0.5.1.
Add support for Google Cloud Vertex model text-bison and variants.
llm-ollama can now be configured with a scheme (http vs https).

Version 0.5.1

Implement token counting for Google Cloud Vertex via their API.
Fix issue with Google Cloud Vertex erroring on multibyte strings.
Fix issue with small bits of missing text in Open AI and Ollama streaming chat.

Version 0.5

Fixes for conversation context storage, requiring clients to handle ongoing conversations slightly differently.
Fixes for proper sync request http error code handling.
llm-ollama can now be configured with a different hostname.
Callbacks now always attempts to be in the client’s original buffer.
Add provider llm-gpt4all.

Version 0.4

Add helper function llm-chat-streaming-to-point.
Add provider llm-ollama.

Version 0.3

Streaming support in the API, and for the Open AI and Vertex models.
Properly encode and decode in utf-8 so double-width or other character sizes don’t cause problems.

Version 0.2.1

Changes in how we make and listen to requests, in preparation for streaming functionality.
Fix overzealous change hook creation when using async llm requests.

Version 0.2

Remove the dependency on non-GNU request library.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!