LLM plugin providing access to models running on an Ollama server.
Install this plugin in the same environment as LLM.
llm install llm-ollama
First, ensure that your Ollama server is running and that you have pulled some models. You can use ollama list to check what is locally available.
The plugin will query the Ollama server for the list of models. You can use llm ollama list-models to see the list; it should match the output of ollama list. All of these models are automatically registered with LLM and made available for prompting, chatting, and embedding.
Assuming you have llama2:latest available, you can run a prompt using:
llm -m llama2:latest 'How much is 2+2?'
The plugin automatically creates a short alias for models that have :latest in the name, so the previous command is equivalent to running:
llm -m llama2 'How much is 2+2?'
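The alias rule can be illustrated with a small sketch. This is not the plugin's code: the helper name latest_alias is hypothetical, and it assumes the alias is formed by dropping a trailing :latest tag.

```python
from typing import Optional


def latest_alias(model_name: str) -> Optional[str]:
    """Return the short alias for a ':latest' name, or None if there is none.

    Assumption: the alias is the model name with a trailing ':latest' removed.
    """
    suffix = ":latest"
    if model_name.endswith(suffix):
        return model_name[: -len(suffix)]
    return None


print(latest_alias("llama2:latest"))  # llama2
print(latest_alias("llama2:13b"))     # None
```

Under that assumption, only names tagged :latest get a short form; other tags must be spelled out in full.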
To start an interactive chat session:
llm chat -m llama2
Chatting with llama2:latest
Type 'exit' or 'quit' to exit
Type '!multi' to enter multiple lines, then '!end' to finish
>
Multi-modal Ollama models can accept image attachments using the LLM attachments options:
llm -m llava "Describe this image" -a https://static.simonwillison.net/static/2024/pelicans.jpg
The plugin supports LLM embeddings. Both regular and specialized embedding models (such as mxbai-embed-large) can be used:
llm embed -m mxbai-embed-large -i README.md
By default, the input is truncated from the end to fit within the context length. To change this behavior, set the OLLAMA_EMBED_TRUNCATE=no environment variable. In that case, the embedding operation fails if the context length is exceeded.
The plugin registers async LLM models suitable for use with Python asyncio.
To use an async model, retrieve it with the llm.get_async_model() function instead of llm.get_model() and then await the response:
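import asyncio
import llm

async def run():
    # Look up the async variant of the model registered by the plugin
    model = llm.get_async_model("llama3.2:latest")
    response = model.prompt("A short poem about tea")
    # Awaiting text() waits for the full response
    print(await response.text())

asyncio.run(run())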
The same Ollama model may be referred to by several names with different tags. For example, in the following list, there is a single unique model with three different names:
ollama list
NAME                ID            SIZE    MODIFIED
stable-code:3b      aa5ab8afb862  1.6 GB  9 hours ago
stable-code:code    aa5ab8afb862  1.6 GB  9 seconds ago
stable-code:latest  aa5ab8afb862  1.6 GB  14 seconds ago
In such cases, the plugin will register a single model and create additional aliases. Continuing the previous example, this is what LLM will have:
llm models
...
Ollama: stable-code:3b (aliases: stable-code:code, stable-code:latest, stable-code)
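A rough sketch of this grouping, using the listing above as input. It is an illustration rather than the plugin's implementation: the function name group_by_id and the rule for picking the primary name (prefer names without :latest, then sort alphabetically) are assumptions chosen to reproduce the example.

```python
from collections import defaultdict


def group_by_id(listing):
    """Group (name, model_id) pairs into {primary_name: [aliases]}."""
    names_by_id = defaultdict(list)
    for name, model_id in listing:
        names_by_id[model_id].append(name)

    registered = {}
    for names in names_by_id.values():
        # Assumption: prefer a name without ':latest', then sort alphabetically.
        ordered = sorted(names, key=lambda n: (n.endswith(":latest"), n))
        primary, aliases = ordered[0], ordered[1:]
        # Add the short alias for any ':latest' name.
        aliases += [n[: -len(":latest")] for n in ordered if n.endswith(":latest")]
        registered[primary] = aliases
    return registered


listing = [
    ("stable-code:3b", "aa5ab8afb862"),
    ("stable-code:code", "aa5ab8afb862"),
    ("stable-code:latest", "aa5ab8afb862"),
]
print(group_by_id(listing))
# {'stable-code:3b': ['stable-code:code', 'stable-code:latest', 'stable-code']}
```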
All models accept Ollama modelfile parameters as options. Use the -o name value syntax to specify them, for example:
-o temperature 0.8: set the temperature of the model
-o num_ctx 256000: set the size of the context window used to generate the next token
See the Ollama modelfile documentation for the complete list with descriptions and default values.
Additionally, the -o json_object 1 option can be used to force the model to reply with a valid JSON object. Note that your prompt must mention JSON for this to work.
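When scripting the CLI, the -o name value pairs can be assembled from a dictionary. The following is a minimal sketch: option_args is a hypothetical helper, not part of LLM or this plugin, and it only builds the argument list without running anything.

```python
import shlex


def option_args(options):
    """Turn {'name': value} into ['-o', 'name', 'value', ...]."""
    args = []
    for name, value in options.items():
        args += ["-o", name, str(value)]
    return args


command = [
    "llm", "-m", "llama2",
    *option_args({"temperature": 0.8, "num_ctx": 256000}),
    "How much is 2+2?",
]
# Print a shell-quoted version of the command for inspection.
print(shlex.join(command))
```

The resulting list could be passed to subprocess.run(command) in an environment where LLM and the plugin are installed.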
By default, llm-ollama will try to connect to a server at localhost:11434. If your Ollama server is remote or runs on a non-default port, you can use the OLLAMA_HOST environment variable to point the plugin to it, e.g.:
export OLLAMA_HOST=https://192.168.1.13:11434
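The fallback described above can be pictured as follows. This is only an illustration of the behavior, not the plugin's or the Ollama client's code; resolve_host and DEFAULT_HOST are hypothetical names, and treating an empty variable as unset is an assumption.

```python
import os

DEFAULT_HOST = "localhost:11434"  # default address stated above


def resolve_host(environ=os.environ):
    """Return OLLAMA_HOST if set and non-empty, else the default address."""
    # Assumption: an empty OLLAMA_HOST is treated the same as an unset one.
    return environ.get("OLLAMA_HOST") or DEFAULT_HOST


print(resolve_host({}))  # localhost:11434
print(resolve_host({"OLLAMA_HOST": "https://192.168.1.13:11434"}))
# https://192.168.1.13:11434
```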
To set up this plugin locally, first check out the code. Then create a new virtual environment:
cd llm-ollama
python3 -m venv venv
source venv/bin/activate
Now install the dependencies:
pip install -e '.[test,lint]'
To run the tests:
python -m pytest
To format the code:
python -m black .