llama-cpp-python: OpenAI compatible web server - Local Copilot replacement - Function Calling support - Vision API support #328

Python Bindings for llama.cpp

Simple Python bindings for @ggerganov's llama.cpp library. This package provides:

  • Low-level access to the C API via the ctypes interface
  • High-level Python API for text completion (see the sketch after this list)
  • OpenAI-like API
  • LangChain compatibility
  • OpenAI compatible web server
  • Local Copilot replacement
  • Function Calling support
  • Vision API support
  • Multiple Models
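
A minimal sketch of the high-level text-completion API, assuming a local GGUF model at ./models/7B/llama-model.gguf (the path is illustrative):

from llama_cpp import Llama

# Load a local GGUF model (the path is an assumption for illustration)
llm = Llama(model_path="./models/7B/llama-model.gguf")

# Plain text completion via the high-level API
output = llm(
    "Q: Name the planets in the solar system? A: ",
    max_tokens=32,
    stop=["Q:", "\n"],  # stop at the next question or line break
    echo=True,          # include the prompt in the returned text
)
print(output["choices"][0]["text"])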

Documentation is available at https://llama-cpp-python.readthedocs.io/en/latest.

Installation

llama-cpp-python can be installed directly from PyPI as a source distribution by running:

pip install llama-cpp-python

This will build llama.cpp from source using CMake and your system's C compiler (both required) and install the library alongside this Python package.

If you run into issues during installation, add the --verbose flag to the pip install command to see the full CMake build log:
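
pip install llama-cpp-python --verbose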

Installation with Specific Hardware Acceleration (BLAS, CUDA, Metal, etc.)

The default pip install behaviour is to build llama.cpp for CPU only on Linux and Windows, and to use Metal on macOS.

llama.cpp supports a number of hardware acceleration backends, including OpenBLAS, cuBLAS, CLBlast, hipBLAS, and Metal. See the llama.cpp README for a full list of supported backends.

All of these backends are supported by llama-cpp-python and can be enabled by setting the CMAKE_ARGS environment variable before installing.

On Linux and macOS you set CMAKE_ARGS like this:

CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" pip install llama-cpp-python

On Windows (PowerShell) you can set CMAKE_ARGS like this:

$env:CMAKE_ARGS = "-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS"
pip install llama-cpp-python

OpenBLAS

To install with OpenBLAS, set the LLAMA_BLAS and LLAMA_BLAS_VENDOR CMake options via CMAKE_ARGS before installing:

CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" pip install llama-cpp-python

cuBLAS

To install with cuBLAS, set the LLAMA_CUBLAS CMake option to on via CMAKE_ARGS before installing:

CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python

Metal

To install with Metal (MPS), set the LLAMA_METAL CMake option to on via CMAKE_ARGS before installing:

CMAKE_ARGS="-DLLAMA_METAL=on" pip install llama-cpp-python
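
Running the OpenAI compatible web server

To try the OpenAI compatible web server listed above, a minimal sketch (the model path is illustrative):

# Install the package with the server extras
pip install 'llama-cpp-python[server]'

# Start the server with a local GGUF model (path is an assumption)
python -m llama_cpp.server --model models/7B/llama-model.gguf

By default the server listens on http://localhost:8000 and exposes OpenAI-style endpoints such as /v1/chat/completions, so existing OpenAI client libraries can be pointed at it by changing their base URL.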

