We expect that you have configured the environment variables required for the LLM you are attempting to use. For example:
- OpenAI service requires: `OPENAI_API_KEY=my-secret-api-key-value`
- IBM BAM service requires: `GENAI_KEY=my-secret-api-key-value`
The development team has been using the IBM BAM service to aid development and testing:
> IBM Big AI Model (BAM) laboratory is where IBM Research designs, builds, and iterates on what’s next in foundation models. Our goal is to help accelerate the transition from research to product. Come experiment with us.
**Warning:** In order to use this service, an individual needs to obtain a w3id from IBM. The Kai development team is unable to help with obtaining this access.
- Log in to https://bam.res.ibm.com/.
- To access the service via its API, open the 'Documentation' section after logging in; it contains a field where you can generate or obtain an API key.
- Ensure you have exported the key via `export GENAI_KEY=my-secret-api-key-value`
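If you want a quick sanity check that the key is visible in your current shell before running Kai, a simple shell test (purely illustrative, not part of Kai itself) is:

```sh
# Print an error and exit non-zero if GENAI_KEY is unset or empty.
: "${GENAI_KEY:?GENAI_KEY is not set; export it before running Kai}"
```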
Related client tooling for the BAM service includes the IBM Generative AI Python SDK.
If you have a valid API key for OpenAI, you may use it with Kai.

- Follow the directions from OpenAI to create an account and generate an API key.
- Ensure you have exported the key via `export OPENAI_API_KEY=my-secret-api-key-value`
We offer configuration choices for several models via `config.toml`, which line up with the providers defined in `kai/model_provider.py`. To change which LLM you are targeting, open `config.toml` and change the `[models]` section to one of the following:
```toml
[models]
provider = "ChatIBMGenAI"

[models.args]
model_id = "ibm/granite-13b-chat-v2"
```

```toml
[models]
provider = "ChatIBMGenAI"

[models.args]
model_id = "mistralai/mixtral-8x7b-instruct-v01"
```

```toml
[models]
provider = "ChatIBMGenAI"

[models.args]
model_id = "meta-llama/llama-2-13b-chat"
```

```toml
# Note: llama3 complains if we use more than 2048 tokens
# See: https://github.com/konveyor-ecosystem/kai/issues/172
[models]
provider = "ChatIBMGenAI"

[models.args]
model_id = "meta-llama/llama-3-70b-instruct"
parameters.max_new_tokens = 2048
```

```toml
[models]
provider = "ChatOllama"

[models.args]
model = "mistral"
```

```toml
[models]
provider = "ChatOpenAI"

[models.args]
model = "gpt-4"
```

```toml
[models]
provider = "ChatOpenAI"

[models.args]
model = "gpt-3.5-turbo"
```
In general, Kai will work with OpenAI-compatible API alternatives. Two examples are Podman Desktop and the Oobabooga Text generation web UI. Once your alternative is installed, all that is necessary is to export `OPENAI_API_BASE` in addition to `OPENAI_API_KEY`.
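For example, assuming a locally hosted alternative listening on port 8000 (the host, port, and key value below are placeholders; substitute the URL your service reports):

```sh
# Point the OpenAI client at the alternative endpoint. The key value is
# arbitrary when the endpoint does not enforce authentication, but it must
# still be set because the OpenAI library expects it.
export OPENAI_API_BASE="http://localhost:8000/v1"
export OPENAI_API_KEY="unused-placeholder-value"
```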
Installation will vary depending on your operating system and distribution and is documented on the Podman Desktop website: https://podman-desktop.io/docs/installation
- Start Podman Desktop
- Navigate to the Extensions
- Select the Catalog
- Search for `Podman AI Lab`
- Install the `Podman AI Lab` extension
- Navigate to the AI Lab
- Under Models, select Catalog
- Download one or more models
- Navigate to Services
- Click `New Model Service`
- Select a model to serve and click `Create Service`
- On the Service details page, note the server URL to use with Kai
- Export the URL, for example: `export OPENAI_API_BASE="http://localhost:35841/v1"`
- Note that the Podman Desktop service endpoint is not password-protected, but the OpenAI library expects `OPENAI_API_KEY` to be set; in this case the value does not matter.
- Adjust your `config.toml` settings if necessary:
```toml
[models]
provider = "ChatOpenAI"

[models.args]
model = "mistral-7b-instruct-v0-2"
```
- OpenShift AI also provides an OpenAI-compatible API via vLLM.
- The vLLM runtime can be added to your cluster, if not already available, by following the OpenShift AI documentation on adding serving runtimes.
- Export the URL, for example: `export OPENAI_API_BASE="https://mistralaimistral-7b-instruct-v02-kyma-workshop.apps.cluster.example.com/v1"`
- When vLLM serves models it does so from the `/mnt/models/` directory in the container, and this is where the model name is taken from, so in all cases use `/mnt/models/` for the model name.
- Adjust your `config.toml`:
```toml
[models]
provider = "ChatOpenAI"

[models.args]
model = "/mnt/models/"
```
We have experienced problems with some models due to the model context being too short for our inputs. It is currently possible, though somewhat difficult, to work around this issue.