🚀 Describe the new functionality needed
Configuration API
Adding providers outside of the current scope will likely necessitate the following:
- bespoke configuration based on hardware (GPU, CPU, etc.) that should apply to multiple providers in order for them to work properly.
- hyperparameters for specific providers that should be both auto-detected and selectable via a CLI.
- a way to check available configurations, currently assigned configurations, etc.
I imagine this functionality working similarly to Models or Inspect, in that it would be a high-level API. Additionally, these objects should be something other providers can "register". Configurations, similarly to models, should operate as an "overarching" API through which one can register, list, get, and unregister a configuration.
Usage pattern: the administrator starts the stack with `llama stack build && llama stack run`. A user could then run:
```
llama-stack-client configurations inspect
```

```yaml
providers:
  agents:
  - config: {}
    provider_id: meta-reference
    provider_type: inline::meta-reference
  datasetio: []
  eval: []
  inference:
  - config:
      url: http://localhost:12345
    provider_id: ollama
    provider_type: remote::ollama
  safety: []
  scoring:
  - config: {}
    provider_id: braintrust
    provider_type: inline::braintrust
  telemetry:
  - config: {}
    provider_id: meta-reference
    provider_type: inline::meta-reference
  tool_runtime:
  - config: {}
    provider_id: brave-search
    provider_type: remote::brave-search
  - config: {}
    provider_id: tavily-search
    provider_type: remote::tavily-search
  vector_io:
  - config: {}
    provider_id: faiss
    provider_type: inline::faiss
  - config: {}
    provider_id: sqlite_vec
    provider_type: inline::sqlite_vec
```
```
llama-stack-client configurations register --config <file_path>
```
or using the SDK:

```python
import json

# `client` is an already-constructed llama-stack-client instance

# inspect the currently applied, user-visible configuration
current_config = client.configurations.inspect()
print(current_config)

# register an updated configuration for the inference provider
config = {
    "inference": [
        {
            "provider_id": "ollama",
            "provider_type": "remote::ollama",
            "config": {"url": "http://localhost:12345"},
        }
    ]
}
config = json.dumps(config)
config = client.configurations.register(config=config)
print(config)
```
The configuration API would look something like:

```python
@json_schema_type
class Configuration(BaseModel):
    type: Literal[ResourceType.configuration.value] = ResourceType.configuration.value

    config: StackRunConfig


class ConfigListResponse(BaseModel):
    data: List[dict[str, Any]]


@runtime_checkable
@trace_protocol
class Configurations(Protocol):
    """Llama Stack Configuration API for storing and applying hyperparameters for given tasks."""

    @webmethod(route="/configurations/register", method="POST")
    async def register_config(
        self,
        config,
    ) -> dict[str, Any]: ...
```
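Only the register endpoint is sketched above; following the register/list/get/unregister pattern described earlier, the protocol would presumably also grow the remaining methods. A rough sketch, where the routes, method names, and the config_id identifier are my own assumptions (mirroring how the Models API is laid out):

```python
    # Additional methods on the Configurations protocol (sketch only; routes,
    # names, and the config_id identifier are assumptions).
    @webmethod(route="/configurations", method="GET")
    async def list_configs(self) -> ConfigListResponse: ...

    @webmethod(route="/configurations/{config_id}", method="GET")
    async def get_config(self, config_id: str) -> Configuration: ...

    @webmethod(route="/configurations/{config_id}", method="DELETE")
    async def unregister_config(self, config_id: str) -> None: ...
```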
With the inspect API expanded to have a /configurations endpoint:

```python
@runtime_checkable
class Inspect(Protocol):
    @webmethod(route="/inspect/configurations", method="GET")
    async def inspect_config(
        self,
    ) -> InspectConfigResponse: ...
```
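InspectConfigResponse is not defined above; a minimal sketch, assuming it simply wraps the user-visible view of the running stack's provider configuration:

```python
# Minimal sketch: the field name and shape are assumptions.
@json_schema_type
class InspectConfigResponse(BaseModel):
    data: dict[str, Any]
```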
UserConfig vs StackRunConfig
A key part of this API is the set of fields exposed in both inspection and registration. A Configuration object contains a StackRunConfig within it. However, the data within this config is a UserConfig. A UserConfig is a StackRunConfig, but with only specific fields displayed to the user. Since each provider has its own config class that feeds into the StackRunConfig, the following can be used to label certain fields as "user configurable":

```python
url: str = Field(DEFAULT_OLLAMA_URL, json_schema_extra={"user_field": True})
```

The pydantic json_schema_extra field can then be used when creating a Configuration object to derive an intermediary UserConfig. The UserConfig will only contain fields labeled as user_field, meaning that if a user tries to register a configuration with non-user fields specified, those fields will be dropped, and an inspected configuration will likewise only expose user fields for viewing. In the example above, url is the only field given the user_field schema extra, which is why it is one of the few things showing up in the inspect output.
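To make the mechanism concrete, here is a minimal sketch of deriving a user-visible view from a provider config class via json_schema_extra; the config class shape and the user_fields helper are illustrative, not existing llama-stack code:

```python
from typing import Any

from pydantic import BaseModel, Field

DEFAULT_OLLAMA_URL = "http://localhost:11434"


class OllamaImplConfig(BaseModel):
    # only `url` is marked as user-configurable
    url: str = Field(DEFAULT_OLLAMA_URL, json_schema_extra={"user_field": True})
    # hypothetical non-user field for illustration
    request_timeout: int = 60


def user_fields(config: BaseModel) -> dict[str, Any]:
    """Keep only the fields whose json_schema_extra marks them as user_field."""
    visible: dict[str, Any] = {}
    for name, field in type(config).model_fields.items():
        extra = field.json_schema_extra or {}
        if isinstance(extra, dict) and extra.get("user_field"):
            visible[name] = getattr(config, name)
    return visible


print(user_fields(OllamaImplConfig(url="http://localhost:12345")))
# -> {'url': 'http://localhost:12345'}
```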
Server Side Device Discovery for Initial Configuration
Before a user can inspect or register a config of their own, it would make sense to allow providers to utilize a centralized hardware discovery service built into llama-stack. Providers could then act on this information inside their configuration initialization methods to apply defaults suited to the discovered hardware, rather than a blanket set of defaults.
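Purely to illustrate the idea (no such service exists in llama-stack today; the helper names and defaults below are invented), a discovery hook that a provider's config initialization could consult might look like:

```python
import shutil
import subprocess


def discover_gpus() -> list[str]:
    """Return GPU model names via nvidia-smi, or an empty list if none are visible."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def default_batch_size() -> int:
    """Example provider default keyed off discovered hardware (values are illustrative)."""
    gpus = discover_gpus()
    if any("H100" in gpu for gpu in gpus):
        return 32
    if any("A100" in gpu for gpu in gpus):
        return 16
    return 4
```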
💡 Why is this needed? What if we don't build it?
Without a system like the above, it will be difficult to orchestrate a sequence of providers intended to "work together", or even to make a single complex provider easily accessible to users. Additionally, the more complex the APIs and providers that are introduced, the greater the odds that runtime manipulation of key configuration fields will be necessary.
Say someone provides a data generation, training, and evaluation methodology as separate providers, and each of these depends on specific hardware requirements, hyperparameters, etc. to interact with the others, with those parameters changing per hardware type (H100 vs A100 vs L40).
Exposing the current provider configuration to a user will help them understand what they will be running for various providers as functionality gets more complex (SDG, evals, training, etc.). Additionally, allowing a user to apply parts of a config on top of a running stack, as opposed to taking the stack down and having the admin apply a full run config again, seems like a more sustainable workflow.
Other thoughts
I would like to work on this in collaboration with anyone if possible!