Add inferenceConfig to Playground #245

Merged · 1 commit · Jan 18, 2025
9 changes: 5 additions & 4 deletions README.md
@@ -66,10 +66,11 @@ spec:
  source:
    modelHub:
      modelID: facebook/opt-125m
-  inferenceFlavors:
-  - name: t4 # GPU type
-    requests:
-      nvidia.com/gpu: 1
+  inferenceConfig:
+    flavors:
+    - name: default # Configure GPU type
+      requests:
+        nvidia.com/gpu: 1
```

#### Inference Playground
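For orientation, here is a hedged sketch of a full manifest using the renamed field. The apiVersion, kind, and metadata are assumptions not confirmed by this diff; the spec block mirrors the README hunk above.

```yaml
# Sketch only: apiVersion, kind, and metadata are assumed, not part of this PR.
apiVersion: llmaz.io/v1alpha1
kind: OpenModel
metadata:
  name: opt-125m
spec:
  source:
    modelHub:
      modelID: facebook/opt-125m
  inferenceConfig:
    flavors:
      - name: default # Configure GPU type
        requests:
          nvidia.com/gpu: 1
```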
16 changes: 11 additions & 5 deletions api/core/v1alpha1/model_types.go
@@ -122,6 +122,15 @@ type Flavor struct {
Params map[string]string `json:"params,omitempty"`
}

+// InferenceConfig represents the inference configurations for the model.
+type InferenceConfig struct {
+	// Flavors represents the accelerator requirements to serve the model.
+	// Flavors are fungible following the priority represented by the slice order.
+	// +kubebuilder:validation:MaxItems=8
+	// +optional
+	Flavors []Flavor `json:"flavors,omitempty"`
+}
+
type ModelName string

// ModelClaim represents claiming for one model, it's the standard claimMode
@@ -188,11 +197,8 @@ type ModelSpec struct {
// Source represents the source of the model, there're several ways to load
// the model such as loading from huggingface, OCI registry, s3, host path and so on.
Source ModelSource `json:"source"`
-	// InferenceFlavors represents the accelerator requirements to serve the model.
-	// Flavors are fungible following the priority represented by the slice order.
-	// +kubebuilder:validation:MaxItems=8
-	// +optional
-	InferenceFlavors []Flavor `json:"inferenceFlavors,omitempty"`
+	// InferenceConfig represents the inference configurations for the model.
+	InferenceConfig *InferenceConfig `json:"inferenceConfig,omitempty"`
}

const (
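The doc comment on Flavors states that flavors are fungible, in the priority given by slice order, with at most 8 entries. A hypothetical two-flavor configuration, with illustrative flavor names, could express a preferred accelerator and a fallback:

```yaml
# Illustrative only: the flavor names below are hypothetical.
inferenceConfig:
  flavors:
    - name: a100      # tried first (highest priority by slice order)
      requests:
        nvidia.com/gpu: 1
    - name: t4        # fallback when the first flavor cannot be satisfied
      requests:
        nvidia.com/gpu: 1
```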
32 changes: 26 additions & 6 deletions api/core/v1alpha1/zz_generated.deepcopy.go (generated; diff not rendered)

1 change: 1 addition & 0 deletions api/inference/v1alpha1/backendruntime_types.go
@@ -26,6 +26,7 @@ import (
// do not change the name.
type BackendRuntimeArg struct {
// Name represents the identifier of the backendRuntime argument.
+	// +kubebuilder:default=default
Name string `json:"name"`
// Flags represents all the preset configurations.
// Flag around with {{ .CONFIG }} is a configuration waiting for render.
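With the new `+kubebuilder:default=default` marker, an args entry that omits `name` should be defaulted to `default` on admission. A hedged sketch of a BackendRuntime fragment follows; everything outside the `args` entry, including the apiVersion and kind, is assumed rather than shown in this diff, and the templated flag is hypothetical (it only reuses the `{{ .CONFIG }}` render convention mentioned in the comment above).

```yaml
# Sketch only: apiVersion, kind, metadata, and the flag values are assumptions.
apiVersion: inference.llmaz.io/v1alpha1
kind: BackendRuntime
metadata:
  name: vllm
spec:
  args:
    - # name omitted: now defaulted to "default" by the kubebuilder marker
      flags:
        - "--model"
        - "{{ .ModelPath }}"  # hypothetical render placeholder
```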
14 changes: 3 additions & 11 deletions api/inference/v1alpha1/config_types.go
@@ -33,17 +33,9 @@ type BackendRuntimeConfig struct {
// from the default version.
// +optional
Version *string `json:"version,omitempty"`
-	// ArgName represents the argument name set in the backendRuntimeArg.
-	// If not set, will be derived by the model role, e.g. if one model's role
-	// is <draft>, the argName will be set to <speculative-decoding>. Better to
-	// set the argName explicitly.
-	// By default, the argName will be treated as <default> in runtime.
-	// +optional
-	ArgName *string `json:"argName,omitempty"`
-	// ArgFlags represents the argument flags appended to the backend.
-	// You can add new flags or overwrite the default flags.
-	// +optional
-	ArgFlags []string `json:"argFlags,omitempty"`
+	// Args represents the specified arguments of the backendRuntime,
+	// which will be appended to the backendRuntime.spec.Args.
+	Args *BackendRuntimeArg `json:"args,omitempty"`
// Envs represents the environments set to the container.
// +optional
Envs []corev1.EnvVar `json:"envs,omitempty"`
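On the consumer side, `argName` and `argFlags` collapse into a single `args` object whose shape matches `BackendRuntimeArg`. A hedged before/after sketch of a `backendRuntimeConfig` fragment; its exact placement inside a Playground spec and the flag value are assumptions, while `speculative-decoding` reuses the example from the removed comment.

```yaml
# Before this PR (fields now removed):
backendRuntimeConfig:
  argName: speculative-decoding
  argFlags:
    - "--swap-space=8"          # hypothetical flag
---
# After this PR: one args object matching BackendRuntimeArg:
backendRuntimeConfig:
  args:
    name: speculative-decoding  # may be omitted now that name defaults to "default"
    flags:
      - "--swap-space=8"        # hypothetical flag
```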
28 changes: 19 additions & 9 deletions api/inference/v1alpha1/zz_generated.deepcopy.go (generated; diff not rendered)

12 changes: 6 additions & 6 deletions client-go/applyconfiguration/core/v1alpha1/flavor.go (generated; diff not rendered)

43 changes: 43 additions & 0 deletions client-go/applyconfiguration/core/v1alpha1/inferenceconfig.go (generated; diff not rendered)

10 changes: 5 additions & 5 deletions client-go/applyconfiguration/core/v1alpha1/modelclaim.go (generated; diff not rendered)

10 changes: 5 additions & 5 deletions client-go/applyconfiguration/core/v1alpha1/modelrefer.go (generated; diff not rendered)

25 changes: 10 additions & 15 deletions client-go/applyconfiguration/core/v1alpha1/modelspec.go (generated; diff not rendered)
