This repository was archived by the owner on Oct 31, 2025. It is now read-only.

Rework config to be more native / allow customization #135

@owtaylor

Description

[ Comment moved from https://github.com//issues/119#issuecomment-2902226353 ]

Use cases

Reasons that a user might want to customize the models:

  • Using a different Granite autocomplete model. Right now, there are multiple reasonable options:

    granite-3.3-8b-base: this is the best Granite model for autocomplete. If someone has a fast GPU, they probably want to use this.
    granite-3.3-2b-base: worse, but faster. Best model for mid-range Macs.
    granite-3.3-8b-instruct: quality is similar to 2b-base, speed similar to 8b-base. If you have a GPU with limited memory and only have room for one Granite model, this can make sense.

    In the future, it's also possible there will be multiple reasonable choices for chat models.

  • Using Ollama on a different port

  • Using hosted models. If you have access to an instance of vLLM running granite3.3:8b, use that instead of Ollama.

  • Using old models - I don't care about this one. The Granite models have a track record of improving over time, and I don't want users torturing themselves trying to figure out whether granite3.2:8b is better than granite3.3:2b - because model outputs are inherently random (even at temperature 0), you just can't tell based on a small number of prompts. If someone really wants to investigate, they can always configure the models themselves. (See below)

  • Using third-party models. Not our emphasis, but users (or at least Granite.Code developers...) will want to compare.

Proposed way it looks

A basic principle should be that selecting models and customizing models feels like an extension of the upstream UI rather than something alien to it.

To change your autocomplete model, you edit ~/.granite-code/models/autocomplete.yaml to change:

  name: Granite.Code autocomplete model
  version: 1.0.0
  schema: v1
  models:
-    - uses: granite.code/autocomplete@default
+    - uses: granite.code/granite-3.3:8b-base

To use a different Ollama port, you edit ~/.granite-code/models/{autocomplete,chat,embed}.yaml and add:

  name: Granite.Code chat model
  version: 1.0.0
  schema: v1
  models:
    - uses: granite.code/autocomplete@default
      override:
        apiBase: ollama.local:11434

(OR we add a setting for this, OR we simply honor the OLLAMA_HOST environment variable - but this would be the general mechanism for overrides)
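
To illustrate that override: is meant as the general escape hatch rather than a port-only feature, here is a minimal sketch that also swaps the underlying model. The field names follow the config.yaml schema used above; the granite.code/chat@default reference is assumed by analogy with the autocomplete example, and the host and model tag are placeholders:

  name: Granite.Code chat model
  version: 1.0.0
  schema: v1
  models:
    - uses: granite.code/chat@default
      override:
        apiBase: ollama.local:11434
        model: granite3.3:2b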

To use a hosted model, you replace ~/.granite-code/models/{autocomplete,chat,embed}.yaml with your own content.
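
For example, a replacement ~/.granite-code/models/chat.yaml pointing at a hosted, OpenAI-compatible vLLM endpoint might look roughly like this - the provider/apiBase/roles field names are assumed from the upstream config.yaml model schema, and the URL and model id are placeholders:

  name: My hosted Granite chat model
  version: 1.0.0
  schema: v1
  models:
    - name: Granite 3.3 8b Instruct (hosted)
      provider: openai
      model: ibm-granite/granite-3.3-8b-instruct
      apiBase: https://vllm.example.com/v1
      roles:
        - chat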

To stop using a hosted model, you delete those files, and they will be recreated with the default content.

Notes:

  • This requires a pretty simple code change to provide our own RegistryClient when unrolling the YAML file
  • An alternative would be to use uses: ./default-models/granite-autocomplete.yaml, which avoids the code change and lets people actually open that file and see what is in it (see the sketch after these notes). Might be better.
  • When you are using a hosted model, you want to completely replace the file rather than use override: to repoint it; otherwise, changes to the default model would break users' configurations.
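
As a rough sketch of that file-path alternative, applied to the autocomplete default from above (assuming a relative uses: reference resolves against the shipped defaults directory, which is exactly the part that would need verifying):

  name: Granite.Code autocomplete model
  version: 1.0.0
  schema: v1
  models:
    - uses: ./default-models/granite-autocomplete.yaml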

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Status: In Progress
Milestone: No milestone
Development: No branches or pull requests