Closed
Labels
Enhancement (New feature or request) · Issue - In Progress (Someone is actively working on this. Should link to a PR soon.)
Description
What problem does this proposed feature solve?
Currently, Roo Code's codebase indexing feature limits users to predefined embedding models and endpoints (primarily OpenAI's default `text-embedding-3-small` or potentially a default Ollama model). This prevents users from:
- Leveraging Local Models: using locally hosted embedding models, via Ollama, other than the hardcoded one.
- Using Specialized Models: employing models specifically fine-tuned for code embeddings (e.g., `jina-embeddings-v2-base-code`) or other niche models that might offer better performance for their specific codebase.
- Connecting via Custom Endpoints: using models hosted behind corporate proxies, specific cloud provider endpoints (like Azure OpenAI), or custom API gateways.
- Utilizing Models with Non-Standard Dimensions: using embedding models whose vector dimensions are not part of Roo Code's known model profiles.

This lack of flexibility restricts user choice, prevents optimization for specific needs or infrastructure, and hinders alignment with the project goal of supporting a wide variety of AI providers.
Describe the proposed solution in detail
This feature introduces new configuration options within the Roo Code settings UI (and corresponding `settings.json` entries) under the "Code Indexing" section:
- Provider Selection: the user selects between "openai" and "ollama" (this exists already but is now more central to the custom config).
- Base URL (`roo-cline.codebaseIndexEmbedderBaseUrl`): a text input field allows specifying a custom API endpoint URL.
  - For Ollama: users can point to their local or remote Ollama instance (e.g., `http://localhost:11434`).
  - For OpenAI: users can point to a proxy, an Azure OpenAI endpoint, or other OpenAI API-compatible endpoints.
- Model ID (`roo-cline.codebaseIndexEmbedderModelId`): a text input field allows the user to specify any model ID compatible with their chosen provider and endpoint (e.g., `jina-embeddings-v2-base-code:latest`, `text-embedding-3-large`, `my-custom-finetune`). This replaces the previous, potentially limited selection method. If left empty, a reasonable default for the provider should be used (e.g., `text-embedding-3-small` for OpenAI).
- Custom Dimension (`roo-cline.codebaseIndexEmbedderDimension`): an optional numeric input field where users can specify the embedding dimension (vector size) required by their chosen custom model, only if it is not automatically recognized by Roo Code's internal profiles (e.g., `1536`, `768`).
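Put together, a workspace `settings.json` using the proposed keys might look like the following (the values shown are the ones from the testing context below; your endpoint, model, and dimension will differ):

```json
{
  "roo-cline.codebaseIndexEmbedderBaseUrl": "http://localhost:11434",
  "roo-cline.codebaseIndexEmbedderModelId": "jina-embeddings-v2-base-code:latest",
  "roo-cline.codebaseIndexEmbedderDimension": 768
}
```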
Backend Implementation:
- The new settings (`codebaseIndexEmbedderBaseUrl`, `codebaseIndexEmbedderModelId`, `codebaseIndexEmbedderDimension`) are read from the configuration.
- The appropriate `baseUrl` and `modelId` are passed to the selected embedder service (`OpenAiEmbedder` or `CodeIndexOllamaEmbedder`).
- If a custom `dimension` is provided and the provider is OpenAI, it is passed to the `OpenAiEmbedder` to use the `dimensions` parameter in the OpenAI API call.
- The logic for determining the required vector size for the Qdrant collection (`service-factory.ts`) is updated:
  - It first attempts to determine the dimension using the existing `getModelDimension` lookup based on the provider and `modelId`.
  - If the dimension is not found via lookup, it then checks if a valid positive integer was provided in the `codebaseIndexEmbedderDimension` setting.
  - If a valid dimension is found via either method, it is used for the Qdrant client.
  - If no dimension can be determined, an error is raised, guiding the user to either use a known model or provide the dimension manually.
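The dimension-resolution fallback described above can be sketched as follows. Only `getModelDimension` and the `codebaseIndexEmbedderDimension` setting name come from this proposal; the function shape and other names are illustrative, not the actual `service-factory.ts` code:

```typescript
// Sketch of the proposed fallback; names other than getModelDimension
// and the setting key are hypothetical.
function resolveVectorDimension(
  provider: string,
  modelId: string,
  configuredDimension: number | null | undefined,
  getModelDimension: (provider: string, modelId: string) => number | undefined
): number {
  // 1. Try the existing known-model profile lookup first.
  const known = getModelDimension(provider, modelId);
  if (known !== undefined) return known;

  // 2. Fall back to the user-supplied codebaseIndexEmbedderDimension,
  //    accepting only positive integers.
  if (
    typeof configuredDimension === "number" &&
    Number.isInteger(configuredDimension) &&
    configuredDimension > 0
  ) {
    return configuredDimension;
  }

  // 3. Otherwise fail with actionable guidance, as the proposal requires.
  throw new Error(
    `Could not determine embedding dimension for "${modelId}". ` +
      `Use a known model or set codebaseIndexEmbedderDimension manually.`
  );
}
```

The ordering matters: the profile lookup wins over the manual setting, so a known model cannot be misconfigured with a wrong dimension.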
- The `CodeIndexOllamaEmbedder` implementation was refined to correctly interact with the standard Ollama `/api/embed` endpoint (sending one prompt at a time and parsing the singular `embedding` field in the response).
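The one-prompt-at-a-time pattern described above might look roughly like this. The request and response shapes mirror the issue's description (a singular `embedding` field); everything else, including the helper names, is an assumption rather than the actual `embedders/ollama.ts` code:

```typescript
// Hypothetical shape of the per-prompt response described in the issue.
interface OllamaEmbedResponse {
  embedding: number[];
}

// Build the JSON body for a single prompt (one request per text chunk).
function buildEmbedBody(model: string, prompt: string): string {
  return JSON.stringify({ model, prompt });
}

// Parse one response, validating the singular "embedding" field.
function parseEmbedResponse(json: string): number[] {
  const parsed = JSON.parse(json) as OllamaEmbedResponse;
  if (!Array.isArray(parsed.embedding)) {
    throw new Error("Ollama response did not contain an 'embedding' array");
  }
  return parsed.embedding;
}

// Against a live instance, the embedder would loop over texts, e.g.:
// for (const text of texts) {
//   const res = await fetch(`${baseUrl}/api/embed`, {
//     method: "POST",
//     headers: { "Content-Type": "application/json" },
//     body: buildEmbedBody(modelId, text),
//   });
//   vectors.push(parseEmbedResponse(await res.text()));
// }
```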
Technical considerations or implementation details (optional)
- Key files modified include various type definitions (`*.d.ts`, `types.ts`, `schemas/index.ts`, `interfaces/config.ts`), configuration handling (`config-manager.ts`, `package.json`), embedder implementations (`embedders/ollama.ts`, `embedders/openai.ts`), service creation (`service-factory.ts`), and the settings UI (`CodeIndexSettings.tsx`, `settings.json`).
- The dimension fallback logic in `service-factory.ts` ensures graceful handling of unknown models when the user provides the necessary dimension.
- New VS Code configuration settings are added in `package.json` to expose these options directly in user/workspace `settings.json`.
- TypeScript types and Zod schemas were updated to include the optional `codebaseIndexEmbedderDimension: number | null`.
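As a type-level illustration, the optional nullable setting and the "valid positive integer" rule could be expressed as below. This is a sketch under assumed names, not the project's actual Zod schema:

```typescript
// Hypothetical shape of the extended config; only the
// codebaseIndexEmbedder* keys and the number | null type come from
// this proposal.
type CodebaseIndexConfig = {
  codebaseIndexEmbedderBaseUrl?: string;
  codebaseIndexEmbedderModelId?: string;
  codebaseIndexEmbedderDimension?: number | null;
};

// Runtime guard matching the "valid positive integer" check described
// for the fallback logic: null, zero, negatives, and fractions are
// all rejected.
function isValidDimension(value: unknown): value is number {
  return typeof value === "number" && Number.isInteger(value) && value > 0;
}
```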
Describe alternatives considered (if any)
No response
Additional Context & Mockups
- Testing Context: this feature was successfully tested using a local Ollama instance (`http://localhost:11434`) with the model `jina-embeddings-v2-base-code:latest` (which requires specifying dimension `768`). The custom URL, model ID, and dimension settings were correctly used for indexing.
- Development Note: the initial implementation was assisted by Cursor (Claude 4) and subsequently reviewed, refactored, and tested manually. As I am not deeply experienced with TypeScript/React within this specific codebase, feedback on implementation details, potential improvements, or adherence to project best practices during the PR review would be highly appreciated.
Proposal Checklist
- I have searched existing Issues and Discussions to ensure this proposal is not a duplicate.
- This proposal is for a specific, actionable change intended for implementation (not a general idea).
- I understand that this proposal requires review and approval before any development work begins.
Are you interested in implementing this feature if approved?
- Yes, I would like to contribute to implementing this feature.