Add Support for Float Encoding in OpenAI-Compatible Embeddings #7199

@nszceta

Description

What specific problem does this solve?

This change solves a compatibility issue for all users who rely on OpenAI-compatible embedding providers that return embeddings as raw float arrays instead of base64-encoded strings.

Who is affected?
Users configuring custom or third-party embedding models via the "OpenAI-Compatible" provider option — particularly those using models hosted on platforms like Deepinfra with certain models, or other open-source inference servers that do not support or properly handle the encoding_format="base64" parameter.

When does this happen?
When a user configures an OpenAI-compatible endpoint that returns embeddings in float array format (the default for many providers), the system currently fails to parse the response correctly. This occurs during:

  • Codebase indexing
  • Embedding validation
  • Any feature relying on semantic search or context retrieval

Current vs Expected Behavior

Current: The system assumes all OpenAI-compatible APIs return base64-encoded embeddings and attempts to decode them. If the provider returns a float array instead, the decoding step fails silently or causes parsing errors, leading to empty or malformed embeddings. This results in failed indexing, poor search quality, or complete breakdown of code understanding features.

Expected: The system should support both base64 and float-encoded responses based on the provider's capabilities. Users should be able to explicitly enable float encoding when their endpoint returns raw float arrays.

Impact

Without this fix:

  • Users cannot successfully index their codebase when using float-returning models.
  • They encounter confusing errors or silent failures with no clear indication of the root cause.
  • Productivity is blocked because core AI features (e.g., code search, context-aware suggestions) stop working.
  • Debugging is difficult due to the lack of proper logging around encoding assumptions.

By adding the useFloatEncoding option and proper handling in the embedder pipeline, users can correctly integrate a wider range of models without workarounds, reducing setup time and increasing the reliability of the code indexing system.

Additional context (optional)

Implemented for my own needs in https://github.com/nszceta/Roo-Code/tree/deepinfra-embedding-fixes

Roo Code Task Links (Optional)

No response

Request checklist

  • I've searched existing Issues and Discussions for duplicates
  • This describes a specific problem with clear impact and context

Interested in implementing this?

  • Yes, I'd like to help implement this feature

Implementation requirements

  • I understand this needs approval before implementation begins

How should this be solved? (REQUIRED if contributing, optional otherwise)

What exactly will change?

  • A new boolean configuration option codebaseIndexOpenAiCompatibleUseFloatEncoding is added to enable float encoding for OpenAI-compatible embedding providers.
  • When enabled, the OpenAICompatibleEmbedder will request embeddings with encoding_format: "float" instead of "base64" and skip the base64 decoding step.
  • The system now properly handles float array responses directly from the API without attempting to decode them as base64.
  • Comprehensive logging via a dedicated VS Code output channel (e.g., "OpenAI Compatible Embedder") has been added to aid debugging.
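The conditional request construction described above can be sketched as a small pure function. The names here (`buildEmbeddingRequest`, `EmbeddingRequest`) are illustrative, not the actual Roo-Code implementation; only the proposed flag semantics are taken from this issue.

```typescript
// Illustrative sketch: the flag only changes the requested wire format.
// The default remains base64, preserving current behavior.
interface EmbeddingRequest {
	model: string
	input: string[]
	encoding_format: "float" | "base64"
}

function buildEmbeddingRequest(model: string, input: string[], useFloatEncoding: boolean): EmbeddingRequest {
	return {
		model,
		input,
		// "float" is only sent when the user has explicitly opted in.
		encoding_format: useFloatEncoding ? "float" : "base64",
	}
}
```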

How will users interact with it?

  • Users will see a new checkbox labeled "Use Float Encoding" in the codebase index settings UI when using an OpenAI-compatible embedder.
  • The checkbox includes a tooltip explaining: "Enable this option if your embedding provider returns float arrays instead of base64 encoded strings. This is required for providers that don't support base64 encoding."
  • Users can toggle this setting based on their embedding provider’s capabilities.

What will the new behaviour look like?

  • If float encoding is enabled, the system sends encoding_format="float" in the embedding request and expects the response to contain number arrays directly (e.g., [0.1, 0.2, 0.3]). These are used as-is without decoding.
  • If disabled (default), the system continues using encoding_format="base64" and decodes the base64 string into a float array.
  • Malformed responses (e.g., a string where a float array is expected) trigger warnings in the output channel and return empty embeddings as a fallback.
  • All key operations (construction, validation, embedding creation) log structured debug information to the output channel for transparency and troubleshooting.
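The two response-handling paths above can be sketched as a single normalization helper. This is a minimal Node.js sketch under the assumptions in this proposal; `normalizeEmbedding` is a hypothetical name, and the real embedder's error handling may differ.

```typescript
// Sketch: normalize an embedding response entry depending on the
// (proposed) useFloatEncoding flag.
function normalizeEmbedding(data: number[] | string, useFloatEncoding: boolean): number[] {
	if (useFloatEncoding) {
		// Float mode: the API returns a plain number array; use it as-is.
		if (Array.isArray(data)) return data
		console.warn("Expected float array, got string; returning empty embedding")
		return []
	}
	// Base64 mode: decode the string into float32 values.
	if (typeof data !== "string") {
		console.warn("Expected base64 string, got array; returning empty embedding")
		return []
	}
	const buf = Buffer.from(data, "base64")
	if (buf.byteLength % 4 !== 0) {
		console.warn("Base64 payload is not a whole number of float32s; returning empty embedding")
		return []
	}
	// Copy into a fresh ArrayBuffer so the Float32Array view is aligned.
	const bytes = new Uint8Array(buf)
	return Array.from(new Float32Array(bytes.buffer))
}
```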

This solution ensures compatibility with a wider range of OpenAI-compatible embedding services while maintaining backward compatibility and improving observability through detailed logging.

How will we know it works? (Acceptance Criteria - REQUIRED if contributing, optional otherwise)

Given the user is configuring an OpenAI-compatible embedding provider that returns embeddings as float arrays
When the user enables the "Use Float Encoding" option in the codebase index settings and saves the configuration
Then the system sends encoding_format: "float" in the embedding request payload
And the received embeddings are processed directly as number arrays without attempting base64 decoding
And the output channel logs confirm the use of float encoding and successful response handling
But the system does not attempt to decode the embedding as base64, avoiding parsing errors

Given the user has enabled float encoding
When an embedding response contains a non-array value (e.g., string) despite the float setting
Then the system logs a warning about unexpected data type
And returns an empty embedding array as fallback
But does not crash or propagate a parsing exception

Given the user disables or leaves "Use Float Encoding" off
When making embedding requests
Then the system continues to use encoding_format: "base64"
And correctly decodes base64-encoded responses into float arrays
But does not send encoding_format: "float" to the API

Given the user opens the codebase index settings popover
When viewing the OpenAI-compatible configuration section
Then the "Use Float Encoding" checkbox is visible with a tooltip explaining its purpose
And the setting persists after saving and reloading the UI
But the default value is false to maintain backward compatibility

Technical considerations (REQUIRED if contributing, optional otherwise)

The implementation of the codebaseIndexOpenAiCompatibleUseFloatEncoding feature introduces several technical considerations for planning and integration:

Implementation Approach & Architecture Changes

  • The feature extends the configuration schema with a new optional boolean flag, codebaseIndexOpenAiCompatibleUseFloatEncoding, which propagates from the UI through the state management layer to the OpenAICompatibleEmbedder class.
  • The embedder now conditionally sets the encoding_format parameter in API requests: "float" if enabled, "base64" otherwise. This avoids unnecessary decoding when floats are already provided.
  • A shared logging mechanism using vscode.OutputChannel was introduced across all embedder implementations (OpenAI, Ollama, Gemini, Mistral, OpenAI-Compatible) to standardize debug, warning, and error output.
  • The CodeIndexServiceFactory now passes an optional outputChannel to embedder constructors, enabling consistent logging without relying on global console output.
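The shared logging mechanism could look roughly like the following. `OutputChannelLike` stands in for `vscode.OutputChannel` (only its `appendLine` method is used here), and `OpenAICompatibleEmbedderSketch` is a hypothetical stand-in for the real class; the point is that embedders accept an optional channel instead of writing to the global console.

```typescript
// Minimal stand-in for vscode.OutputChannel so the sketch is self-contained.
interface OutputChannelLike {
	appendLine(line: string): void
}

class OpenAICompatibleEmbedderSketch {
	constructor(private output?: OutputChannelLike) {
		this.log("constructed embedder")
	}

	private log(message: string) {
		// Fall back to the console when no channel is injected.
		const sink = this.output ?? { appendLine: (l: string) => console.log(l) }
		sink.appendLine(`[OpenAI Compatible Embedder] ${message}`)
	}
}
```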

Performance Implications

  • Positive: When float encoding is used, the system skips base64 decoding, reducing CPU overhead and improving embedding ingestion speed—especially beneficial for large codebases.
  • Minimal Overhead: The conditional logic for encoding/format handling adds negligible runtime cost.
  • Memory: Float arrays may use slightly more memory than base64 strings during transmission, but this is offset by eliminating decoding buffers.

Compatibility Concerns

  • Backward Compatibility: The new flag defaults to false, preserving existing behavior for providers expecting base64.
  • Provider Compatibility: Required for embedding services that only return float arrays (e.g., certain local or custom models). Enabling it for base64-only providers will result in errors.
  • Type Handling: The system now safely handles mixed responses (e.g., float array vs. string) with fallbacks and warnings logged to the output channel.

Systems Affected

  • Frontend (Webview UI): New toggle added to settings UI with tooltips and debug logging.
  • State Management: Global state (codebaseIndexConfig) now includes the new flag.
  • Embedding Pipeline: OpenAICompatibleEmbedder and related services now conditionally process embeddings based on encoding type.
  • Logging Infrastructure: Unified logging via OutputChannel replaces raw console.warn/error calls for better user diagnostics.

Potential Blockers

  • Misconfiguration Risk: Users may enable float encoding for providers that don’t support it, leading to parsing failures. Clear documentation and validation messages are essential.
  • Testing Gaps: Requires real-world testing with diverse OpenAI-compatible endpoints to ensure robustness across different response formats.
  • Telemetry: Currently, no telemetry captures encoding-related failures. Consider adding events to monitor adoption and error patterns.

Overall, the change is low-risk, enhances flexibility, and aligns with the goal of supporting a wider range of embedding providers.

Trade-offs and risks (REQUIRED if contributing, optional otherwise)

Trade-offs and Risks

What Could Go Wrong?

  • Incorrect Configuration: If useFloatEncoding is enabled for a provider that actually returns base64-encoded strings, the system will expect a number array, receive a string instead, and fall back to empty or invalid embeddings.
  • Provider Inconsistencies: Some OpenAI-compatible APIs may claim to support "float" encoding but still return base64 (or vice versa), leading to parsing errors.
  • Performance Overhead: While minimal, logging every constructor and validation step increases output volume, which could affect performance in environments with high initialization frequency.

Alternative Approaches Considered

  1. Auto-detection of Encoding Format:

    • Instead of a boolean flag, we could inspect the first embedding response and automatically determine whether it's a string (base64) or array (float).
    • Why not chosen: This adds complexity and potential race conditions during initialization. It also makes debugging harder since behavior becomes implicit rather than explicit.
  2. Runtime Fallback Logic:

    • Attempt float decoding first, fall back to base64 if it fails.
    • Why not chosen: Could mask misconfigurations and lead to silent degradation instead of clear errors. We prefer fail-fast behavior with explicit settings.
  3. Separate Embedder Class for Float Providers:

    • Create a new OpenAICompatibleFloatEmbedder class.
    • Why not chosen: Unnecessary code duplication. The difference is minor (just encoding format), so a configuration flag suffices.

Why We Chose This Approach

  • Simplicity: A single boolean flag cleanly captures the required behavior without increasing API surface.
  • Predictability: Users explicitly declare their provider’s capabilities, reducing surprises.
  • Debuggability: Extensive logging ensures issues can be diagnosed easily via the output channel.

Potential Negative Impacts

| Area | Impact | Mitigation |
| --- | --- | --- |
| Performance | Slight increase in log volume | Logs are debug-level and can be ignored in production; no runtime overhead beyond string serialization. |
| UX | New setting may confuse users unfamiliar with embedding formats | Descriptive tooltip added: "Enable this option if your embedding provider returns float arrays instead of base64 encoded strings." |
| Bundle Size | Minimal increase due to additional logging and conditionals | Negligible impact (<1KB). |

Breaking Changes or Migration Concerns

  • No breaking changes. The new setting defaults to false, preserving existing behavior.
  • Existing configurations are unaffected.
  • No migration required — users opting into float encoding must explicitly enable it.

Edge Cases Handled

| Edge Case | Handling |
| --- | --- |
| Provider returns string when useFloatEncoding=true | Log warning, return empty embedding array to prevent crash. |
| Mixed responses (some floats, some strings) in batch | Process valid entries, log warnings for invalid ones, avoid failing the entire batch. |
| encoding_format="float" not supported by API | Let the upstream API reject the request; validation will surface the error during setup. |
| Malformed float array (e.g., NaN, null) | Rely on the OpenAI SDK or fetch layer to handle; logged if detected before return. |
| High-cardinality models with large float arrays | No special handling; arrays are passed through directly, and memory usage depends on model dimension. |
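The mixed-batch edge case can be sketched as follows. `processBatch` is a hypothetical helper illustrating the stated policy: keep valid float arrays, substitute empty embeddings for malformed entries, and never fail the whole batch.

```typescript
// Illustrative sketch of per-entry validation in a mixed batch.
function processBatch(items: Array<number[] | string>): number[][] {
	return items.map((item, i) => {
		// Accept only arrays of finite numbers as valid embeddings.
		if (Array.isArray(item) && item.every((v) => typeof v === "number" && Number.isFinite(v))) {
			return item
		}
		console.warn(`Embedding ${i}: unexpected type, substituting empty array`)
		return []
	})
}
```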

Metadata

Labels: Enhancement (New feature or request), Issue - Needs Scoping (Valid, but needs an effort estimate or design input before work can start), proposal
Status: Done