
Conversation

@kozistr (Contributor) commented Dec 29, 2025

What does this PR do?

Close #731

It looks like, for StaticEmbedding models, only the 0_StaticEmbedding/ directory exists (containing both the model weights and the tokenizer). I've added fallback logic to load from there when the root files are missing, as sketched below. (That said, there may well be a better way to handle this.)
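
For reference, the shape of the fallback is roughly the following (a minimal sketch with illustrative names such as `resolve_tokenizer`, not the actual diff; the real change also covers the model weights):

```rust
use std::path::{Path, PathBuf};

/// Resolve `tokenizer.json`, preferring the model root and falling back to
/// the `0_StaticEmbedding/` sub-directory used by StaticEmbedding models.
fn resolve_tokenizer(model_root: &Path) -> Option<PathBuf> {
    let root = model_root.join("tokenizer.json");
    if root.exists() {
        return Some(root);
    }
    let fallback = model_root.join("0_StaticEmbedding").join("tokenizer.json");
    fallback.exists().then_some(fallback)
}
```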

Additionally, I've opened a separate PR that adds a config.json file to the model's root directory, which is required for working with TEI.

Log

./target/release/text-embeddings-router --model-id ../static-similarity-mrl-multilingual-v1 --pooling mean --port 8080 --dtype float32 --auto-truncate --max-batch-tokens 512
2025-12-29T14:37:17.280465Z  INFO text_embeddings_router: router/src/main.rs:205: Args { model_id: "../******-**********-***-***********l-v1", revision: None, tokenization_workers: None, dtype: Some(Float32), pooling: Some(Mean), max_concurrent_requests: 512, max_batch_tokens: 512, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: true, default_prompt_name: None, default_prompt: None, dense_path: None, hf_api_token: None, hf_token: None, hostname: "0.0.0.0", port: 8080, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: None, payload_limit: 2000000, api_key: None, json_output: false, disable_spans: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", prometheus_port: 9000, cors_allow_origin: None }
2025-12-29T14:37:17.288338Z  WARN text_embeddings_router: router/src/lib.rs:143: tokenizer.json not found in root. Trying 0_StaticEmbedding/.
2025-12-29T14:37:17.398815Z  WARN text_embeddings_router: router/src/lib.rs:206: Could not find a Sentence Transformers config
2025-12-29T14:37:17.398846Z  INFO text_embeddings_router: router/src/lib.rs:231: Maximum number of tokens per request: 512
2025-12-29T14:37:17.400838Z  INFO text_embeddings_core::tokenization: core/src/tokenization.rs:38: Starting 7 tokenization workers
2025-12-29T14:37:17.400957Z  INFO text_embeddings_router: router/src/lib.rs:281: Starting model backend
2025-12-29T14:37:17.410927Z  INFO text_embeddings_backend_candle: backends/candle/src/lib.rs:315: Starting StaticEmbedding model on Cpu
2025-12-29T14:37:19.293286Z  INFO text_embeddings_router: router/src/lib.rs:299: Warming up model
2025-12-29T14:37:19.342559Z  WARN text_embeddings_router: router/src/lib.rs:308: Backend does not support a batch size > 4
2025-12-29T14:37:19.342593Z  WARN text_embeddings_router: router/src/lib.rs:309: forcing `max_batch_requests=4`
2025-12-29T14:37:19.344004Z  INFO text_embeddings_router::http::server: router/src/http/server.rs:1852: Starting HTTP server: 0.0.0.0:8080
2025-12-29T14:37:19.344044Z  INFO text_embeddings_router::http::server: router/src/http/server.rs:1853: Ready
2025-12-29T14:39:43.209327Z  INFO embed{total_time="1.153233ms" tokenization_time="303.529µs" queue_time="448.98µs" inference_time="284.735µs"}: text_embeddings_router::http::server: router/src/http/server.rs:733: Success

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines.
  • Did you write any new necessary tests? If applicable, did you include or update the insta snapshots?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@alvarobartt

@kozistr kozistr marked this pull request as ready for review December 30, 2025 11:55
@alvarobartt alvarobartt added this to the v1.9.0 milestone Dec 30, 2025
thomas-hiddenpeak added a commit to thomas-hiddenpeak/RMinte-Orin-TEI that referenced this pull request Jan 5, 2026
This commit adapts text-embeddings-inference for NVIDIA Jetson Orin (SM87)
and L4 GPU (SM89), and integrates valuable community PRs.

Changes:

1. SM87/SM89 CUDA Support
   - Added compute capability 8.7 and 8.9 support
   - Modified Dockerfile-cuda-all for multi-arch builds
   - Updated compute_cap.rs for SM87/89 detection
   Files: Dockerfile-cuda-all, cuda-all-entrypoint.sh, compute_cap.rs

2. PR huggingface#730: Qwen3 Reranker Support
   - Added classification head for Qwen3 reranking
   - Implemented template formatting system for chat-based reranking
   Files: models/qwen3.rs, core/templates.rs, core/lib.rs

3. PR huggingface#787: Batch Notification Performance Optimization
   - Implemented AtomicUsize counter for batch processing
   - Reduced unnecessary notify_one() calls
   - Only the last request in a batch triggers the thread notification (see the sketch after this commit message)
   Files: core/infer.rs, router/http/server.rs, router/grpc/server.rs

4. PR huggingface#753: GeLU Activation Consistency Fix
   - Changed Gelu from the approximate form (gelu) to the exact form (gelu_erf); see the GeLU sketch below
   - Added NewGelu variant for backward compatibility
   Files: layers/linear.rs

5. PR huggingface#790: StaticEmbedding Model Support
   - Added support for 0_StaticEmbedding/ directory structure
   - Implemented fallback loading for model weights and tokenizer
   - Default to Mean pooling for StaticEmbedding models
   Files: models/static_embedding.rs (new), lib.rs, download.rs, router/lib.rs

6. PR huggingface#746: DebertaV2 Sequence Classification Support
   - Complete DebertaV2 model implementation
   - Support for sequence classification tasks (e.g., Llama Prompt Guard)
   - CPU and CUDA device support
   Files: models/debertav2.rs (new), lib.rs, models/mod.rs

All changes have been tested and compile successfully with:
  cargo check --all-targets

Compilation verified with CUDA support:
  cargo install --path router -F candle-cuda

Target Hardware: NVIDIA Jetson Orin AGX (SM87), L4 GPU (SM89)
Date: January 5, 2026
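
For readers following along, here is a rough sketch of the notification pattern referenced in item 3 (illustrative names like `BatchQueue`; not the actual code from PR huggingface#787):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use tokio::sync::Notify;

struct BatchQueue {
    pending: AtomicUsize,
    notify: Notify,
}

impl BatchQueue {
    /// Called once per enqueued request. Only the request that completes the
    /// batch wakes the batching task, instead of one notify_one() per push.
    fn push(&self, batch_size: usize) {
        let count = self.pending.fetch_add(1, Ordering::AcqRel) + 1;
        if count >= batch_size {
            self.pending.store(0, Ordering::Release);
            self.notify.notify_one();
        }
    }

    /// Awaited by the batching task between batches.
    async fn wait_for_batch(&self) {
        self.notify.notified().await;
    }
}
```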
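
And a minimal standalone sketch of the GeLU distinction from item 4, assuming candle's `gelu` (tanh approximation) and `gelu_erf` (exact) tensor ops:

```rust
use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    let x = Tensor::new(&[-1.0f32, 0.0, 1.0, 2.0], &Device::Cpu)?;

    // Tanh-based approximation: 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    let approx = x.gelu()?;
    // Exact form: 0.5 * x * (1 + erf(x / sqrt(2)))
    let exact = x.gelu_erf()?;

    println!("approx: {approx}");
    println!("exact:  {exact}");
    Ok(())
}
```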