Conversation

@SamMalayek (Contributor) commented Nov 1, 2025

🧩 Summary

This PR adds a CI workflow for end-to-end embedding tests.
It marks the first phase of an effort to abstract the existing examples/llama-embedding logic and move it behind llama-server, so the server can produce embeddings with llama.cpp's own implementation instead of relying on external (OpenAI) APIs.

🎯 Motivation & Future

llama-server currently accepts OpenAI-compatible /embedding requests, but they are not backed by the native llama.cpp embedding logic.
This workflow establishes a reproducible test foundation before the embedding code is refactored, so that:

  • The server can generate embeddings locally.
  • --parallel N can support multiple concurrent embedding requests.
  • The standalone CLI will remain for lightweight workflows, while the server will use the same shared embedding path for persistent deployments.

⚙️ CI Implementation

  • Adds a GitHub Actions job to run embedding E2E tests with cached GGUF models (TinyLlama).
  • Verifies embedding output dimensions and deterministic behavior.
  • Uses lightweight models for fast CI runs (with an optional large model test).
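
For illustration, here is a minimal sketch of what the dimension and determinism checks amount to, assuming a local llama-server instance with embeddings enabled and its OpenAI-compatible /v1/embeddings endpoint. The port, model file, and expected dimension are assumptions for the sketch, not the literal test code in this PR:

```python
# Illustrative E2E check, assuming a llama-server instance running TinyLlama
# with embeddings enabled, e.g.:
#   llama-server -m tinyllama-1.1b.gguf --embedding --port 8080
import requests

URL = "http://127.0.0.1:8080/v1/embeddings"  # assumed local endpoint/port
EXPECTED_DIM = 2048  # TinyLlama-1.1B hidden size; adjust per model

def embed(text: str) -> list[float]:
    # OpenAI-compatible request/response shape.
    resp = requests.post(URL, json={"input": text}, timeout=60)
    resp.raise_for_status()
    return resp.json()["data"][0]["embedding"]

def test_embedding_dimension():
    assert len(embed("The quick brown fox")) == EXPECTED_DIM

def test_embedding_is_deterministic():
    a = embed("same prompt, same result")
    b = embed("same prompt, same result")
    # Repeated requests with identical input should agree numerically.
    assert max(abs(x - y) for x, y in zip(a, b)) < 1e-6
```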

🧱 Embedding CPP Logic Flow Update

A small cleanup in print_raw_embeddings() improves readability, logic flow, and isolation.
Although minor, this change is modular alongside the CI workflow changes: it touches a vertical slice of the embedding flow without altering evaluation, model logic, or any interface. (Insisting that every change stay within small horizontal slices tends to ossify software and make it brittle.)

🚀 Next Steps

  1. Extend CI test coverage for all embedding endpoints/flags.
  2. Abstract core embedding code from examples into a shared utility (e.g. common/embedding_utils.cpp).
  3. Integrate that abstraction into llama-server for local "/embedding" requests (while keeping the CLI entry points and preserving backwards compatibility).
    a. Extend CI coverage for concurrent (--parallel) embedding tests; a sketch of such a test follows below.

(this could well grow beyond three steps)
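
To make step 3a concrete, here is a hypothetical concurrency check, assuming a server started with multiple slots via --parallel. Again, the endpoint, port, and model file are illustrative assumptions, not part of this PR:

```python
# Hypothetical concurrency check for step 3a, assuming a server started with
# multiple slots, e.g.:
#   llama-server -m tinyllama-1.1b.gguf --embedding --parallel 4 --port 8080
import requests
from concurrent.futures import ThreadPoolExecutor

URL = "http://127.0.0.1:8080/v1/embeddings"  # assumed local endpoint/port

def embed(text: str) -> list[float]:
    resp = requests.post(URL, json={"input": text}, timeout=60)
    resp.raise_for_status()
    return resp.json()["data"][0]["embedding"]

def test_parallel_requests_do_not_interfere():
    # Fire identical requests concurrently; slots must not cross-contaminate,
    # so every result should match the first one.
    prompts = ["hello world"] * 8
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(embed, prompts))
    reference = results[0]
    assert all(len(r) == len(reference) for r in results)
    for r in results:
        assert max(abs(x - y) for x, y in zip(r, reference)) < 1e-6
```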

Note:
This PR includes a workflow (run-e2e-embedding.yml) for local/fork testing.
It runs only on feature/* branches and is designed to exercise the embedding E2E tests in forked CI.
Maintainers may integrate or adapt this into the upstream CI configuration if desired.

@ggerganov (Member)

Too much slop.

@ggerganov closed this Nov 2, 2025
@SamMalayek (Contributor, Author) commented Nov 2, 2025

> Too much slop.

This is a pristine PR with a pristine plan. However, I'll interpret your comment as "too much scope creep", coming right after I landed a PR in a relatively sloppy part of the codebase (examples -- totally fair, btw), and I'll push another PR that:

  • Removes the embedding.cpp improvement.
  • Refactors the embedding tests into a focused, deterministic CLI suite with broader input coverage and reproducible numeric validation.
  • Removes the E2E test benchmarking (which would have been quite useful for this much-needed refactor of the embedding CLI code, but I can run those locally).

... #16940

Furthermore, I'm opening an RFC discussion: #16957. This plan is needed for both of our llama.cpp repos (including my fork) because:

  • The embedding endpoint for llama-server should be native rather than relying on OpenAI's APIs.
  • I actually need this for my project's embedding pipeline.


Labels

  • devops (improvements to build systems and github actions)
  • examples
  • python (python script changes)
