Description
Is your feature request related to a problem?
- Yes, it is related to a problem
Describe the feature you'd like
🌟 Feature Description
Offload blocking embedding work from the async event loop so that `EmbeddingService` does not block other concurrent requests. The service should run CPU/GPU-bound `model.encode()` calls in a thread pool (e.g. via `asyncio.to_thread()`) while keeping the public API async, as illustrated below.
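For reference, `asyncio.to_thread()` (Python 3.9+) runs a synchronous callable in the default thread-pool executor and returns an awaitable, which is what keeps the event loop free. A minimal, self-contained illustration (the names here are generic placeholders, not from this codebase):

```python
import asyncio
import time


def blocking_work(x: int) -> int:
    # Stands in for a CPU/GPU-bound call such as model.encode(...).
    time.sleep(2)
    return x * x


async def ticker() -> None:
    # Other async work that should keep running while the blocking
    # call is in flight.
    for _ in range(4):
        print("event loop is responsive")
        await asyncio.sleep(0.5)


async def main() -> None:
    # to_thread() ships blocking_work off to a worker thread, so the
    # ticker coroutine keeps getting scheduled in the meantime.
    result, _ = await asyncio.gather(
        asyncio.to_thread(blocking_work, 7),
        ticker(),
    )
    print(result)  # 49


asyncio.run(main())
```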
🔍 Problem Statement
EmbeddingService exposes async methods (`get_embedding`, `get_embeddings`, `summarize_user_profile`, `search_similar_profiles`) but internally calls synchronous SentenceTransformer code:
- `self.model.encode(...)` in `get_embedding` and `get_embeddings` is blocking. It runs on CPU/GPU and does not yield to the event loop.
- While one request is generating embeddings, the entire process is blocked: other HTTP requests, agent tools, and background tasks stall until the encode finishes.
- The LLM call in `summarize_user_profile` correctly uses `await self.llm.ainvoke(...)` and is non-blocking; only the embedding step blocks.

This hurts latency and concurrency wherever the service is used (e.g. `issue_processor.py`, `contributor_recommendation.py`, user profiling).
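To make the failure mode concrete, a simplified sketch of the current pattern (trimmed, not the actual file contents; everything beyond the cited method names is an assumption):

```python
# Simplified sketch of the current blocking pattern -- illustrative only.
class EmbeddingService:
    async def get_embedding(self, text: str) -> list[float]:
        # Declared async, but model.encode() is synchronous: the event
        # loop cannot schedule anything else until it returns.
        embedding = self.model.encode(text)
        return embedding.tolist()
```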
🎯 Expected Outcome
- Event loop stays responsive during embedding generation: other async work (API handlers, other embeddings, LLM calls) can run while `model.encode()` runs in a worker thread.
- No change to the public API: callers keep using `await embedding_service.get_embedding(...)` and `await embedding_service.get_embeddings(...)`.
- Implementation approach: add a synchronous helper that performs `model.encode()` and tensor-to-list conversion; call it from `get_embedding` and `get_embeddings` via `asyncio.to_thread()` (or `loop.run_in_executor()` with a `ThreadPoolExecutor`). A sketch follows this list.
- Optional: run the model lazy-load in a thread at first use to avoid blocking on the first request.
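A sketch of the proposed shape, assuming a SentenceTransformer-backed service; the constructor signature and attribute names are assumptions and may differ from the real `service.py`:

```python
import asyncio
from typing import Optional

from sentence_transformers import SentenceTransformer


class EmbeddingService:
    def __init__(self, model_name: str = "all-MiniLM-L6-v2") -> None:
        # model_name is a placeholder; use whatever the service loads today.
        self._model_name = model_name
        self._model: Optional[SentenceTransformer] = None

    def _ensure_model(self) -> SentenceTransformer:
        # Lazy-load the model. Because this runs inside _encode (which is
        # dispatched to a worker thread), the first request no longer
        # blocks the event loop either.
        if self._model is None:
            self._model = SentenceTransformer(self._model_name)
        return self._model

    def _encode(self, texts: list[str]) -> list[list[float]]:
        # Synchronous helper: the blocking encode plus tensor-to-list
        # conversion, kept together so one to_thread() call covers both.
        model = self._ensure_model()
        return model.encode(texts).tolist()

    async def get_embedding(self, text: str) -> list[float]:
        # Public API unchanged: callers still `await` this method.
        embeddings = await asyncio.to_thread(self._encode, [text])
        return embeddings[0]

    async def get_embeddings(self, texts: list[str]) -> list[list[float]]:
        return await asyncio.to_thread(self._encode, texts)
```

If several first requests can arrive concurrently, the lazy load could be guarded with a `threading.Lock` to avoid loading the model twice; that detail is omitted above for brevity.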
📷 Screenshots and Design Ideas
Before: One long embedding request blocks the event loop → other requests wait.
After: Embedding runs in a thread pool → event loop continues handling other requests; embedding call still awaited by the original request.
No UI changes; this is a backend concurrency fix.
📋 Additional Context
- File to change:
backend/app/services/embedding_service/service.py - Consumers:
app/services/github/user/profiling.py,app/services/github/issue_processor.py,app/agents/devrel/github/tools/contributor_recommendation.py - Suggested steps:
- Add
import asyncio. - Add a sync helper method (e.g.
_encode(texts)) that callsself.model.encode(...)and returns list(s) of floats. - In
get_embedding: replace directmodel.encodewithawait asyncio.to_thread(self._encode, [text]), then return the single embedding list. - In
get_embeddings: replace directmodel.encodewithawait asyncio.to_thread(self._encode, texts)and return the list of lists.
- Add
- Verification: While one request is generating embeddings, trigger another (e.g. health check or simple async endpoint); the second should respond without waiting for the first.
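One way to check this locally, as a hypothetical standalone script rather than a project test (it assumes the patched `EmbeddingService` sketched earlier is importable): it times a cheap coroutine while an encode is in flight.

```python
import asyncio
import time


async def heartbeat() -> float:
    # Measures how long a 0.1s sleep actually takes; a blocked event
    # loop stretches this well past 0.1s.
    start = time.perf_counter()
    await asyncio.sleep(0.1)
    return time.perf_counter() - start


async def main() -> None:
    service = EmbeddingService()  # the patched service sketched above
    encode_task = asyncio.create_task(
        service.get_embeddings(["some text to embed"] * 256)
    )
    elapsed = await heartbeat()
    # With the fix: ~0.1s. Without it: roughly the full encode time.
    print(f"heartbeat completed in {elapsed:.2f}s")
    await encode_task


asyncio.run(main())
```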
Record
- I agree to follow this project's Code of Conduct
- I want to work on implementing this feature