
FEATURE REQUEST: Fix EmbeddingService blocking the event loop #261

@vaishcodescape

Description

Is your feature request related to a problem?

  • Yes, it is related to a problem

Describe the feature you'd like

🌟 Feature Description

Offload blocking embedding work from the async event loop so that EmbeddingService does not block other concurrent requests. The service should run CPU/GPU-bound model.encode() calls in a thread pool (e.g. via asyncio.to_thread()) while keeping the public API async.

🔍 Problem Statement

EmbeddingService exposes async methods (get_embedding, get_embeddings, summarize_user_profile, search_similar_profiles) but internally calls synchronous SentenceTransformer code:

  • self.model.encode(...) in get_embedding and get_embeddings is blocking. It runs on CPU/GPU and does not yield to the event loop.
  • While one request is generating embeddings, the entire process is blocked: other HTTP requests, agent tools, and background tasks stall until the encode finishes.
  • The LLM call in summarize_user_profile correctly uses await self.llm.ainvoke(...) and is non-blocking; only the embedding step blocks.

This hurts latency and concurrency wherever the service is used (e.g. issue_processor.py, contributor_recommendation.py, user profiling).
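The effect described above can be reproduced with a minimal sketch. The `fake_encode` function below is a hypothetical stand-in for the synchronous `self.model.encode(...)` call (it just sleeps instead of running the model); a concurrent "heartbeat" coroutine shows that the whole event loop stalls while the synchronous call runs:

```python
import asyncio
import time

# Stand-in for the synchronous SentenceTransformer.encode call
# (hypothetical; the real service calls self.model.encode(...)).
def fake_encode(texts):
    time.sleep(0.5)  # simulates CPU/GPU-bound inference
    return [[0.0, 0.0, 0.0] for _ in texts]

async def blocking_embedding():
    await asyncio.sleep(0)          # let other tasks start first
    return fake_encode(["hello"])   # blocks the whole event loop

async def heartbeat(ticks):
    # Should tick every ~0.05 s if the loop stays responsive.
    for _ in range(5):
        ticks.append(time.monotonic())
        await asyncio.sleep(0.05)

async def main():
    ticks = []
    await asyncio.gather(blocking_embedding(), heartbeat(ticks))
    # The gap spanning the encode call balloons to ~0.5 s.
    gaps = [b - a for a, b in zip(ticks, ticks[1:])]
    return max(gaps)

max_gap = asyncio.run(main())
print(f"largest gap between heartbeats: {max_gap:.2f}s")
```

The heartbeat freezes for the full duration of `fake_encode`, which is exactly what happens to other HTTP requests and agent tools in the real service.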

🎯 Expected Outcome

  • Event loop stays responsive during embedding generation: other async work (API handlers, other embeddings, LLM calls) can run while model.encode() runs in a worker thread.
  • No change to the public API: callers keep using await embedding_service.get_embedding(...) and await embedding_service.get_embeddings(...).
  • Implementation approach: add a synchronous helper that performs model.encode() and tensor-to-list conversion; call it from get_embedding and get_embeddings via asyncio.to_thread() (or loop.run_in_executor() with a ThreadPoolExecutor).
  • Optional: run model lazy-load in a thread at first use to avoid blocking on first request.
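The outcome above could be implemented along these lines. This is a sketch, not the actual service code: the class shape and method names are taken from the issue, and the `model` attribute is assumed to be a SentenceTransformer-like object with a synchronous `encode` method:

```python
import asyncio

class EmbeddingService:
    """Sketch of the proposed change; the real class lives in
    backend/app/services/embedding_service/service.py."""

    def __init__(self, model):
        self.model = model  # assumed: SentenceTransformer-like object

    def _encode(self, texts):
        # Synchronous helper: runs in a worker thread, never on the loop.
        # Also does the tensor-to-list conversion mentioned in the issue.
        vectors = self.model.encode(texts)
        return [list(map(float, v)) for v in vectors]

    async def get_embedding(self, text):
        # Public API unchanged: callers still `await` this method.
        embeddings = await asyncio.to_thread(self._encode, [text])
        return embeddings[0]

    async def get_embeddings(self, texts):
        return await asyncio.to_thread(self._encode, texts)
```

`asyncio.to_thread()` (Python 3.9+) submits the call to the loop's default `ThreadPoolExecutor`; on older Pythons, `loop.run_in_executor(None, ...)` is the equivalent. Since `model.encode()` releases the GIL during heavy native/GPU work, a thread pool is sufficient here and avoids the serialization cost of a process pool.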

📷 Screenshots and Design Ideas

Before: One long embedding request blocks the event loop → other requests wait.

After: Embedding runs in a thread pool → event loop continues handling other requests; embedding call still awaited by the original request.

No UI changes; this is a backend concurrency fix.

📋 Additional Context

  • File to change: backend/app/services/embedding_service/service.py
  • Consumers: app/services/github/user/profiling.py, app/services/github/issue_processor.py, app/agents/devrel/github/tools/contributor_recommendation.py
  • Suggested steps:
    1. Add import asyncio.
    2. Add a sync helper method (e.g. _encode(texts)) that calls self.model.encode(...) and returns list(s) of floats.
    3. In get_embedding: replace direct model.encode with await asyncio.to_thread(self._encode, [text]), then return the single embedding list.
    4. In get_embeddings: replace direct model.encode with await asyncio.to_thread(self._encode, texts) and return the list of lists.
  • Verification: While one request is generating embeddings, trigger another (e.g. health check or simple async endpoint); the second should respond without waiting for the first.
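The verification step can be scripted. The sketch below assumes the post-fix behaviour (`asyncio.to_thread`) and uses a sleeping `slow_encode` as a hypothetical stand-in for the model; a concurrent "health check" coroutine should finish in ~0.05 s rather than waiting the full ~0.5 s for the embedding:

```python
import asyncio
import time

def slow_encode(texts):
    # Stand-in for model.encode (hypothetical): sleeps instead of inferring.
    time.sleep(0.5)
    return [[0.0] for _ in texts]

async def embed():
    # Post-fix behaviour: encode runs in a worker thread.
    return await asyncio.to_thread(slow_encode, ["profile text"])

async def health_check():
    # Simulates a fast, unrelated async endpoint.
    await asyncio.sleep(0.05)
    return time.monotonic()

async def main():
    start = time.monotonic()
    _, health_done = await asyncio.gather(embed(), health_check())
    return health_done - start

elapsed = asyncio.run(main())
print(f"health check responded after {elapsed:.2f}s")
```

With the pre-fix blocking call, the same health check would only respond after the full encode duration.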

Record

  • I agree to follow this project's Code of Conduct
  • I want to work on implementing this feature
