Conversation

@codeflash-ai codeflash-ai bot commented Nov 11, 2025

📄 5% (0.05x) speedup for LoggingWorker._worker_loop in litellm/litellm_core_utils/logging_worker.py

⏱️ Runtime : 1.10 seconds → 1.05 seconds (best of 15 runs)

📝 Explanation and details

The optimized code achieves a 5% runtime improvement and 7.1% throughput improvement through two key micro-optimizations in the high-frequency async worker loop:

Key Optimizations

1. Local Variable Caching
The most impactful change is caching frequently accessed methods and attributes as local variables before entering tight loops:

  • queue_get = _queue.get and queue_task_done = _queue.task_done eliminate repeated attribute lookups
  • wait_for = asyncio.wait_for and timeout = self.timeout cache commonly used values
  • In clear_queue(), similar caching for queue_get_nowait, loop_time(), and constants

This optimization is particularly effective in Python because attribute access (self.attribute or module.function) involves dictionary lookups that add overhead in tight loops. The line profiler shows the main while True loop executing 1,669 times, making these micro-optimizations compound significantly.
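
To make the pattern concrete, here is a minimal sketch of a hypothetical worker loop that uses the same trick; the function name, queue contents, and timeout handling are illustrative assumptions, not the actual litellm implementation:

import asyncio

async def worker_loop(queue: asyncio.Queue, timeout: float) -> None:
    # Bind hot attributes to locals once, outside the loop. Each iteration
    # then uses fast local-variable access instead of repeated attribute
    # (dictionary) lookups on the queue object and the asyncio module.
    queue_get = queue.get
    queue_task_done = queue.task_done
    wait_for = asyncio.wait_for
    while True:
        try:
            coroutine = await wait_for(queue_get(), timeout=timeout)
        except asyncio.TimeoutError:
            continue  # no task arrived within the timeout; poll again
        try:
            await coroutine
        finally:
            queue_task_done()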

2. Conditional Logging Guards
Added verbose_logger.isEnabledFor() checks before expensive logging operations (see the sketch after this list):

  • Prevents costly string formatting (f"LoggingWorker error: {e}") when logging level wouldn't emit the message
  • Particularly beneficial during exception handling and shutdown scenarios
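
A minimal sketch of the guard, using a hypothetical report_worker_error helper around a standard logging.Logger (the real verbose_logger lives in litellm._logging):

import logging

verbose_logger = logging.getLogger("litellm")

def report_worker_error(e: Exception) -> None:
    # The f-string is built only when a DEBUG record would actually be
    # emitted; at the INFO/WARNING levels typical in production, the
    # branch is skipped and no formatting work is done.
    if verbose_logger.isEnabledFor(logging.DEBUG):
        verbose_logger.debug(f"LoggingWorker error: {e}")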

Performance Impact

The optimization targets the hottest code paths shown in the profiler:

  • The await queue_get() line (12.6% of total time) benefits from cached method reference
  • The queue_task_done() call (4.1% of total time) similarly optimized
  • Exception logging overhead reduced from 91.4% to near-zero when logging is disabled

Workload Suitability

This LoggingWorker appears designed for high-throughput background processing (the code comments mention a "+200 RPS improvement"), making these micro-optimizations valuable for:

  • High-frequency logging scenarios where the worker processes many tasks rapidly
  • Production environments where logging levels are typically higher (INFO/WARNING), making conditional guards effective
  • Async applications where every microsecond in the event loop matters for overall throughput

The optimizations preserve all behavioral guarantees while delivering measurable performance gains in the target use case of processing many logging tasks asynchronously.

Correctness verification report:

Test                            Status
⚙️ Existing Unit Tests          🔘 None Found
🌀 Generated Regression Tests   35 Passed
⏪ Replay Tests                 🔘 None Found
🔎 Concolic Coverage Tests      🔘 None Found
📊 Tests Coverage               100.0%
🌀 Generated Regression Tests and Runtime

import asyncio # used to run async functions
from types import SimpleNamespace
from typing import Optional

import pytest # used for our unit tests
from litellm.litellm_core_utils.logging_worker import LoggingWorker

# ---- Helper classes and functions for testing ----

class DummyContext:
    """A dummy context that mimics the interface used in LoggingWorker._worker_loop."""

    def __init__(self):
        self.ran = False
        self.last_args = None
        self.last_kwargs = None

    async def run(self, func, coroutine):
        # Mark that run was called
        self.ran = True
        # Await the coroutine (which may be a task or coroutine)
        result = await func(coroutine)
        return result


class DummyContextWithException(DummyContext):
    """Context that raises an exception when run is called."""

    async def run(self, func, coroutine):
        raise RuntimeError("DummyContextWithException error")


async def dummy_coro(val=None):
    """A simple dummy coroutine for testing."""
    return val


async def dummy_coro_exception():
    """A coroutine that raises an exception."""
    raise ValueError("Intentional error")


# ---- Basic Test Cases ----

@pytest.mark.asyncio
async def test_worker_loop_returns_none_when_queue_is_none():
    """
    Test that _worker_loop returns immediately (None) if the queue is None.
    """
    worker = LoggingWorker()
    # _queue is None by default
    result = await worker._worker_loop()
    assert result is None

#------------------------------------------------
import asyncio # used to run async functions
from typing import Optional
from unittest.mock import MagicMock

import pytest # used for our unit tests
from litellm._logging import verbose_logger
from litellm.litellm_core_utils.logging_worker import LoggingWorker

# --- Helper classes and functions for testing ---

class DummyContext:
    """A dummy context object with a .run method to simulate context.run."""

    def __init__(self):
        self.ran = False
        self.args = None
        self.kwargs = None

    async def run(self, func, coro, *args, **kwargs):
        # Simulate running a coroutine in a context
        self.ran = True
        self.args = args
        self.kwargs = kwargs
        # func should be asyncio.create_task, but we just await the coroutine
        # to keep things simple and deterministic
        return await coro


def make_logging_task(coro, context=None):
    """Helper to construct a LoggingTask dict for LoggingWorker."""
    if context is None:
        context = DummyContext()
    return {"coroutine": coro, "context": context}


# --- Basic Test Cases ---

@pytest.mark.asyncio
async def test_worker_loop_returns_none_when_queue_is_none():
    """
    Test that _worker_loop returns immediately if the queue is None.
    """
    worker = LoggingWorker()
    # _queue is None by default
    result = await worker._worker_loop()
    assert result is None


@pytest.mark.asyncio
async def test_worker_loop_throughput_high_load():
    """
    Throughput test: high load (200 tasks).
    """
    worker = LoggingWorker()
    worker._queue = asyncio.Queue()
    num_tasks = 200
    called = [False] * num_tasks
    contexts = [DummyContext() for _ in range(num_tasks)]

    async def make_coro(i):
        called[i] = True
        return i

    for i in range(num_tasks):
        await worker._queue.put(make_logging_task(make_coro(i), contexts[i]))

    worker_task = asyncio.create_task(worker._worker_loop())
    await asyncio.sleep(0.1)
    worker_task.cancel()
    try:
        await worker_task
    except asyncio.CancelledError:
        pass

codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run git checkout codeflash/optimize-LoggingWorker._worker_loop-mhtuqr4z and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 November 11, 2025 00:47
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Nov 11, 2025