Llama Deploy Race Condition #481

Draft
wants to merge 1 commit into
base: main

@eyao27 eyao27 commented Mar 7, 2025

Problem

At our company, we use LlamaIndex Workflows for chatbot services. We are encountering a subtle race condition in which certain events in the Workflow's event stream go missing.

Root Cause

After some debugging, we've narrowed the root cause down to the set_workflow_state function of the Workflow Service in Llama Deploy. This function runs when the workflow has completed. Its purpose is to update the state of the workflow on the control plane. However, a bug can occur when the workflow has finished but there are still streaming messages that the control plane hasn't processed.

Details

In the function set_workflow_state, the variable session_state is a dictionary with many keys: one for the session ID, plus others for the tasks, results, and streams.

For example, there could be keys for:

  • 01d59890-f639-448b-984b-b97e113d0d41 (session ID)
  • 504b8a6d-b846-4bbb-b684-a2b2051fcd81 (task ID)
  • stream_504b8a6d-b846-4bbb-b684-a2b2051fcd81
  • result_504b8a6d-b846-4bbb-b684-a2b2051fcd81

The current code works by:

  1. Getting this dictionary with many keys
    • session_state = await self.get_session_state(current_state.session_id)
  2. Changing the value for the key 01d59890-f639-448b-984b-b97e113d0d41 (i.e., current_state.session_id).
    • session_state[current_state.session_id] = workflow_state.model_dump_json()
  3. Setting the entire dictionary.
    • await self.update_session_state(current_state.session_id, session_state)

This introduces a possible race condition: between steps 1 and 3, the values of the other keys, notably stream_504b8a6d-b846-4bbb-b684-a2b2051fcd81, can change. In particular, stream messages received in that window are never processed, because step 3 overwrites them with the stale copy read in step 1.
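
The lost update can be reproduced in isolation. The sketch below is not Llama Deploy code: FakeStateStore is a hypothetical in-memory stand-in for the control plane, the stream value is assumed to be a list of events, and the asyncio.Event objects exist only to force the unlucky interleaving deterministically.

```python
import asyncio
import copy


class FakeStateStore:
    """Hypothetical in-memory stand-in for the control plane's session store."""

    def __init__(self):
        self._data = {}

    async def get_session_state(self, session_id):
        await asyncio.sleep(0)  # yield control, as a network call would
        return copy.deepcopy(self._data[session_id])

    async def update_session_state(self, session_id, state):
        await asyncio.sleep(0)
        self._data[session_id] = copy.deepcopy(state)


async def demo_lost_update():
    store = FakeStateStore()
    session_id, task_id = "session-1", "task-1"
    stream_key = f"stream_{task_id}"
    store._data[session_id] = {session_id: "{}", stream_key: ["event-1"]}

    snapshot_taken = asyncio.Event()
    stream_updated = asyncio.Event()

    async def set_workflow_state():
        # Step 1: read the whole dictionary.
        session_state = await store.get_session_state(session_id)
        snapshot_taken.set()
        await stream_updated.wait()  # force the stream write to land in between
        # Step 2: change only the session key in the local copy.
        session_state[session_id] = '{"done": true}'
        # Step 3: write the whole (now stale) dictionary back.
        await store.update_session_state(session_id, session_state)

    async def append_stream_event():
        # Assumed behaviour: a late streaming message is appended to the stream key.
        await snapshot_taken.wait()
        current = await store.get_session_state(session_id)
        current[stream_key].append("event-2")
        await store.update_session_state(session_id, current)
        stream_updated.set()

    await asyncio.gather(set_workflow_state(), append_stream_event())
    return store._data[session_id]


final_state = asyncio.run(demo_lost_update())
print(final_state["stream_task-1"])  # "event-2" has been overwritten
```

In this simulation the final stream contains only "event-1": the message appended between steps 1 and 3 is discarded when the stale snapshot is written back.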

Solution

The modified code updates only the value for the key 01d59890-f639-448b-984b-b97e113d0d41 (the session ID) and leaves the values for all other keys untouched.
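
A minimal sketch of that idea, again against a hypothetical in-memory store rather than the real control plane: update_session_key and append_stream_event are illustrative names, not Llama Deploy APIs, and the sketch assumes the backing store can write a single key atomically.

```python
import asyncio


class FakeStateStore:
    """Hypothetical in-memory store that supports per-key writes."""

    def __init__(self):
        self._data = {}

    async def update_session_key(self, session_id, key, value):
        await asyncio.sleep(0)  # yield control, as a network call would
        # Only the given key is touched; other keys are never rewritten.
        self._data[session_id][key] = value

    async def append_stream_event(self, session_id, stream_key, event):
        await asyncio.sleep(0)
        self._data[session_id][stream_key].append(event)


async def demo_single_key_update():
    store = FakeStateStore()
    session_id, task_id = "session-1", "task-1"
    stream_key = f"stream_{task_id}"
    store._data[session_id] = {session_id: "{}", stream_key: ["event-1"]}

    async def set_workflow_state():
        # Write just the session key instead of the whole dictionary.
        await store.update_session_key(session_id, session_id, '{"done": true}')

    async def late_stream_message():
        await store.append_stream_event(session_id, stream_key, "event-2")

    # The two writers touch different keys, so their order no longer matters.
    await asyncio.gather(set_workflow_state(), late_stream_message())
    return store._data[session_id]


fixed_state = asyncio.run(demo_single_key_update())
print(fixed_state)
```

Because neither writer holds a copy of the other's key, the late stream message survives regardless of which write lands first. In a real deployment the same guarantee depends on the store applying each per-key write atomically.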
