
Conversation

@slippersss (Contributor) commented Nov 1, 2025

What this PR does / why we need it?

This PR aims to fix a memory ordering problem in shared memory by patching the message queue with an additional lock. The key point is to use the writer lock to enforce a memory fence before the ready flag `metadata_buffer[0] = 1` is set.
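A minimal sketch of the intended pattern follows, with hypothetical names (`publish`, `data_buffer`), assuming a `multiprocessing.Lock` shared between the writer and reader processes; the actual patch applies this to the message queue writer path shown in the review below.

```python
import multiprocessing

# Hypothetical illustration of the fix's intent: the lock's
# acquire/release acts as a memory fence, so the payload written
# inside the critical section is guaranteed visible to readers
# no later than the ready flag itself.
writer_lock = multiprocessing.Lock()  # shared with reader processes

def publish(metadata_buffer, data_buffer, payload):
    with writer_lock:
        metadata_buffer[0] = 0                 # block not written yet
        data_buffer[:len(payload)] = payload   # write the payload
        # set the ready flag last; the subsequent lock release
        # orders the payload write before the flag's visibility
        metadata_buffer[0] = 1
```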

This is a temporary solution, and its impact on performance is currently being tested. Looking forward to further discussion and advice, thank you!

Does this PR introduce any user-facing change?

N/A

How was this patch tested?

By CI.

Signed-off-by: Zetong Li <slippersss@126.com>
github-actions bot commented Nov 1, 2025

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description, to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

gemini-code-assist bot left a comment

Code Review

This pull request aims to fix a memory ordering problem in shared memory by introducing a lock. My review found a critical issue in the implementation where the lock is not used correctly, failing to prevent the race condition it was intended to fix. I've provided a code suggestion to correct the lock usage. Additionally, I've pointed out a minor issue with an unused import for general code hygiene.

Comment on lines +143 to +159

```python
with self.writer_lock:
    # mark the block as not written
    metadata_buffer[0] = 0
    # let caller write to the buffer
    with self.buffer.get_data(self.current_idx) as buf:
        yield buf

    # caller has written to the buffer
    # NOTE: order is important here
    # first set the read flags to 0
    # then set the written flag to 1
    # otherwise, the readers may think they already read the block
    for i in range(1, self.buffer.n_reader + 1):
        # set read flag to 0, meaning it is not read yet
        metadata_buffer[i] = 0
# mark the block as written
metadata_buffer[0] = 1
```

critical

There is a critical race condition here. The `writer_lock` is released before the written flag (`metadata_buffer[0]`) is set to 1. A lock's release operation acts as a memory barrier, ensuring that all memory writes within the critical section are visible to other processes. By setting the written flag outside the lock, you lose this guarantee: a reader process might see the written flag as 1 before the data in the buffer is actually visible, leading to it reading stale or corrupted data. To fix this, the line `metadata_buffer[0] = 1` must be moved inside the `with self.writer_lock:` block to ensure all writes are correctly ordered and synchronized.

Suggested change

```diff
 with self.writer_lock:
     # mark the block as not written
     metadata_buffer[0] = 0
     # let caller write to the buffer
     with self.buffer.get_data(self.current_idx) as buf:
         yield buf
     # caller has written to the buffer
     # NOTE: order is important here
     # first set the read flags to 0
     # then set the written flag to 1
     # otherwise, the readers may think they already read the block
     for i in range(1, self.buffer.n_reader + 1):
         # set read flag to 0, meaning it is not read yet
         metadata_buffer[i] = 0
-# mark the block as written
-metadata_buffer[0] = 1
+    # mark the block as written
+    metadata_buffer[0] = 1
```
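Note that a release fence only helps if the readers perform a matching acquire on the same lock. The PR does not show the reader path here; below is a hedged sketch, with a hypothetical `try_read` helper, of what that pairing could look like.

```python
def try_read(metadata_buffer, data_buffer, reader_id, writer_lock):
    # Acquiring the lock pairs with the writer's release: once we
    # observe metadata_buffer[0] == 1 inside the critical section,
    # every buffer write made before the flag was set is visible.
    with writer_lock:
        if metadata_buffer[0] == 1 and metadata_buffer[reader_id] == 0:
            payload = bytes(data_buffer)     # read the published data
            metadata_buffer[reader_id] = 1   # mark block as read by us
            return payload
    return None  # nothing new to read
```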

Comment on lines +18 to +19

```python
from vllm.distributed.device_communicators.shm_broadcast import (
    MessageQueue)
```

high

`MessageQueue` is imported but never used within the `run_engine_core` function. This unused import should be removed to improve code clarity and maintainability.

@slippersss slippersss changed the title [Bugfix][SHM] Fix memory ordering problem in share memory [Bugfix][SHM] Fix memory ordering problem in shm Nov 4, 2025
@slippersss slippersss closed this Nov 4, 2025
