mag1c-h commented Oct 29, 2025

Problem

When KVCache is read concurrently, the DRAM buffer is freed and reused before the asynchronous host-to-device (H2D) copy finishes. As a result, the copy reads from a buffer whose contents are stale or already overwritten, the device receives the wrong data, and inference accuracy degrades.

Fix

Keep the buffer alive until the device stream signals completion.
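
A minimal sketch of this idea, not the code merged in this PR: a released buffer goes onto an in-flight list together with a completion signal, and it only returns to the free list once that signal is set. `StagingBufferPool`, `acquire`, and `release_after` are hypothetical names, and a `threading.Event` stands in for whatever the device stream uses to signal that the copy has finished (for example, an event recorded on the stream after the H2D copy).

```python
import threading
from collections import deque


class StagingBufferPool:
    """Hypothetical pool of host buffers used as the source of async H2D copies."""

    def __init__(self, count, size):
        self._free = deque(bytearray(size) for _ in range(count))
        self._in_flight = deque()  # (buffer, done_event) pairs still owned by a copy
        self._lock = threading.Lock()

    def acquire(self):
        """Return a buffer that no pending copy is reading, or None if all are busy."""
        with self._lock:
            self._reclaim_locked()
            return self._free.popleft() if self._free else None

    def release_after(self, buf, done):
        """Hand the buffer back, but keep it out of circulation until `done` is set."""
        with self._lock:
            self._in_flight.append((buf, done))

    def _reclaim_locked(self):
        # Move buffers whose copy has completed back to the free list;
        # everything else stays in flight and cannot be reused yet.
        pending = deque()
        while self._in_flight:
            buf, done = self._in_flight.popleft()
            if done.is_set():
                self._free.append(buf)
            else:
                pending.append((buf, done))
        self._in_flight = pending
```

In this sketch the caller issues the asynchronous copy, calls `release_after(buf, done)`, and sets `done` once the stream reports completion. Until then `acquire()` will not hand the same buffer to another reader.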

Verification

Added read-after-write checks to the E2E preload script to verify data consistency.
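
A read-after-write check of this kind could look roughly like the sketch below. It is an illustration under stated assumptions, not the preload script itself: `InMemoryStore`, its `write`/`read` methods, and `verify_read_after_write` are hypothetical stand-ins for the real store interface.

```python
import hashlib


class InMemoryStore:
    """Hypothetical stand-in for the KV cache store used by the E2E script."""

    def __init__(self):
        self._blocks = {}

    def write(self, block_id, data):
        self._blocks[block_id] = bytes(data)

    def read(self, block_id):
        return self._blocks[block_id]


def verify_read_after_write(store, blocks):
    """Write every block, read each one back, and return the ids that differ."""
    for block_id, data in blocks.items():
        store.write(block_id, data)

    mismatched = []
    for block_id, data in blocks.items():
        expected = hashlib.sha256(data).digest()
        actual = hashlib.sha256(store.read(block_id)).digest()
        if actual != expected:
            mismatched.append(block_id)
    return mismatched
```

An empty result means every block read back matches what was written. Any returned id points at a block whose contents changed between the write and the read.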

ygwpz merged commit 9708eee into ModelEngine-Group:develop Oct 29, 2025
3 checks passed
mag1c-h deleted the dev_fix_device_buffer branch October 29, 2025 12:07
flesher0813 pushed a commit that referenced this pull request Oct 30, 2025
…322)

* linear buffer for device

* check data consistency after embedding
you-seesee-you pushed a commit to you-seesee-you/unified-cache-management that referenced this pull request Nov 5, 2025
…odelEngine-Group#322)

* linear buffer for device

* check data consistency after embedding