@ZhangLirong-amd ZhangLirong-amd commented Jan 28, 2026

Motivation

DeepSeek's new Engram paper comes with a demo implementation of the Engram module:

https://github.com/deepseek-ai/Engram/blob/main/engram_demo_v1.py

Based on that demo, we created a simple model named Engram model, which consists of Engram + Attention + FFN.

Engram consists of NgramHashMapping + MultiHeadEmbedding + ShortConv,
and we want to overlap the NgramHashMapping computation on the CPU in model_runner.

1. model_runner -> sample token_id to CPU -> prefetch_engram_hash -> save to a buffer; a new GPU stream copies the buffer to the GPU

2. Engram -> load the hash buffer -> MultiHeadEmbedding -> ShortConv
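
For illustration only, the two steps above can be sketched in plain Python. This is not the code in this PR: the hash is a toy rolling hash rather than the one in `engram_demo_v1.py`, `NGRAM_SIZES` and `TABLE_SIZE` are made-up constants, `forward_and_sample` and `generate` are hypothetical names, and a single worker thread stands in for the CPU hash computation plus the copy on a separate GPU stream. The only thing it is meant to show is the ordering: hashing for the next step is launched right after sampling, and the consumer blocks only at the point where Engram needs the buffer.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical constants; the real values live in the Engram demo config.
NGRAM_SIZES = (2, 3)
TABLE_SIZE = 1_000_003


def ngram_hashes(token_ids, n, table_size=TABLE_SIZE):
    """Toy rolling hash of the n-gram ending at each position (not the Engram hash)."""
    out = []
    for i in range(len(token_ids)):
        h = 0
        for t in token_ids[max(0, i - n + 1): i + 1]:
            h = (h * 31 + t + 1) % table_size
        out.append(h)
    return out


def prefetch_engram_hash(token_ids):
    """CPU-side step 1: one row of table indices per n-gram size."""
    return [ngram_hashes(token_ids, n) for n in NGRAM_SIZES]


def forward_and_sample(token_ids, pending_hashes):
    """Stand-in for step 2 plus sampling; blocks only where Engram needs the buffer."""
    hash_buffer = pending_hashes.result()  # Engram loads the hash buffer here
    # Placeholder for MultiHeadEmbedding -> ShortConv -> Attention -> FFN -> sampler.
    return (sum(token_ids) + hash_buffer[0][-1]) % 50


def generate(prompt, steps):
    tokens = list(prompt)
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(prefetch_engram_hash, list(tokens))
        for _ in range(steps):
            tokens.append(forward_and_sample(tokens, pending))
            # Kick off hashing for the next step right after sampling; any other
            # host-side work in the runner would overlap with it from here on.
            pending = pool.submit(prefetch_engram_hash, list(tokens))
    return tokens


if __name__ == "__main__":
    print(generate([1, 2, 3], steps=4))
```

In a real runner the worker thread would presumably be replaced by writing the indices into a pinned host buffer and issuing a non-blocking copy on a side CUDA stream, with the Engram layer waiting on that stream before the embedding lookup; that part is an assumption about the design, not something this sketch exercises.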

Technical Details

Test Plan

Test Result

Submission Checklist

@ZhangLirong-amd ZhangLirong-amd changed the title [POC][Deepseek] Demo on Engram module, model_runner hash compute overlap [POC][Deepseek] Engram support, model_runner hash compute overlap Jan 28, 2026