# CUDA torch functions for LLM

For study purposes.

## Implemented attention variants

- Naive Attention (see the PyTorch reference sketch below)
- Attention with KV cache
- Attention with non-contiguous memory
- Single Query Attention with a non-contiguous KV cache (PagedAttention with block size 1)
- Multi Query Attention with a non-contiguous KV cache (for Speculative Decoding)
- Rotary Embedding
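
As a rough guide to what these kernels compute, here is a minimal pure-PyTorch sketch of naive attention and rotary embedding. The function names and shapes are illustrative assumptions, not this repo's actual API; the CUDA kernels can be checked against outputs like these.

```python
import torch

def naive_attention(q, k, v):
    # q, k, v: (batch, heads, seq_len, head_dim)
    # Plain scaled dot-product attention, no KV cache or paging.
    scale = q.shape[-1] ** -0.5
    scores = torch.matmul(q, k.transpose(-2, -1)) * scale
    probs = torch.softmax(scores, dim=-1)
    return torch.matmul(probs, v)

def apply_rotary_embedding(x, theta=10000.0):
    # x: (batch, heads, seq_len, head_dim); head_dim must be even.
    # Rotates each consecutive (even, odd) pair of channels by a
    # position-dependent angle.
    b, h, s, d = x.shape
    freqs = theta ** (-torch.arange(0, d, 2, device=x.device).float() / d)
    angles = torch.arange(s, device=x.device).float()[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()          # (seq_len, head_dim // 2)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```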