feat: add llama 3.1 style rope #401

Merged · 13 commits into main on Jul 27, 2024
Conversation

@yzh119 (Collaborator) commented Jul 27, 2024

Reference implementation: https://github.com/meta-llama/llama-models/blob/709a61fd810157f75fbb314e7287089eec06d9c3/models/llama3_1/api/model.py#L41

This PR also exposes BatchQKApplyRotaryInPlaceKernel through the PyTorch APIs; previously it was only used in the TVM wrappers.
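
For context, the Llama 3.1 rope variant differs from standard rope only in how the per-dimension frequencies are rescaled before the rotation is applied. The sketch below is an illustrative PyTorch paraphrase of the frequency-scaling step in the referenced meta-llama implementation (the constants `scale_factor=8`, `low_freq_factor=1`, `high_freq_factor=4`, and `old_context_len=8192` come from that reference); it is not the CUDA kernel added in this PR.

```python
import math
import torch

def apply_llama31_scaling(freqs: torch.Tensor) -> torch.Tensor:
    """Rescale rope frequencies the way Llama 3.1 does (illustrative sketch)."""
    # Constants taken from the referenced meta-llama implementation.
    scale_factor = 8.0
    low_freq_factor = 1.0
    high_freq_factor = 4.0
    old_context_len = 8192  # original Llama 3 training context length

    low_freq_wavelen = old_context_len / low_freq_factor
    high_freq_wavelen = old_context_len / high_freq_factor

    wavelen = 2 * math.pi / freqs
    # Interpolation weight for the mid-frequency band.
    smooth = (old_context_len / wavelen - low_freq_factor) / (
        high_freq_factor - low_freq_factor
    )
    # High-frequency components are kept as-is, low-frequency components are
    # divided by scale_factor, and the band in between is smoothly interpolated.
    return torch.where(
        wavelen < high_freq_wavelen,
        freqs,
        torch.where(
            wavelen > low_freq_wavelen,
            freqs / scale_factor,
            (1 - smooth) * freqs / scale_factor + smooth * freqs,
        ),
    )
```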

@yzh119 yzh119 merged commit 4c89dec into main Jul 27, 2024
yzh119 added a commit that referenced this pull request Jul 29, 2024
🤖 I have created a release *beep* *boop*
---

## [0.1.2](v0.1.1...v0.1.2) (2024-07-29)

### Bugfix

* Fix the sampling kernel bug for cu118 ([#386](#386), [#387](#387)) ([0cd499](0cd4994), [dc3f18](dc3f184))

### Features

* add llama 3.1 style rope ([#401](#401)) ([4c89dec](4c89dec))
* non-inplace rope operators ([#405](#405)) ([74ffba1](74ffba1))
* sliding window attention ([#406](#406)) ([28cffd3](28cffd3))
* support non-contiguous (packed) input for prefill kernels ([#404](#404)) ([68c3719](68c3719))


### Performance Improvements

* slight optimization on merge states ([#313](#313)) ([701c813](701c813))

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Zihao Ye <expye@outlook.com>
@yzh119 yzh119 deleted the llama-3.1-rope branch August 3, 2024 00:20
@chenzhuofu commented

Awesome!

@chenzhuofu commented Aug 25, 2024

Looks like llama-3.1-rope hasn't been incorporated into PosEncodingMode, so I think I may explicitly call BatchQKApplyLlama31Rotary and then use PosEncodingMode::kNone in AttentionKernel. What do you think? @yzh119
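
To make the pattern described above concrete, here is a minimal PyTorch stand-in (not the actual FlashInfer bindings): `attention_with_explicit_rope` and `apply_llama31_rope_fn` are hypothetical names, and `scaled_dot_product_attention` stands in for the FlashInfer attention kernel running with PosEncodingMode::kNone. The idea is simply to apply the Llama 3.1 rope to q/k up front and let the attention kernel do no positional encoding of its own.

```python
import torch
import torch.nn.functional as F

def attention_with_explicit_rope(q, k, v, apply_llama31_rope_fn):
    """Placeholder sketch: rope applied outside the kernel, attention with no pos-enc.

    q, k, v: [batch, num_heads, seq_len, head_dim]
    apply_llama31_rope_fn: any callable wrapping BatchQKApplyLlama31Rotary
        (applied in place to q and k); the name is hypothetical.
    """
    # 1. Apply Llama 3.1 style rotary embedding to q/k explicitly.
    apply_llama31_rope_fn(q, k)
    # 2. Run attention with no built-in positional encoding, i.e. the
    #    equivalent of PosEncodingMode::kNone in the CUDA kernel.
    return F.scaled_dot_product_attention(q, k, v)
```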

@yzh119 (Collaborator, Author) commented Sep 1, 2024

@chenzhuofu, yes, the wheel size would explode if we folded Llama 3.1 style rope into PosEncodingMode.

I'm refactoring the codebase to use JIT compilation, and this issue should be resolved soon.
