Use torch.repeat instead of expand on key & value in Triton MQA to prevent NaNs with certain h_dims #442

sashaDoubov · 2023-07-07T23:22:39Z

For h_dim=8, we see NaNs caused by calling .expand on the key and value tensors; switching to .repeat resolves them. While h_dim=8 is an edge case, we are not sure whether other h_dims are also affected, or whether there might be silent failures.

Note that this comes at a performance cost (see the MFU curves for the 7b model), so it may be desirable to revert this change in the future.

(blue curve: .expand(); green curve: .repeat(); red curve: .expand().clone())
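To make the difference between the two calls concrete, here is a minimal sketch in plain PyTorch. It is not the code changed in this PR: the shapes and variable names are hypothetical, and it only shows that .expand returns a stride-0 view over the single key head while .repeat materializes a separate copy per head, which is the likely source of the extra cost noted above.

```python
import torch

# Hypothetical shapes for illustration only; not taken from the PR.
batch, seq_len, n_heads, head_dim = 2, 4, 3, 8

# In multi-query attention, key/value carry a single shared head.
key = torch.randn(batch, seq_len, 1, head_dim)

# expand() returns a view: no data is copied, and the broadcast
# dimension gets stride 0, so every head aliases the same memory.
key_expanded = key.expand(batch, seq_len, n_heads, head_dim)

# repeat() allocates a new tensor holding one real copy per head.
key_repeated = key.repeat(1, 1, n_heads, 1)

expand_stride = key_expanded.stride()
repeat_stride = key_repeated.stride()

# Both tensors hold the same values; only the memory layout differs.
same_values = torch.equal(key_expanded, key_repeated)
shares_memory = key_expanded.data_ptr() == key.data_ptr()

print("expand stride:", expand_stride, "contiguous:", key_expanded.is_contiguous())
print("repeat stride:", repeat_stride, "contiguous:", key_repeated.is_contiguous())
```

A kernel that assumes ordinary non-zero strides may mishandle the expanded view, whereas the repeated tensor is a regular contiguous buffer, at the price of n_heads times the memory and a copy.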

sashaDoubov added 2 commits July 7, 2023 22:29

add repeat

adf406a

add test and comment

14d9967

sashaDoubov requested review from vchiley and dakinggg July 7, 2023 23:23

Merge branch 'main' into repeat_not_expand

25562d7

vchiley approved these changes Jul 7, 2023

sashaDoubov merged commit 86a99e2 into mosaicml:main Jul 8, 2023
