feat(cache): SlidingWindowCache uses index_copy_ to avoid useless copy
Applying the same change done in StaticCache.
tengomucho committed Jul 10, 2024
1 parent d329ad2 commit 53e99d1
Showing 1 changed file with 5 additions and 2 deletions.
7 changes: 5 additions & 2 deletions src/transformers/cache_utils.py
@@ -969,8 +969,11 @@ def update(
  k_out = k_out[:, :, indices]
  v_out = v_out[:, :, indices]

- k_out[:, :, cache_position] = key_states
- v_out[:, :, cache_position] = value_states
+ # Note: here we use `tensor.index_copy_(dim, index, source)`, which is equivalent to
+ # `tensor[:, :, index] = source`, but it is compile-friendly and explicitly in-place,
+ # which avoids copies and uses less memory.
+ k_out.index_copy_(2, cache_position, key_states)
+ v_out.index_copy_(2, cache_position, value_states)

  # `zero_()` followed by `+=` is equivalent to `=`, but compile-friendly (without graph breaks due to assignment)
  self.key_cache[layer_idx].zero_()
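
The equivalence claimed in the new comment can be checked with a minimal sketch. The shapes, the slot positions, and the `k_out_a`, `k_out_b` and `cache` names below are illustrative assumptions and are not taken from `cache_utils.py`; only `index_copy_`, `zero_` and the indexing form come from the diff. The sketch compares indexed assignment with `index_copy_` along dim 2, then reproduces the `zero_()` plus `+=` pattern from the context lines.

```python
import torch

# Hypothetical shapes for illustration: (batch, heads, window, head_dim).
batch, heads, window, head_dim = 1, 2, 8, 4

k_out_a = torch.zeros(batch, heads, window, head_dim)
k_out_b = torch.zeros(batch, heads, window, head_dim)

# Two new tokens written into slots 3 and 5 of the window (int64 index tensor).
cache_position = torch.tensor([3, 5])
key_states = torch.randn(batch, heads, cache_position.numel(), head_dim)

# Old form from the diff: advanced-indexing assignment.
k_out_a[:, :, cache_position] = key_states

# New form from the diff: explicit in-place copy along dim 2.
ptr_before = k_out_b.data_ptr()
k_out_b.index_copy_(2, cache_position, key_states)

same_values = torch.equal(k_out_a, k_out_b)        # both forms write the same data
same_storage = k_out_b.data_ptr() == ptr_before    # the buffer was not reallocated

# Context-line pattern: `zero_()` followed by `+=` overwrites a buffer in place.
cache = torch.ones(batch, heads, window, head_dim)
cache_ptr = cache.data_ptr()
cache.zero_()
cache += k_out_b
```

This only demonstrates that the two forms produce the same values and that the in-place calls keep the original storage. It does not measure compile behavior or memory use, which the commit comment asserts but the sketch does not verify.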