
Add inputs_k and inputs_v args to attention layer #3379

Closed
wants to merge 1 commit into main from chiamp:attention

Conversation

@chiamp chiamp (Collaborator) commented Sep 28, 2023

Currently, the MultiHeadDotProductAttention layer's call method signature is MultiHeadDotProductAttention.__call__(inputs_q, inputs_kv, mask=None, deterministic=None). As discussed in #1737, there are cases where passing separate tensors for the key and value is desired, which isn't possible with the current API. This PR adds two more arguments, inputs_k and inputs_v, to the call method, making the signature MultiHeadDotProductAttention.__call__(inputs_q, inputs_k=None, inputs_v=None, *, inputs_kv=None, mask=None, deterministic=None). Note that the inputs_kv, mask and deterministic args are now keyword-only arguments.

  • if inputs_k and inputs_v are None, they will both copy the value of inputs_q (i.e. self-attention)
  • if inputs_v is None, it will copy the value of inputs_k (the same behavior as the previous API, i.e. module.apply(inputs_q=query, inputs_k=key_value, ...) is equivalent to module.apply(inputs_q=query, inputs_kv=key_value, ...))
  • if inputs_kv is not None, both inputs_k and inputs_v will copy the value of inputs_kv
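
A minimal sketch of these three cases (hypothetical shapes and layer configuration, not taken from the PR itself):

import jax
import jax.numpy as jnp
import flax.linen as nn

# Hypothetical shapes: batch=2, query length=4, key/value length=6, features=8.
query = jnp.ones((2, 4, 8))
key = jnp.ones((2, 6, 8))
value = jnp.ones((2, 6, 8))

layer = nn.MultiHeadDotProductAttention(num_heads=2, qkv_features=8)
variables = layer.init(jax.random.PRNGKey(0), query)

# Case 1: inputs_k and inputs_v omitted -> both copy inputs_q (self-attention).
out_self = layer.apply(variables, query)

# Case 2: inputs_v omitted -> it copies inputs_k (same result as the old inputs_kv API).
out_shared = layer.apply(variables, query, key)

# Case 3: separate key and value tensors, the new capability added by this PR.
out_separate = layer.apply(variables, query, key, value)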

Users can still use inputs_kv, but a DeprecationWarning will be raised, and inputs_kv will be removed in the future.
Since self-attention can be done with this new API, the SelfAttention layer will also raise a DeprecationWarning and will be removed in the future.
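
For example, self-attention code can be ported by swapping the layer class, as in this hedged sketch (hypothetical shapes, not taken from the PR itself):

import jax
import jax.numpy as jnp
import flax.linen as nn

x = jnp.ones((2, 4, 8))  # hypothetical self-attention input
rng = jax.random.PRNGKey(0)

# Deprecated: the dedicated SelfAttention layer (now emits a DeprecationWarning).
old_out, _ = nn.SelfAttention(num_heads=2).init_with_output(rng, x)

# Replacement: MultiHeadDotProductAttention called with only inputs_q;
# inputs_k and inputs_v default to inputs_q, so this is self-attention.
new_out, _ = nn.MultiHeadDotProductAttention(num_heads=2).init_with_output(rng, x)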

Check out #3389 to see examples of how to port your code over to the new API.

@chiamp chiamp self-assigned this Sep 28, 2023
@chiamp chiamp changed the title from "split inputs_kv arg in attention layer" to "Add inputs_k and inputs_v args to attention layer" on Sep 28, 2023
@codecov-commenter codecov-commenter commented Sep 28, 2023

Codecov Report

Merging #3379 (1d41190) into main (f20aed4) will increase coverage by 0.02%.
Report is 2 commits behind head on main.
The diff coverage is 90.90%.

@@            Coverage Diff             @@
##             main    #3379      +/-   ##
==========================================
+ Coverage   83.60%   83.62%   +0.02%     
==========================================
  Files          56       56              
  Lines        6746     6767      +21     
==========================================
+ Hits         5640     5659      +19     
- Misses       1106     1108       +2     
Files                      Coverage           Δ
flax/linen/attention.py    94.19% <90.90%>    (-0.59%) ⬇️

... and 1 file with indirect coverage changes

@chiamp chiamp force-pushed the attention branch 8 times, most recently from 2db0753 to dc02493, on October 4, 2023 22:13
flax/linen/attention.py (outdated review thread, resolved)
@cgarciae cgarciae (Collaborator) commented Oct 5, 2023

Left a comment. Otherwise, looks good!

copybara-service bot pushed a commit that referenced this pull request Oct 11, 2023
--
f6a222c by Marcus Chiam <marcuschiam@google.com>:

split inputs_kv arg in attention layer

COPYBARA_INTEGRATE_REVIEW=#3379 from chiamp:attention f6a222c
PiperOrigin-RevId: 572671273
@chiamp chiamp (Collaborator, Author) commented Oct 11, 2023

Closing after this commit landed.

@chiamp chiamp closed this Oct 11, 2023
@chiamp chiamp deleted the attention branch October 27, 2023 21:11
8bitmp3 pushed a commit to 8bitmp3/flax that referenced this pull request Nov 16, 2023