added sow attention weights #3529

Merged 1 commit into google:main from the attention branch on Dec 6, 2023

Conversation

@chiamp (Collaborator) commented Dec 5, 2023

Resolves #3530 using @JyChang012's implementation of sowing attention weights; this behavior can be configured at call time. An alternative option would be to return the weights as a tuple, as PyTorch and TensorFlow do.
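
For reference, a minimal sketch of how the sowed weights can be pulled back out of a standalone attention module after this change. The call-time flag name (return_weights) follows the examples later in this thread and is an assumption, not the confirmed final API:

import jax
import jax.numpy as jnp
from flax import linen as nn

attn = nn.MultiHeadDotProductAttention(num_heads=2, qkv_features=8)
x = jnp.ones((1, 4, 8))
variables = attn.init(jax.random.key(0), x)

# 'intermediates' must be marked mutable at apply time, otherwise sow() is a no-op.
# return_weights is the flag name used in the examples later in this thread.
_, state = attn.apply(
  variables, x, return_weights=True, mutable=['intermediates']
)
weights = state['intermediates']['attention_weights'][0]  # (batch, num_heads, q_len, kv_len)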

@chiamp self-assigned this Dec 5, 2023
@chiamp (Collaborator, Author) commented Dec 5, 2023

@JyChang012

👍

@cgarciae (Collaborator) commented Dec 5, 2023

Just a small comment: can we give a more descriptive name to the mdl argument?

@chiamp (Collaborator, Author) commented Dec 5, 2023

Just a small comment: can we give a more descriptive name to the mdl argument?

Would module work instead?

@@ -76,6 +77,10 @@ def dot_product_attention_weights(
dtype: the dtype of the computation (default: infer from inputs and params)
precision: numerical precision of the computation see `jax.lax.Precision`
for details.
mdl: if not None, the attention weights are sowed into the 'intermediates' collection.
Review comment from a Collaborator:

maybe explain that this is the module used to sow the attention weights (if given).
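
For context, roughly what the new argument enables inside dot_product_attention_weights; a sketch based on the docstring line above, not the exact diff:

# inside dot_product_attention_weights, once attn_weights has been computed
if mdl is not None:
  # record the weights under the name used elsewhere in this thread
  mdl.sow('intermediates', 'attention_weights', attn_weights)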

@codecov-commenter commented Dec 6, 2023

Codecov Report

Attention: 5 lines in your changes are missing coverage. Please review.

Comparison is base (512a6d8) 56.16% compared to head (9d989b0) 56.04%.
Report is 2 commits behind head on main.

Files                     Patch %   Lines
flax/linen/attention.py   0.00%     5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3529      +/-   ##
==========================================
- Coverage   56.16%   56.04%   -0.13%     
==========================================
  Files         100      100              
  Lines       11861    11865       +4     
==========================================
- Hits         6662     6650      -12     
- Misses       5199     5215      +16     

☔ View full report in Codecov by Sentry.

@copybara-service bot merged commit 50cd169 into google:main on Dec 6, 2023
19 checks passed
@chiamp deleted the attention branch on December 6, 2023, 21:46
@Xiaoming-Zhao commented Dec 8, 2023

Hi @chiamp, may I know whether we could change to the following alternative:

An alternative option would be to return the weights as a tuple, as PyTorch and TensorFlow do.

I am asking because I just realized that intermediates recorded via sow can only be extracted at the outermost call to an nn.Module.

For example, suppose we have nested modules, all defined with nn.compact, and nn.MultiHeadDotProductAttention is only called within some inner modules rather than the outermost one. Then it is not trivial to operate on the attention weights inside those inner modules during the forward pass.

Here is a minimal example where my goal is to use attention_weights inside SubModel's __call__:

import jax.numpy as jnp
from jax import random
from jax.nn import initializers
from flax import linen as nn


class SubModel(nn.Module):
  attention_kwargs: dict

  @nn.compact
  def __call__(self, x, return_weights=False):
    x = nn.MultiHeadDotProductAttention(**self.attention_kwargs)(
      x, return_weights=return_weights
    )
    x = nn.MultiHeadDotProductAttention(**self.attention_kwargs)(x)
    x = nn.MultiHeadDotProductAttention(**self.attention_kwargs)(
      x, return_weights=return_weights
    )
    return x


class Model(nn.Module):

  @nn.compact
  def __call__(self, x, return_weights=False):
    x = SubModel(
      dict(
        num_heads=8,
        qkv_features=16,
        kernel_init=initializers.ones,
        bias_init=initializers.zeros,
        deterministic=False,
      )
    )(x, return_weights=return_weights)
    return x


rng = random.key(0)
x = jnp.ones((4, 6, 5))

module = Model()

v = module.init(rng, x)
_, intermediates = module.apply(
  v, x, mutable=['intermediates'], return_weights=True
)

print(intermediates['intermediates']['SubModel_0']['MultiHeadDotProductAttention_0']['attention_weights'][0].shape)

However, if we could directly return a tuple, any manipulation of the attention weights would be straightforward.
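
For illustration, the tuple-returning alternative would look roughly like this inside SubModel (hypothetical, not the merged behavior; return_weights here is illustrative):

# hypothetical API: the attention call returns (outputs, weights) directly,
# so the weights are usable right here in SubModel's forward pass
x, attention_weights = nn.MultiHeadDotProductAttention(**self.attention_kwargs)(
  x, return_weights=True
)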

I know that returning a tuple would be quite easy to implement, but it is unfortunately a breaking change. I am wondering whether there is a better solution to this.

Or maybe I just misunderstand sow, and the attention weights can actually be extracted within inner modules in a nested case. I am happy to learn more about it. Thanks a lot in advance.

@chiamp (Collaborator, Author) commented Dec 9, 2023

Hi @Xiaoming-Zhao, would accessing the sowed intermediates via self.variables work for you?

class SubModel(nn.Module):
  attention_kwargs: dict

  @nn.compact
  def __call__(self, x, return_weights=False):
    x = nn.MultiHeadDotProductAttention(**self.attention_kwargs)(
      x, return_weights=return_weights
    )
    x = nn.MultiHeadDotProductAttention(**self.attention_kwargs)(x)
    x = nn.MultiHeadDotProductAttention(**self.attention_kwargs)(
      x, return_weights=return_weights
    )

    # access intermediates via self.variables
    if return_weights:
      attention_weights_0 = self.variables['intermediates']['MultiHeadDotProductAttention_0']['attention_weights']
      attention_weights_2 = self.variables['intermediates']['MultiHeadDotProductAttention_2']['attention_weights']

    return x


class Model(nn.Module):

  @nn.compact
  def __call__(self, x, return_weights=False):
    x = SubModel(
      dict(
        num_heads=8,
        qkv_features=16,
        kernel_init=initializers.ones,
        bias_init=initializers.zeros,
        deterministic=False,
      )
    )(x, return_weights=return_weights)

    # access intermediates via self.variables
    if return_weights:
      submodel_attention_weights_0 = self.variables['intermediates']['SubModel_0']['MultiHeadDotProductAttention_0']['attention_weights']
      submodel_attention_weights_2 = self.variables['intermediates']['SubModel_0']['MultiHeadDotProductAttention_2']['attention_weights']

    return x
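
For completeness, a usage sketch with the same v and x as in your earlier example: 'intermediates' still has to be marked mutable at apply time, otherwise sow() is a no-op and the self.variables lookups above would find nothing:

# 'intermediates' must be mutable so the inner sow() calls actually record the weights
_, state = Model().apply(
  v, x, return_weights=True, mutable=['intermediates']
)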


Successfully merging this pull request may close these issues:

Feature request: optionally sow attention weight in dot_product_attention (#3530)
5 participants