[nnx] Add LinearGeneral and MultiHeadAttention #3487
Conversation
Codecov Report

Additional details and impacted files:

@@            Coverage Diff             @@
##             main    #3487      +/-   ##
==========================================
+ Coverage   53.33%   53.81%   +0.48%
==========================================
  Files          95       98       +3
  Lines       11252    11513     +261
==========================================
+ Hits         6001     6196     +195
- Misses       5251     5317      +66

☔ View full report in Codecov by Sentry.
warnings.warn(
  f'You are passing an array of shape {inputs_v.shape} '
  'to the `inputs_v` arg, when you may have intended '
  'to pass it to the `mask` arg. As of Flax version '
  '0.7.4, the function signature of '
  "MultiHeadAttention's `__call__` method "
  'has changed to `__call__(inputs_q, inputs_k=None, '
  'inputs_v=None, *, inputs_kv=None, mask=None, '
  'deterministic=None)`. Use the kwarg `mask` instead. '
  'See https://github.com/google/flax/discussions/3389 '
  'and read the docstring for more information.',
  DeprecationWarning,
)
Not sure the warning message is relevant here, since the NNX attention layer started out with the new call signature `__call__(self, inputs_q, inputs_k, inputs_v, *, inputs_kv, ...)`. But I do think linking the Flax discussions page could still be useful to give users context.
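For context, a minimal sketch of what the keyword-only `mask` usage looks like under that signature. Shapes follow the test added in this PR; the boolean mask shape `(1, 1, 7, 7)` is an assumption that relies on broadcasting over the heads dimension.

```python
import jax.numpy as jnp
from flax.experimental import nnx

# Same construction as the test in this PR: 2 heads, 3 input features, 6 qkv features.
module = nnx.MultiHeadAttention(2, 3, 6, rngs=nnx.Rngs(0))
x = jnp.ones((1, 7, 3))

# Passing the mask positionally would bind it to `inputs_k`/`inputs_v`;
# it has to go through the keyword-only `mask` argument instead.
mask = jnp.ones((1, 1, 7, 7), dtype=bool)  # assumed to broadcast over the 2 heads
y = module(x, mask=mask)
assert y.shape == (1, 7, 6)
```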
deterministic: if false, the attention weight is masked randomly using
  dropout, whereas if true, the attention weights are deterministic.
dropout_rng: optional rng key to pass to the attention layer's dropout
  mask. Otherwise, self.make_rng('dropout') is used instead.
`self.make_rng` is only relevant in Flax Linen, I believe? Or does NNX also use it? Is it equivalent to `rngs.dropout()`?
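For reference, a rough sketch of how the two RNG mechanisms line up, assuming `nnx.Rngs` exposes named streams that are called to draw keys (the stream names and seeds below are illustrative):

```python
from flax.experimental import nnx

# In NNX, named RNG streams live on an nnx.Rngs object passed at construction time.
rngs = nnx.Rngs(params=0, dropout=1)

# Drawing a key from the 'dropout' stream is the rough analogue of
# Linen's `self.make_rng('dropout')` inside a module.
dropout_key = rngs.dropout()
```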
import jax.numpy as jnp

from flax.experimental import nnx


class TestMultiHeadAttention:
  def test_basic(self):
    module = nnx.MultiHeadAttention(2, 3, 6, rngs=nnx.Rngs(0))
    y = module(jnp.ones((1, 7, 3)))
    assert y.shape == (1, 7, 6)
Can we port over the tests from tests/linen/linen_attention_test.py?
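As a starting point, a sketch of the kind of case that could be ported, assuming the positional `(inputs_q, inputs_k, inputs_v)` signature discussed above (the test name and shapes are illustrative, not taken from the Linen test file):

```python
import jax.numpy as jnp
from flax.experimental import nnx


class TestMultiHeadAttention:
  def test_cross_attention_shapes(self):
    # Hypothetical port: query and key/value sequences of different lengths.
    module = nnx.MultiHeadAttention(2, 3, 6, rngs=nnx.Rngs(0))
    q = jnp.ones((1, 7, 3))
    kv = jnp.ones((1, 5, 3))
    y = module(q, kv, kv)
    assert y.shape == (1, 7, 6)  # output follows the query length
```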
import jax.numpy as jnp

from flax.experimental import nnx


class TestLinearGeneral:
  def test_basic(self):
    module = nnx.LinearGeneral(2, 3, rngs=nnx.Rngs(0))
    y = module(jnp.ones((1, 2)))

    assert y.shape == (1, 3)
    assert module.kernel.shape == (2, 3)
    assert module.bias is not None
    assert module.bias.shape == (3,)

  def test_basic_multi_features(self):
    module = nnx.LinearGeneral(2, (3, 4), rngs=nnx.Rngs(0))
    y = module(jnp.ones((1, 2)))

    assert y.shape == (1, 3, 4)
    assert module.kernel.shape == (2, 3, 4)
    assert module.bias is not None
    assert module.bias.shape == (3, 4)
Can we port over the tests from tests/linen/linen_linear_test.py?
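For example, a sketch of one such case, assuming `use_bias` was carried over from Linen's `DenseGeneral` (the test name is illustrative):

```python
import jax.numpy as jnp
from flax.experimental import nnx


class TestLinearGeneral:
  def test_no_bias(self):
    # Hypothetical port of DenseGeneral's use_bias=False behaviour
    # (use_bias is assumed to exist on nnx.LinearGeneral).
    module = nnx.LinearGeneral(2, 3, use_bias=False, rngs=nnx.Rngs(0))
    y = module(jnp.ones((1, 2)))
    assert y.shape == (1, 3)
```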
# DeprecationWarning: pkg_resources is deprecated as an API.
"ignore:.*pkg_resources is deprecated as an API.*:DeprecationWarning",
# DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('google')`.
"ignore:.*Deprecated call to.*pkg_resources.declare_namespace.*:DeprecationWarning",
where are these warnings coming from?
Example usage::

  >>> import flax.linen as nn
This example seems to point to Linen rather than NNX?
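If the docstring is meant for the NNX module, the import and construction would presumably look more like the following. This is a sketch put together from the tests added in this PR; it is not clear from the excerpt which class's docstring the example belongs to.

```python
>>> from flax.experimental import nnx
>>> import jax.numpy as jnp

>>> layer = nnx.LinearGeneral(2, 3, rngs=nnx.Rngs(0))
>>> y = layer(jnp.ones((1, 2)))
>>> y.shape
(1, 3)
```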
What does this PR do?
- DenseGeneral as LinearGeneral
- MultiHeadDotProductAttention as MultiHeadAttention
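For reviewers skimming the thread, a minimal usage sketch of the two new modules, put together from the tests added in this PR:

```python
import jax.numpy as jnp
from flax.experimental import nnx

# LinearGeneral: a generalization of Linear, here with a multi-dimensional output.
linear = nnx.LinearGeneral(2, (3, 4), rngs=nnx.Rngs(0))
assert linear(jnp.ones((1, 2))).shape == (1, 3, 4)

# MultiHeadAttention: 2 heads, 3 input features, 6 qkv features.
attn = nnx.MultiHeadAttention(2, 3, 6, rngs=nnx.Rngs(0))
assert attn(jnp.ones((1, 7, 3))).shape == (1, 7, 6)
```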