Funnel-Transformer #1156
Conversation
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! so we can verify it.

What to do if you already signed the CLA:
- Individual signers
- Corporate signers

ℹ️ Googlers: Go here for more info.
@googlebot I signed it!
FunnelBlock and FunnelEncoder from https://arxiv.org/pdf/2006.03236.pdf
… basic unit tests (#2)
All (the pull request submitter and all commit authors) CLAs are signed, but one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that by leaving a comment that contains only @googlebot I consent.

Note to project maintainer: There may be cases where the author cannot leave a comment, or the comment is not properly detected as consent. In those cases, you can manually confirm consent of the commit author(s) and set the cla label.

ℹ️ Googlers: Go here for more info.

@googlebot I consent.
* replaced trivial assert
* Fix style

Co-authored-by: syzymon <s.tworkowski@student.uw.edu.pl>
We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.

ℹ️ Googlers: Go here for more info.

@googlebot I fixed it.
Thank you for the pull request, it looks good overall!
I have some important comments about the code, however.
Also, there are some linter errors; Afroz has shared the linter config we use in #1182 (it will be merged later). Can you run the linter with this config and fix the errors?
  )
),
tl.Concatenate(axis=1)
) if separate_cls else pool_layer(pool_size, strides)
I would prefer to see an if-else statement here rather than a conditional expression; this Serial is quite long anyway. Can you split it?
done
def _Upsample(short, masks, long):
  factor = -(-long.shape[1] // short.shape[1])  # ceil division
I don't think this works as upsampling; short counterexample below. I think it would be reasonable just to assert that the input length is always divisible by pool_size wherever we use pooling; the input length in training/evaluation is usually a power of two anyway.
Counterexample: input_size=31, pool_size=2, single decoding block. AvgPool with default padding ("valid", not "same") will produce a tensor of length 15. After this downsampling block we will have upsampling with factor = 3. This is out of sync with pool_size - we will get tokens mapped to wrong positions in short.repeat.
While this counterexample would be fixed by changing the padding to "same", I have a feeling that we should just assert that the input length at every stage is divisible by pool_size. What do you think?
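To make this concrete, here is a small numpy-only sketch of the counterexample (shapes and values are illustrative; the real code operates on trax tensors):

```python
import numpy as np

# Counterexample from the comment above: length 31, pool_size=2, 'valid' padding.
batch, d_model = 1, 4
long = np.zeros((batch, 31, d_model))    # activations before pooling
short = np.zeros((batch, 15, d_model))   # 31 -> 15 after AvgPool('valid', pool_size=2)

factor = -(-long.shape[1] // short.shape[1])  # ceil(31 / 15) = 3, while pool_size was 2
upsampled = short.repeat(factor, axis=1)[:, :long.shape[1], :]

print(factor)           # 3 -> out of sync with pool_size=2
print(upsampled.shape)  # (1, 31, 4), but token i of `short` now covers positions 3*i..3*i+2
```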
Asserting that the input length is divisible by pool_size would be fine for FunnelTransformer with upsampling, but we use the same pooler for the version without it. We could modify the pooler to take into account whether its output will be upsampled later, but I'm not sure it's worth it.
What could be done fairly easily is to insert into the _Upsample function a check like "if long.shape[1] % short.shape[1] != 0: raise ValueError('message')". Then you don't need to modify anything but the _Upsample function; downsampling isn't touched at all, and we can be sure that upsampling works correctly.
The current issue I have with this _Upsample function is that it computes wrong results (see my counterexample), and I think that throwing an exception is much better than silently returning wrong results.
Adding this check is also much easier than correcting the implementation - this would involve passing a pool_size/stride to _Upsample to be used in place of 'factor', and some padding (see the counterexample in my previous comment). I think that correcting the implementation may not be worth the effort, but adding an assert is worth it.
(Also, while it isn't necessary, I would consider adding this kind of assert even during downsampling. Let's consider the case when we have only downsampling, with pool_size=2, and input length not divisible by 2. Current downsampling simply throws away the last token (due to a padding "valid", which is the default), which is kind of a strange behaviour.)
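A minimal sketch of the suggested check (the error message is illustrative; masks are passed through unchanged here, in line with the separate suggestion below to keep the original masks):

```python
import jax.numpy as jnp

def _Upsample(short, masks, long):
  """Upsamples `short` back to the length of `long` and adds them; keeps `masks` as-is."""
  if long.shape[1] % short.shape[1] != 0:
    raise ValueError(f'Long length {long.shape[1]} is not a multiple of '
                     f'short length {short.shape[1]}.')
  factor = long.shape[1] // short.shape[1]  # exact after the check above, no ceil needed
  new_vecs = long + jnp.repeat(short, factor, axis=1)
  return new_vecs, masks
```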
def _Upsample(short, masks, long):
  factor = -(-long.shape[1] // short.shape[1])  # ceil division
  new_vecs = long + short.repeat(factor, axis=1)[:, :long.shape[1], :]
  new_masks = masks.repeat(factor, axis=-1)[:, :, :, :long.shape[1]]
I think masks shouldn't be upsampled like this - we could just use the original masks instead of downsampling and upsampling them, which may introduce errors.
done
                 pool_layer, pool_size, strides, separate_cls):
  """Internal funnel block. On input it takes (activations, masks).

  Args:
Can you include the arguments "pool_layer" and "separate_cls" in the description?
done
tl.Parallel(
    None,
    tl.Fn('mask_max_pool',
          _InternalMaxPool),
This pool layer used for the masks doesn't have pool_size/strides synchronized with the pool layer for attention. So, as I understand it, setting pool_size/strides different from 2 will not work.
Can you change this pool layer so that it adjusts the mask according to pool_size?
(This comment applies also to _FunnelResidualBlock.)
Indeed, the pool_size/strides in this mask pool weren't synchronized with PoolLayer. I fixed this by replacing _InternalMaxPool with a new MaskPool layer, which uses PoolLayer internally, so it can now handle pool_size and strides different from 2.
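For illustration, the synchronization fix can be expressed in plain jax.numpy like this (a sketch only; the actual MaskPool in the PR wraps PoolLayer, and the 4-D mask shape (batch, 1, 1, length) is an assumption about the attention-mask layout):

```python
import jax.numpy as jnp

def _pool_mask(mask, pool_size, strides):
  """Max-pools a boolean attention mask of shape (batch, 1, 1, length) with the
  same pool_size/strides as the activations: a pooled position stays valid if
  any position in its window was valid."""
  length = mask.shape[-1]
  n_windows = (length - pool_size) // strides + 1  # 'valid'-style windowing
  windows = jnp.stack(
      [mask[..., i * strides:i * strides + pool_size] for i in range(n_windows)],
      axis=-2)                       # (batch, 1, 1, n_windows, pool_size)
  return jnp.max(windows, axis=-1)   # (batch, 1, 1, n_windows)
```

For pool_size == strides == 2 this presumably matches what _InternalMaxPool did; for other values the mask now shrinks in step with the activations, which is the point of the MaskPool replacement.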
)

def FunnelTransformerEncoder(vocab_size,
Can you add documentation to these models, along with argument descriptions like in _FunnelBlock? This also applies to _FunnelResidualBlock and FunnelTransformer.
The docstrings have been added.
feed_forward = _FeedForwardBlock(
    d_model, d_ff, dropout, dropout_shared_axes, mode, ff_activation)

dropout_ = tl.Dropout(
Why do the variables have a "_" suffix here?
Fixed. The dropout_ layer variable name was taken from the original TransformerEncoder to avoid shadowing the dropout rate argument; I replaced it with the more meaningful hidden_dropout.
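For reference, the rename amounts to something like this (a sketch; tl is trax.layers and the argument names follow the surrounding diff):

```python
from trax import layers as tl

def _hidden_dropout_layer(dropout, dropout_shared_axes, mode):
  # `dropout` is the rate argument from the model constructor; calling the layer
  # `hidden_dropout` (instead of `dropout_`) avoids shadowing it.
  hidden_dropout = tl.Dropout(
      rate=dropout, shared_axes=dropout_shared_axes, mode=mode)
  return hidden_dropout
```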
pooling_ = PoolLayer(pool_layer, pool_size, strides)

return [
    tl.Parallel(tl.Branch(pooling_, None), None),
It's not very important, but can this be replaced by Select + pooling? I think it will be clearer.
This Parallel actually looks like a no-op so I removed it, but I would prefer not to apply pooling inside the residual with attention (Select is used there to split into Q, K, V).
]

def FunnelTransformer(vocab_size,
Can we rename it to FunnelTransformerDecoder, to keep naming consistent with models/transformer.py? It seems closer to TransformerDecoder than Transformer, since the former outputs an embedding per token (like this Funnel class) and the latter predicts a class per token.
As per our previous discussion, we changed FunnelTransformer to output a token-level categorical distribution over the vocabulary instead of embeddings, which makes it useful, for example, as a BERT-style model.
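In trax terms, that change roughly amounts to ending the model with a projection to the vocabulary followed by a log-softmax (a sketch, not the exact layer stack from the PR):

```python
from trax import layers as tl

def _VocabHead(vocab_size):
  """Maps per-token embeddings to log-probabilities over the vocabulary."""
  return tl.Serial(
      tl.Dense(vocab_size),  # (batch, length, d_model) -> (batch, length, vocab_size)
      tl.LogSoftmax(),       # token-level categorical distribution
  )
```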
FunnelTransformer

class FunnelTransformerTest(parameterized.TestCase):
Can we change all tests to use, approximately, the smallest possible models, e.g. d_model=8, d_ff=8, n_layers=2, etc.? This would speed up the tests very significantly while testing the same functionality.
I have changed these parameters in the tests; they now run much faster.
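A sketch of what such a minimal test can look like (the constructor keywords d_model/d_ff follow the diff above, while n_classes and the import path are assumptions):

```python
import numpy as np
from absl.testing import parameterized
from trax import shapes
# The import path below is an assumption about where the PR adds the model.
from trax.models.research.funnel_transformer import FunnelTransformerEncoder

class FunnelTransformerEncoderTest(parameterized.TestCase):

  def test_forward_shape_tiny_model(self):
    # Tiny, fast configuration along the lines suggested above.
    model = FunnelTransformerEncoder(vocab_size=16, d_model=8, d_ff=8, n_classes=2)
    x = np.ones((2, 8), dtype=np.int32)   # (batch, length) of token ids
    model.init(shapes.signature(x))
    y = model(x)
    self.assertEqual(y.shape[0], 2)       # one output row per batch element
```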
* Split PoolLayer
* Adjust model sizes in unit tests to make them faster
* Replace InternalMaxPool with more generic MaskPool
* Cls token test
* Fix formatting
* Fix formatting #2
* Cls pooling fix
* Remove unnecessary unpacking
* Remove unused strides parameter
* keep original masks instead of upsampling
* fix typo
* Rename variables in funnel residual block
* Add docs
* Fix dropout shadow variable
* Funnel residual block refactor
* FunnelTransformer output change, make residual FunnelBlock default + more docs
* Docstring fix
Thanks! I only wanted to follow up on the implementation of _Upsample. Everything else looks good to me.
def _Upsample(short, masks, long):
  factor = -(-long.shape[1] // short.shape[1])  # ceil division
* new upsampler
* fix comments
* missing endline
* Fix pep8
* Add unit test for upsampling
* review fixes
* Reformat

Co-authored-by: syzymon <s.tworkowski@student.uw.edu.pl>
Implementation of https://arxiv.org/abs/2006.03236