There was a typo in the diagram showing the arrangement of the layers in Universal Transformer paper #20

gregory112 · 2019-06-18T16:12:54Z

I saw that the TransformerBlock was designed with two modes, vanilla and non vanilla wiring. And as documented, the vanilla wiring is used for the plain transformer and non vanilla is used for universal transformer. The fact is, there is no difference between the position of the dropout in vanilla transformer and the universal one.

"We apply dropout [33] to the output of each sub-layer, it is added to the sub-layer input and normalized"

This stays the same with Universal Transformer. If you look at the figure in the universal transformer, there was a typo in the picture. Refer to this issue from tensor2tensor:

tensorflow/tensor2tensor#1215

This is the typo diagram:
https://images.app.goo.gl/gnjZLc4RVTndh7Fd7

This is the correct diagram, from the presentation of Mostafa Dehghani himself, as refered in that github issue:
http://mostafadehghani.com/wp-content/uploads/2018/08/Universal_Transformers.pdf/#page=11

So I guess the correct implementation is to use vanilla_wiring=True all the time? Just for curiosity (and to help my research too), as written in the documentation of TransformerBlock, why do you think it is more reasonable to use the wiring that is currently in the old diagram?

The text was updated successfully, but these errors were encountered:

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

There was a typo in the diagram showing the arrangement of the layers in Universal Transformer paper #20

There was a typo in the diagram showing the arrangement of the layers in Universal Transformer paper #20

gregory112 commented Jun 18, 2019

There was a typo in the diagram showing the arrangement of the layers in Universal Transformer paper #20

There was a typo in the diagram showing the arrangement of the layers in Universal Transformer paper #20

Comments

gregory112 commented Jun 18, 2019