Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

There was a typo in the diagram showing the arrangement of the layers in Universal Transformer paper #20

Open
gregory112 opened this issue Jun 18, 2019 · 0 comments

Comments

@gregory112
Copy link

I saw that the TransformerBlock was designed with two modes, vanilla and non vanilla wiring. And as documented, the vanilla wiring is used for the plain transformer and non vanilla is used for universal transformer. The fact is, there is no difference between the position of the dropout in vanilla transformer and the universal one.

"We apply dropout [33] to the output of each sub-layer, it is added to the sub-layer input and normalized"

This stays the same with Universal Transformer. If you look at the figure in the universal transformer, there was a typo in the picture. Refer to this issue from tensor2tensor:

tensorflow/tensor2tensor#1215

This is the typo diagram:
https://images.app.goo.gl/gnjZLc4RVTndh7Fd7

This is the correct diagram, from the presentation of Mostafa Dehghani himself, as refered in that github issue:
http://mostafadehghani.com/wp-content/uploads/2018/08/Universal_Transformers.pdf/#page=11

So I guess the correct implementation is to use vanilla_wiring=True all the time? Just for curiosity (and to help my research too), as written in the documentation of TransformerBlock, why do you think it is more reasonable to use the wiring that is currently in the old diagram?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant