I saw that `TransformerBlock` was designed with two modes: vanilla and non-vanilla wiring. As documented, the vanilla wiring is used for the plain Transformer and the non-vanilla wiring for the Universal Transformer. However, the position of the dropout does not differ between the vanilla Transformer and the Universal Transformer. The original Transformer paper states:

"We apply dropout [33] to the output of each sub-layer, before it is added to the sub-layer input and normalized."
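A minimal NumPy sketch of one residual sub-layer under both wirings may make the difference concrete. The names `residual_sublayer`, `layer_norm` and `dropout` are illustrative only and are not the library's `TransformerBlock` API; the `vanilla_wiring` flag merely mirrors the library's parameter name. The non-vanilla branch assumes that this wiring applies dropout to the residual sum (my reading of the old diagram, not confirmed here), and the tanh layer is a hypothetical stand-in for attention or the feed-forward network.

```python
import numpy as np


def layer_norm(x, eps=1e-6):
    # Normalize over the feature (last) axis; learned gain/bias omitted.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def dropout(x, rate, rng, training):
    # Inverted dropout: zero each unit with probability `rate` and rescale
    # the survivors; identity at inference time or when rate == 0.
    if not training or rate == 0.0:
        return x
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)


def residual_sublayer(x, sublayer, rate, rng, training, vanilla_wiring=True):
    y = sublayer(x)
    if vanilla_wiring:
        # As in the quoted Transformer paper: dropout on the sub-layer
        # output, then add the sub-layer input, then normalize.
        return layer_norm(x + dropout(y, rate, rng, training))
    # ASSUMED non-vanilla variant: add first, then dropout on the sum
    # (this also drops units of the identity path x), then normalize.
    return layer_norm(dropout(x + y, rate, rng, training))


rng = np.random.default_rng(0)
w = rng.standard_normal((8, 8)) * 0.1


def sublayer(h):
    # Hypothetical stand-in for self-attention or the feed-forward network.
    return np.tanh(h @ w)


x = rng.standard_normal((4, 8))

# With dropout disabled (inference), both wirings give the same output.
a = residual_sublayer(x, sublayer, 0.1, rng, training=False, vanilla_wiring=True)
b = residual_sublayer(x, sublayer, 0.1, rng, training=False, vanilla_wiring=False)
print(np.allclose(a, b))  # True
```

Under these assumptions the two wirings coincide at inference, so the choice only matters during training, where the non-vanilla variant also randomly zeroes units of the residual (identity) path rather than only the sub-layer output.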
The same holds for the Universal Transformer. The figure in the Universal Transformer paper contains a typo; see this issue from tensor2tensor:
tensorflow/tensor2tensor#1215
This is the diagram containing the typo:
https://images.app.goo.gl/gnjZLc4RVTndh7Fd7
This is the correct diagram, from Mostafa Dehghani's own presentation, as referenced in that GitHub issue:
http://mostafadehghani.com/wp-content/uploads/2018/08/Universal_Transformers.pdf/#page=11
So I guess the correct implementation is to use `vanilla_wiring=True` all the time? Just out of curiosity (and to help my research too): as written in the documentation of `TransformerBlock`, why do you think it is more reasonable to use the wiring shown in the old diagram?