This repository has been archived by the owner on Jan 15, 2024. It is now read-only.

[Bug][Fix][WIP] Fix pre-layernormalization in Transformer #1488

Open
wants to merge 8 commits into master

Conversation

sxjscience
Member

Description

Fix the addition of the residual connection. The previous implementation was incorrect. I'm rerunning the Transformer-Big pre-LN experiment.
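For context (the actual diff isn't quoted in this thread): in a pre-LN Transformer layer, normalization is applied to the sub-layer input while the residual is added from the un-normalized input. Below is a minimal NumPy sketch of the correct wiring next to a common mistake; `layer_norm`, `pre_ln_step`, `buggy_pre_ln_step`, and the toy `ffn` are illustrative names, not GluonNLP's API.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize over the feature (last) axis.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def pre_ln_step(x, sublayer):
    # Pre-LN: the sub-layer sees the normalized input, but the residual
    # is taken from the *un-normalized* input, preserving the identity path.
    return x + sublayer(layer_norm(x))

def buggy_pre_ln_step(x, sublayer):
    # A common mistake: adding the residual from the normalized input,
    # which re-scales the skip connection at every layer.
    h = layer_norm(x)
    return h + sublayer(h)

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4, 8))       # (batch, seq_len, hidden)
ffn = lambda h: 0.1 * np.tanh(h)     # stand-in for an attention/FFN sub-layer
print(np.abs(pre_ln_step(x, ffn) - buggy_pre_ln_step(x, ffn)).max())
```

The two variants diverge whenever the layer input is not already normalized, which is why a wrong residual addition shows up only in the pre-LN configuration.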

@yongyi-wu

Checklist

Essentials

  • PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc.)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this is a backward-incompatible change, explain why it must be made.
  • Interesting edge cases to note here

cc @dmlc/gluon-nlp-team

@sxjscience sxjscience requested a review from a team as a code owner January 18, 2021 01:06
@sxjscience sxjscience changed the title from [Bug][Fix] Fix pre-layernormalization in Transformer to [Bug][Fix][WIP] Fix pre-layernormalization in Transformer on Jan 18, 2021
@codecov

codecov bot commented Jan 18, 2021

Codecov Report

Merging #1488 (3c7c4c1) into master (c582b64) will increase coverage by 3.21%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1488      +/-   ##
==========================================
+ Coverage   81.98%   85.19%   +3.21%     
==========================================
  Files          52       52              
  Lines        6909     6822      -87     
==========================================
+ Hits         5664     5812     +148     
+ Misses       1245     1010     -235     
| Impacted Files | Coverage Δ |
|---|---|
| src/gluonnlp/data/batchify.py | 88.72% <ø> (ø) |
| src/gluonnlp/layers.py | 87.15% <100.00%> (+0.03%) ⬆️ |
| src/gluonnlp/models/transformer.py | 98.93% <100.00%> (-0.01%) ⬇️ |
| conftest.py | 76.31% <0.00%> (-8.69%) ⬇️ |
| src/gluonnlp/data/loading.py | 75.75% <0.00%> (-7.64%) ⬇️ |
| src/gluonnlp/utils/lazy_imports.py | 58.42% <0.00%> (-2.25%) ⬇️ |
| src/gluonnlp/utils/misc.py | 52.51% <0.00%> (-1.06%) ⬇️ |
| src/gluonnlp/data/tokenizers/yttm.py | 81.73% <0.00%> (-1.02%) ⬇️ |
| src/gluonnlp/data/tokenizers/spacy.py | 65.33% <0.00%> (-0.91%) ⬇️ |
| src/gluonnlp/data/tokenizers/huggingface.py | 71.06% <0.00%> (-0.78%) ⬇️ |

... and 22 more


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@yongyi-wu
Member

Looks good. It seems all issues related to pre-norm and skip connections have been fixed.

@sxjscience
Member Author

I noticed that performance became worse after I changed the implementation. Still investigating the issue.
