🚨 Resizing token embeddings: initialize from the old embeddings' normal distribution #33325
Merged
Commits (32, all by abuelnasr0; a sketch of the resulting behaviour follows the list):

- `25c92e1` intilize new embeddings from normal distrib
- `a95639c` Fix typo in comments
- `d850b99` Fix typo in comments
- `3f44507` Fix style
- `5ea5f82` Fix variables naming
- `d1d81d5` Add tests
- `f3aaf0a` Fix style
- `bdef61a` code consistency nit
- `15a7b5a` Add deepspeed support
- `6e40b4f` Add deepspeed support
- `aba7d8c` Conver embeddings weights to float32 before computations
- `4f1b0fa` Add deepspeed tests
- `dea8e28` Cover when vocab_size is smaller than embedding_size
- `84f8cfa` Style fix
- `2923e85` Add tests for vocab_size smaller than hiddin_size
- `188ba1b` Style fix
- `22ac85c` Nits in tests
- `3e42f66` Nits in tests
- `226f31c` Check for deepspeed before importing it
- `cef744f` Increase vocab_size for positive definite covariance matrix test
- `6583cd5` Add warning
- `7577cd4` Add multivariate_resizing flag and implement resizing for lm_heads
- `0472bac` Fix typo
- `fd4ad00` Fix wrong bias indexing
- `6ff2bca` Fix bias is zero check
- `12e61c6` remove multivariate_resizing flag from tests
- `eb80c33` Intialize bias from old bias normal distribution
- `ef6bdbc` Fixup
- `5cdce5f` Code usability
- `f4a9cf4` Use mean_resizing instead of multivariate_resizing
- `fc436d7` Fix up
- `8e60a36` Fix comments and docs
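Taken together, the commits implement "mean resizing": when the token embedding matrix is grown, the newly added rows (and the matching lm_head rows and bias entries) are sampled from a normal distribution fitted to the old weights, rather than taken from the model's default random init. Below is a minimal PyTorch sketch of that sampling step; `sample_rows_like` and the covariance scaling factor are illustrative, not the merged code.

```python
import torch


def sample_rows_like(old_weight: torch.Tensor, num_new_rows: int) -> torch.Tensor:
    """Draw new embedding rows from a normal distribution fitted to the old rows."""
    # Compute statistics in float32 for numerical stability
    # (cf. the "Conver embeddings weights to float32" commit).
    old = old_weight.detach().to(torch.float32)
    mean = old.mean(dim=0)
    centered = old - mean
    covariance = centered.T @ centered / old.shape[0]
    # Scale the covariance down so sampled rows stay close to the old mean;
    # the exact factor here is an illustrative choice.
    dist = torch.distributions.MultivariateNormal(mean, covariance_matrix=1e-9 * covariance)
    return dist.sample((num_new_rows,)).to(old_weight.dtype)


# Example: grow a 1000-token embedding table by 8 new tokens.
embedding = torch.nn.Embedding(1000, 64)
new_rows = sample_rows_like(embedding.weight, 8)
resized_weight = torch.cat([embedding.weight.detach(), new_rows], dim=0)  # shape (1008, 64)
```

The merged change also has to guard against a covariance that is not positive definite, for example when the old vocabulary is smaller than the hidden size (see the `dea8e28` and `cef744f` commits).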
Review comment: All of this can be re-used, no? As a `self.init_tensor` which checks if deepspeed is available, computes the covariance if not given, and uses None otherwise.
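As an aside, here is a rough sketch of the kind of reusable helper this comment suggests. The name `init_tensor`, its signature, and the ZeRO-3 handling shown are assumptions for illustration, not the code that was merged; the deepspeed gather only matters when the weight is partitioned under ZeRO-3 (covariance scaling is omitted here for brevity).

```python
import importlib.util
from contextlib import nullcontext
from typing import Optional

import torch


def init_tensor(weight: torch.nn.Parameter, num_new_rows: int,
                covariance: Optional[torch.Tensor] = None) -> torch.Tensor:
    """Sample `num_new_rows` rows from a normal distribution fitted to `weight`."""
    ctx = nullcontext()
    # Only import deepspeed when it is installed (cf. the "Check for deepspeed
    # before importing it" commit); under ZeRO-3 the weight is partitioned
    # across ranks and has to be gathered before statistics can be computed.
    if importlib.util.find_spec("deepspeed") is not None:
        from transformers.integrations import is_deepspeed_zero3_enabled

        if is_deepspeed_zero3_enabled():
            import deepspeed

            ctx = deepspeed.zero.GatheredParameters([weight], modifier_rank=None)

    with ctx:
        data = weight.detach().to(torch.float32)
        mean = data.mean(dim=0)
        if covariance is None:
            centered = data - mean
            covariance = centered.T @ centered / data.shape[0]
        dist = torch.distributions.MultivariateNormal(mean, covariance_matrix=covariance)
        return dist.sample((num_new_rows,)).to(weight.dtype)
```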
Reply: I have introduced three functions:

- `self._init_added_embeddings_weights_with_mean()`
- `self._init_added_lm_head_weights_with_mean()` (which itself uses `self._init_added_embeddings_weights_with_mean()`)
- `self._init_added_lm_head_bias_with_mean()`

This will improve code usability for our case. What do you think? I am open to any other change.

Also, I think that `mean_resizing` is more user-friendly and explains the whole point of the new resizing technique.
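For completeness, here is how a caller would use the flag discussed above, assuming it is exposed as a `mean_resizing` keyword on `resize_token_embeddings` (the model name and the shown default are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Add new tokens, then resize the embedding matrix (and tied lm_head) to match.
tokenizer.add_tokens(["<special_1>", "<special_2>"])

# New rows are sampled from the old embeddings' normal distribution.
model.resize_token_embeddings(len(tokenizer), mean_resizing=True)

# Opt out to keep the previous behaviour (default random init for new rows):
# model.resize_token_embeddings(len(tokenizer), mean_resizing=False)
```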