-
Notifications
You must be signed in to change notification settings - Fork 530
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added torch_dmoe defaults, bug fixes for 2D inputs #1210
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
snarayan21
requested review from
mvpatel2000,
dakinggg,
b-chu and
irenedea
and removed request for
dakinggg
May 15, 2024 21:09
mvpatel2000
reviewed
May 15, 2024
LGTM but want @bigning 's signoff |
bigning
approved these changes
May 15, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Users are hitting issues where changing their
ffn_type
frommb_dmoe
totorch_dmoe
results in errors with their config. This is becausetorch_dmoe
does not have any defaults set. This PR ensures thattorch_dmoe
has the same defaults asmb_dmoe
, according to the Arguments dataclass in the Megablocks repo. Fixed typing in some places as well. Added a test to make sure thatmb_dmoe
andtorch_dmoe
are the same in fwd and bwd for these defauly values.There was also a bug for some inputs (top k = 1) where the outputs of the
max
function were 1 dimensional instead of 2. This bug has also been addressed.There's a separate bug with Megablocks where, if both the
hidden_size
andffn_hidden_size
are both 128, there's an unintended bug -- this is whyhidden_size
has been bumped to 256. With values ofhidden_size
andffn_hidden_size
not divisible by 128, there's an unhelpful error thrown. Will open a separate PR on megablocks side to address this.