-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move replica axis to front everywhere #1877
Conversation
0090645
to
453a98f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks fairly harmless to me... if nothing crashed and you tested it with 1 and more than 1 replicas...
Could you run the qed example as well? (since it is the other one you did modify). But again, if it didn't crash it is probably ok. |
Greetings from your nice fit 🤖 !
Check the report carefully, and please buy me a ☕ , or better, a GPU 😉! |
I tested with 1 and 3 replicas, printing at every epoch for the first 200 and all digits match. It seems that in the fitbot there are small differences though, perhaps again differences were always there but not visible at only 200 epochs. |
The fitbot needs to be updated with the latest master though |
Ah ok, that explains it. do you want to wait for that? |
I did a manual comparison to the fit produced for #1883, and they're not exactly the same: https://vp.nnpdf.science/1vkMFHCmQd63Pnvwg2V23A==/ The dependencies did change a bit, but I didn't expect that would be in a way that would impact the result - I think it's worth rebasing this on top of #1883 and rerunning the fitbot, just to be sure. I had a look at the changes and they indeed seem harmless, but I would be more comfortable if that could be confirmed in a test. |
453a98f
to
d5e14f9
Compare
Thanks for the check, will do! |
Greetings from your nice fit 🤖 !
Check the report carefully, and please buy me a ☕ , or better, a GPU 😉! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfect!
Great, are you ok to merge as well @scarlehoff? |
This just puts the replica axis first wherever it wasn't yet.
It gives identical results, to the last digit, on:
In the MSR normalization layer, I do transpose them back to their old shape just because having the flavor axis first there makes for cleaner code. I think the performance hit there is negligible, but this can always be optimized later on.