Adding CPU training support to AxoNN #39
base: develop
Conversation
Avuxon commented Oct 10, 2023 • edited by siddharth9820
Before merging, this should have:
- Switch to gloo for communication
- Make the changes conditional on CPUs/GPUs (see the sketch below)
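A minimal sketch of what the device-conditional backend selection could look like; the function name and the initialization details below are illustrative assumptions, not AxoNN's actual code:

```python
import torch
import torch.distributed as dist


def init_backend_for_device() -> str:
    """Pick a communication backend based on the available hardware.

    nccl is used when CUDA devices are present; otherwise fall back to gloo
    so that CPU-only training still works. Assumes MASTER_ADDR, MASTER_PORT,
    RANK, and WORLD_SIZE are already set in the environment.
    """
    backend = "nccl" if torch.cuda.is_available() else "gloo"
    if not dist.is_initialized():
        dist.init_process_group(backend=backend)
    return backend
```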
Force-pushed from 0d2fa52 to 43f159a
Commits:
- Add default values for environment vars
- Fixed communication handle flags
- Fixed formatting
- Formatting
- fixed formatting again
- tensor-list change
- Formatting and Device-Setting
- Fixed gpus_per_node access (Co-authored-by: Mahua Singh <mahua04@pssg-mordor.umiacs.umd.edu>)
- initialize grad_input to None; minor
- Added missing parameter to test
- Fixed formatting
- docs: fix build issues and add sub-sections (#69)
While attempting to fix the CI, I discovered that the gloo backend doesn't support reduce-scatter. Therefore, as of now, AxoNN won't work on CPUs with G_intra_d > 1. We should add an assert that checks this condition in axonn.py.
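A hedged sketch of the assert being proposed; the function name, its argument, and where it would be called during initialization are assumptions drawn from this discussion, not the actual axonn.py code:

```python
import torch.distributed as dist


def assert_backend_supports_depth_parallelism(G_intra_d: int) -> None:
    """Fail fast when depth tensor parallelism is requested on gloo.

    gloo does not implement reduce-scatter, which G_intra_d > 1 relies on,
    so an early assert gives a clearer error than a failed collective later.
    Assumes the process group has already been initialized.
    """
    assert not (dist.get_backend() == "gloo" and G_intra_d > 1), (
        "The gloo (CPU) backend does not support reduce-scatter; "
        "AxoNN cannot run with G_intra_d > 1 on CPUs. "
        "Set G_intra_d = 1 or use the nccl backend on GPUs."
    )
```

Such a check could run right after the process group is created, so a misconfigured CPU run fails with a descriptive message instead of an opaque communication error.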