Skip to content

Make round robin gradient partitioning configurable (default False) #1256

Merged: tjruwase merged 1 commit into master from olruwase/round_robin_gradient_option on Jul 28, 2021

@tjruwase (Contributor)

No description provided.

@stas00 (Collaborator) commented Jul 28, 2021

Now that the default is off, if this feature is not documented, how will users know that they could enable it to boost performance?

@tjruwase (Contributor, Author)

@stas00, good point. Docs coming soon.

@stas00 (Collaborator) commented Jul 28, 2021

While we are at it, in which situations should it be off? That is, should I turn it on by default in the HF Transformers config file?

@tjruwase (Contributor, Author) commented Jul 28, 2021

@stas00, that is a great question. Unfortunately, the answer is currently unclear. We added this workaround because of a recent bug report showing that the optimization can break in some cases, and we have not yet had time to investigate the cause. On the other hand, the scenarios that benefit from this optimization require both (1) CPU offloading and (2) a high number of gradient accumulation steps (> 8). Let me know if you need more information to help you decide on the default setting.
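A minimal sketch of the rule of thumb above, expressed as a config builder. It assumes the DeepSpeed option is spelled `round_robin_gradients` inside the `zero_optimization` section and that CPU offloading is configured through `offload_optimizer` with `"device": "cpu"`. Verify both keys against the DeepSpeed version you use. The helper `build_zero_config` is hypothetical, and the result is only a fragment: batch-size and optimizer settings are omitted.

```python
import json


def build_zero_config(offload_to_cpu: bool, grad_accum_steps: int) -> dict:
    """Hypothetical helper: build a DeepSpeed-style config fragment that turns on
    round-robin gradient partitioning only when both conditions from this thread
    hold (CPU offloading and more than 8 gradient accumulation steps)."""
    zero = {"stage": 2}
    if offload_to_cpu:
        # Assumed key layout for offloading optimizer state to CPU memory.
        zero["offload_optimizer"] = {"device": "cpu"}
    # Assumed key name; after this PR the option defaults to False, so we only
    # opt in for the scenario reported to benefit from it.
    zero["round_robin_gradients"] = offload_to_cpu and grad_accum_steps > 8
    return {
        "gradient_accumulation_steps": grad_accum_steps,
        "zero_optimization": zero,
    }


if __name__ == "__main__":
    # Example: CPU offloading with 16 accumulation steps enables the option.
    print(json.dumps(build_zero_config(True, 16), indent=2))
```

Keeping the option off unless both conditions hold matches the caution in this thread: the optimization was reported to break in some cases, so it is only worth the risk where it is expected to pay off.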

@stas00 (Collaborator) commented Jul 28, 2021

Well, if it sometimes breaks, then enabling it by default is probably not a good idea.

But I can document that users may enable it to speed things up in the situations you described, with a caution about possible breakage.

Your comments are all I needed. Thank you, Tunji.

@tjruwase (Contributor, Author)

@stas00, #1261
