Make round robin gradient partitioning configurable (default False)#1256
Now that the default is off, if this feature is not documented, how will users know that they could boost performance?
@stas00, good point. Docs coming soon.
While we are at it, in which situations should it be off? That is, should I turn it on by default in the HF transformers config file?
@stas00, that is a great question. Unfortunately, the answer is currently unclear. We added this workaround because of a recent bug report showing that the optimization could break in some cases, and we have not yet had time to investigate the issue. On the other hand, the scenarios that benefit from this optimization require both (1) CPU offloading and (2) a high number of gradient accumulation steps (> 8). Let me know if you need more information to help you decide on the default setting.
Well, if it sometimes breaks, then enabling it by default is probably not a good idea. But I can document enabling it to speed things up in the situations you have suggested, with a caution about possible breakage. Your comments are all that I needed. Thank you, Tunji
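A minimal sketch of opting in only under the conditions described above (CPU offloading plus more than 8 gradient accumulation steps) might look like the following. It assumes the option is exposed as `round_robin_gradients` inside the `zero_optimization` section of the DeepSpeed JSON config, next to `offload_optimizer` and the top-level `gradient_accumulation_steps`; verify these keys against the DeepSpeed configuration docs for your version. The helper `build_zero_config` is hypothetical and only builds a config dict.

```python
import json


def build_zero_config(offload_to_cpu: bool, grad_accum_steps: int) -> dict:
    """Hypothetical helper: build a DeepSpeed-style config dict that enables
    round robin gradient partitioning only when CPU offloading is on AND
    gradient accumulation steps exceed 8 (the case discussed in this PR)."""
    # Assumption: ZeRO stage 2 with the config key names used below.
    zero = {"stage": 2}
    if offload_to_cpu:
        # Assumed key for offloading optimizer state to CPU memory.
        zero["offload_optimizer"] = {"device": "cpu"}
    # Opt in only when both conditions hold; otherwise keep the default (False).
    zero["round_robin_gradients"] = offload_to_cpu and grad_accum_steps > 8
    return {
        "gradient_accumulation_steps": grad_accum_steps,
        "zero_optimization": zero,
    }


if __name__ == "__main__":
    # Print a config suitable for saving as a ds_config.json file.
    print(json.dumps(build_zero_config(offload_to_cpu=True, grad_accum_steps=16), indent=2))
```

Leaving the flag off unless both conditions hold matches the cautious default discussed in this thread, since the reported breakage has not been investigated yet.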