Split axis according to grouped axises #8919
Conversation
…into fix-reshape_sbp-bug
[9, 2, 2] -> [9, 4] supports S0 (besides S1)
This case needs to be added to the unit tests.
Just add the unit test and it's good; I have no other concerns.
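A hedged sketch of what that unit test might look like, assuming OneFlow's eager global-tensor API (flow.placement, flow.sbp.split, Tensor.to_global) and two ranks started via oneflow.distributed.launch; the placement type and assertions are illustrative, not the PR's actual test:

```python
# Illustrative test sketch (not the PR's actual test) for [9, 2, 2] -> [9, 4].
# Run with two ranks, e.g.:
#   python3 -m oneflow.distributed.launch --nproc_per_node 2 test_reshape_sbp.py
import oneflow as flow

placement = flow.placement("cpu", ranks=[0, 1])
x = flow.randn(9, 2, 2).to_global(placement=placement, sbp=flow.sbp.split(0))

y = x.reshape(9, 4)

# With the fix, S(0) is a valid candidate for this reshape, so the output
# should stay split along axis 0 rather than fall back to broadcast.
assert y.sbp[0] == flow.sbp.split(0)
assert tuple(y.shape) == (9, 4)
```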
CI failed when running job: cpu-module. PR label automerge has been removed.
…into fix-reshape_sbp-bug
View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/8919/
CI failed when running job: cuda-module. PR label automerge has been removed.
The current algorithm has a bug in the following situation:
In shape: (49, 49, 24)
Out shape: (2401, 24)
The old algorithm would allow {in: S0, out: S0} on 2 GPUs, since 24 is divisible by 2. However, on GPU 0:
In physical shape: (25, 49, 24)
Out physical shape: (1201, 24)
These hold 25 * 49 * 24 = 29400 and 1201 * 24 = 28824 elements respectively, so the element counts no longer match and the check fails:
Check failed: (out_shape->elem_cnt()) == (in_shape.elem_cnt()) (28824 vs 29400)
A similar mismatch occurs on GPU 1, where the input holds 24 * 49 * 24 = 28224 elements but the output holds 1200 * 24 = 28800.
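To make the failure concrete, here is a small, self-contained Python sketch of the arithmetic; balanced_split below is an illustrative stand-in for OneFlow's balanced splitting of a dimension across devices (each device gets either ceil(n/p) or floor(n/p) rows), not the actual implementation:

```python
# Reproduce the element-count mismatch from the error message above.
def balanced_split(n, parallel_num):
    """Per-device sizes of a balanced split of n items (illustrative)."""
    base, rem = divmod(n, parallel_num)
    return [base + (1 if rank < rem else 0) for rank in range(parallel_num)]

in_shape, out_shape, gpus = (49, 49, 24), (2401, 24), 2
for rank in range(gpus):
    in_rows = balanced_split(in_shape[0], gpus)[rank]    # S0 on the input
    out_rows = balanced_split(out_shape[0], gpus)[rank]  # S0 on the output
    print(rank, in_rows * in_shape[1] * in_shape[2], out_rows * out_shape[1])
# 0 29400 28824   <- the exact numbers in the failed check
# 1 28224 28800
```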
The new algorithm, which derives the output split axis from grouped axes, fixes this bug.
With these additional reasonable SBP candidates, the throughput of auto parallel increases by roughly 50%.
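For intuition, here is a minimal sketch (my Python paraphrase, not the PR's actual C++ implementation) of the grouped-axes idea: consecutive input and output axes are paired into groups with equal element products, and {in: S(i), out: S(j)} is only proposed when i and j lead a matched group and have the same extent, so a balanced split of that axis is identical on both sides. This simplified rule omits splits that divide only part of a group (e.g. S1 on [9, 4]), which the real algorithm also has to reason about:

```python
def group_axes(in_shape, out_shape):
    """Pair maximal runs of input/output axes whose element products agree.
    Assumes both shapes have the same total element count and no zero dims."""
    groups, i, j = [], 0, 0
    while i < len(in_shape) and j < len(out_shape):
        gi, gj = [i], [j]
        pi, pj = in_shape[i], out_shape[j]
        i, j = i + 1, j + 1
        while pi != pj:  # grow the smaller side until the products match
            if pi < pj:
                pi *= in_shape[i]; gi.append(i); i += 1
            else:
                pj *= out_shape[j]; gj.append(j); j += 1
        groups.append((gi, gj))
    return groups

def split_candidates(in_shape, out_shape):
    """(in_axis, out_axis) pairs where S(in_axis) <-> S(out_axis) is safe:
    both axes lead a matched group and have equal size, so any balanced
    split of that axis produces the same per-device row counts."""
    return [(gi[0], gj[0]) for gi, gj in group_axes(in_shape, out_shape)
            if in_shape[gi[0]] == out_shape[gj[0]]]

print(split_candidates((49, 49, 24), (2401, 24)))  # [(2, 1)]: no S0 <-> S0
print(split_candidates((9, 2, 2), (9, 4)))         # [(0, 0)]: S0 is valid
```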