
symbolic shape #9902

Open · wants to merge 80 commits into master
Conversation

daquexian (Contributor)
@daquexian commented Feb 24, 2023

A symbolic shape uses a symbol to represent an unknown length, giving us the ability to express dynamic shapes; it is an essential building block for dynamic-shape support. This Zhihu article covers the background and the related implementations in TVM/BladeDisc: https://zhuanlan.zhihu.com/p/608027985

This PR implements:

  • A class Dim that represents a length. It can hold a known length (Dim a = 4) or an unknown one (Dim::Unknown()), and the Shape class has been changed from vector<int64_t> to vector<Dim>. A Dim object has a single int64_t member, so the data inside a Shape stays compact and remains compatible with the places in oneflow that need an int64_t* header_ptr as the shape pointer. We deliberately do not use a special int64_t value such as -1 to represent a dynamic length, because that cannot propagate dynamic shapes automatically (see the next item) and is not type-safe.
  • Arithmetic operators (addition, subtraction, multiplication, division, etc.) between Dim objects and the primitive numeric types, so that a symbolic dimension propagates automatically through shape inference without touching each OP's shape-inference logic; see the unit test test_symbolic_shape.py and the sketch after this list.
  • Conversion to and from the dynamic size used in MLIR.
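
A minimal sketch of the idea behind the first two items, for illustration only: the sentinel value, member names, and operator set below are assumptions, not the actual oneflow source.

    // Hypothetical Dim: stores a single int64_t and reserves one sentinel
    // value to mean "unknown", so vector<Dim> stays as compact as vector<int64_t>.
    #include <cstdint>
    #include <limits>

    class Dim {
     public:
      Dim(int64_t value) : value_(value) {}                 // known length: Dim a = 4;
      static Dim Unknown() { return Dim(kUnknown); }        // unknown length
      bool is_known() const { return value_ != kUnknown; }
      int64_t value() const { return value_; }              // meaningful only when is_known()

     private:
      static constexpr int64_t kUnknown = std::numeric_limits<int64_t>::min();  // assumed sentinel
      int64_t value_;  // the only data member
    };

    // Unknown-ness "infects" the result, so shape-inference code written with
    // ordinary arithmetic keeps working when a dimension is dynamic.
    inline Dim operator+(Dim a, int64_t b) { return a.is_known() ? Dim(a.value() + b) : Dim::Unknown(); }
    inline Dim operator*(int64_t a, Dim b) { return b.is_known() ? Dim(a * b.value()) : Dim::Unknown(); }

With operators like these, an expression such as input_shape[2] + 2 * padding_h in the pad example below simply stays unknown when the input height is unknown, instead of failing.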

Notes after the merge:

  • Indexing into a Shape no longer yields a plain int64_t but a Dim object. The Dim type overloads many common operators and also converts implicitly to and from int64_t, so most code will not notice the change. When writing shape-inference logic, however, avoid using int to hold a length: the implicit conversion from Dim to int raises an error when the Dim is dynamic (which shows up as that OP not supporting dynamic shapes).
    Counterexample:
    // pad op
    // (1, 3, ?, ?) --- pad --> error (? denotes a dynamic length)
    int output_h = input_shape[2] + 2 * padding_h;  // triggers the implicit conversion; cannot handle dynamic shapes
    int output_w = input_shape[3] + 2 * padding_w;
    SetOutputShape(Shape{input_shape[0], input_shape[1], output_h, output_w});
    Correct example:
    // pad op
    // (1, 3, ?, ?) --- pad --> (1, 3, ?, ?)
    auto output_h = input_shape[2] + 2 * padding_h;  // auto deduces Dim
    auto output_w = input_shape[3] + 2 * padding_w;
    SetOutputShape(Shape{input_shape[0], input_shape[1], output_h, output_w});
  • Unlike during shape inference, the shape seen inside a kernel is always fully known, so ShapeView::At and ShapeView::operator[] return int64_t; use DimAt if you need a Dim.
  • A Dim object is just an int64_t underneath, so copying a Dim costs the same as copying an int64_t. Prefer passing Dim by value rather than by reference, avoiding the hidden cost of a dereference. A small sketch of these two points follows.
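
A brief sketch of the last two points; the variable names and call sites below are hypothetical, and only ShapeView::At, DimAt, and Shape indexing returning Dim are taken from the description above.

    // In shape-inference code: indexing a Shape yields a Dim; pass it by value.
    Dim channels = input_shape[1];         // may be known or unknown
    // In kernel code: the shape is always fully known, so int64_t is fine.
    int64_t batch = shape_view.At(0);      // ShapeView::At returns int64_t
    Dim batch_dim = shape_view.DimAt(0);   // use DimAt only when a Dim is really needed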

daquexian and others added 19 commits February 19, 2023 16:38
Signed-off-by: daquexian <daquexian566@gmail.com>

bool operator==(const Dim& a, const Dim& b) {
  if (a.is_known() && b.is_known()) { return a.value_ == b.value_; }
  // reflexivity: Dim::Unknown() == Dim::Unknown()
Contributor Author
@daquexian commented Feb 24, 2023

For the various STL containers to work correctly, unknown == unknown must hold, even though that is not entirely natural. We may address this later by distinguishing between different unknown dims, similar to what ONNX/TVM/... do.
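
A small illustration of why the reflexivity matters; this is hypothetical usage built on the Dim sketch earlier in this page, not code from the PR.

    #include <vector>

    // If Dim::Unknown() != Dim::Unknown(), two shapes that are both (1, ?, 4)
    // would compare unequal, and container-based code like this would break:
    std::vector<Dim> a = {Dim(1), Dim::Unknown(), Dim(4)};
    std::vector<Dim> b = {Dim(1), Dim::Unknown(), Dim(4)};
    bool same = (a == b);  // expected to be true; relies on unknown == unknown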

Comment on lines 26 to 34
x = x * w1.to(flow.float32)
x = x.unsqueeze(0)
y = x.sum(dim=1)
if lazy_mode.is_enabled():
    # Shape inference works correctly even with the presence of
    # symbolic dimensions:
    assert x.shape == (1, flow.Dim.unknown(), 4)
    # y has a static shape even though x has a symbolic shape
    assert y.shape == (1, 4)
Contributor Author (@daquexian)

Symbolic shapes propagate automatically through shape inference: (unknown, 4) becomes (1, unknown, 4) after unsqueeze, and (1, unknown, 4) becomes the static shape (1, 4) after sum(dim=1).

@github-actions
Contributor

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

@github-actions
Contributor

CI failed when running job: Build cpu. PR label automerge has been removed

Signed-off-by: daquexian <daquexian566@gmail.com>
@github-actions
Contributor

CI failed when running job: Build cpu. PR label automerge has been removed

Signed-off-by: daquexian <daquexian566@gmail.com>
@github-actions
Contributor

Speed stats:

@github-actions
Contributor

CI failed when running job: cpu-misc. PR label automerge has been removed

@github-actions
Contributor

CI failed when running job: cuda-module. PR label automerge has been removed

@github-actions
Contributor

Speed stats:
GPU Name: GeForce GTX 1080 

❌ OneFlow resnet50 time: 141.0ms (= 14103.4ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 143.2ms (= 14324.9ms / 100, input_shape=[16, 3, 224, 224])
❌ Relative speed: 1.02 (= 143.2ms / 141.0ms)

OneFlow resnet50 time: 80.8ms (= 8076.0ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 85.8ms (= 8583.7ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.06 (= 85.8ms / 80.8ms)

OneFlow resnet50 time: 49.5ms (= 9898.9ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 57.8ms (= 11552.9ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.17 (= 57.8ms / 49.5ms)

OneFlow resnet50 time: 32.9ms (= 6579.4ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 42.7ms (= 8531.3ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.30 (= 42.7ms / 32.9ms)

OneFlow resnet50 time: 25.0ms (= 4991.3ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 41.7ms (= 8336.4ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.67 (= 41.7ms / 25.0ms)

❌ OneFlow resnet50 time: 154.9ms (= 15485.3ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 164.1ms (= 16409.4ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
❌ Relative speed: 1.06 (= 164.1ms / 154.9ms)

OneFlow resnet50 time: 92.5ms (= 9246.3ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 103.4ms (= 10337.7ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.12 (= 103.4ms / 92.5ms)

OneFlow resnet50 time: 60.5ms (= 12090.4ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 78.4ms (= 15673.7ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.30 (= 78.4ms / 60.5ms)

OneFlow resnet50 time: 43.4ms (= 8676.4ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 71.0ms (= 14202.2ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.64 (= 71.0ms / 43.4ms)

OneFlow resnet50 time: 37.6ms (= 7528.4ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 66.5ms (= 13292.4ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.77 (= 66.5ms / 37.6ms)

@github-actions
Contributor

CI failed when running job: cpu-module. PR label automerge has been removed

@github-actions
Contributor

CI failed when running job: cuda-misc. PR label automerge has been removed

@github-actions
Contributor

CI failed when running job: cuda-speed-test. PR label automerge has been removed
