
symbolic shape #9902

Open · wants to merge 80 commits into master
Conversation

daquexian (Contributor)
@daquexian commented Feb 24, 2023

A symbolic shape uses a symbol to represent an unknown length, giving us the ability to express dynamic shapes; it is an essential building block for dynamic-shape support. This Zhihu article covers the background and the related implementations in TVM/BladeDisc: https://zhuanlan.zhihu.com/p/608027985

This PR implements:

  • A class Dim that represents a length. It can hold a known length (Dim a = 4) or an unknown one (Dim::Unknown()), and the Shape class has been changed from vector<int64_t> to vector<Dim>. A Dim object has a single int64_t member, so the data inside a Shape stays compact and remains compatible with the places in oneflow that need an int64_t* header_ptr as the shape pointer. We deliberately do not use a special int64_t value such as -1 to represent a dynamic length, because that cannot propagate dynamic shapes automatically (see the next item) and is not type-safe.
  • Arithmetic operators (addition, subtraction, multiplication, division, etc.) between Dim objects and the primitive numeric types, so that a symbolic dimension propagates automatically through shape inference without touching each OP's shape-inference logic; see the unit test test_symbolic_shape.py and the sketch after this list.
  • Conversion to and from the dynamic size used in MLIR.
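
A minimal sketch of the idea behind the first two items, for illustration only: the sentinel value, member names, and operator set below are assumptions, not the actual oneflow source.

    // Hypothetical Dim: stores a single int64_t and reserves one sentinel
    // value to mean "unknown", so vector<Dim> stays as compact as vector<int64_t>.
    #include <cstdint>
    #include <limits>

    class Dim {
     public:
      Dim(int64_t value) : value_(value) {}                 // known length: Dim a = 4;
      static Dim Unknown() { return Dim(kUnknown); }        // unknown length
      bool is_known() const { return value_ != kUnknown; }
      int64_t value() const { return value_; }              // meaningful only when is_known()

     private:
      static constexpr int64_t kUnknown = std::numeric_limits<int64_t>::min();  // assumed sentinel
      int64_t value_;  // the only data member
    };

    // Unknown-ness "infects" the result, so shape-inference code written with
    // ordinary arithmetic keeps working when a dimension is dynamic.
    inline Dim operator+(Dim a, int64_t b) { return a.is_known() ? Dim(a.value() + b) : Dim::Unknown(); }
    inline Dim operator*(int64_t a, Dim b) { return b.is_known() ? Dim(a * b.value()) : Dim::Unknown(); }

With operators like these, an expression such as input_shape[2] + 2 * padding_h in the pad example below simply stays unknown when the input height is unknown, instead of failing.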

Notes after the merge:

  • Indexing into a Shape no longer yields a plain int64_t but a Dim object. The Dim type overloads many common operators and also converts implicitly to and from int64_t, so most code will not notice the change. When writing shape-inference logic, however, avoid using int to hold a length: the implicit conversion from Dim to int raises an error when the Dim is dynamic (which shows up as that OP not supporting dynamic shapes).
    Counterexample:
    // pad op
    // (1, 3, ?, ?) --- pad --> error (? denotes a dynamic length)
    int output_h = input_shape[2] + 2 * padding_h;  // triggers the implicit conversion; cannot handle dynamic shapes
    int output_w = input_shape[3] + 2 * padding_w;
    SetOutputShape(Shape{input_shape[0], input_shape[1], output_h, output_w});
    Correct example:
    // pad op
    // (1, 3, ?, ?) --- pad --> (1, 3, ?, ?)
    auto output_h = input_shape[2] + 2 * padding_h;  // auto deduces Dim
    auto output_w = input_shape[3] + 2 * padding_w;
    SetOutputShape(Shape{input_shape[0], input_shape[1], output_h, output_w});
  • Unlike during shape inference, the shape seen inside a kernel is always fully known, so ShapeView::At and ShapeView::operator[] return int64_t; use DimAt if you need a Dim.
  • A Dim object is just an int64_t underneath, so copying a Dim costs the same as copying an int64_t. Prefer passing Dim by value rather than by reference, avoiding the hidden cost of a dereference. A small sketch of these two points follows.
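
A brief sketch of the last two points; the variable names and call sites below are hypothetical, and only ShapeView::At, DimAt, and Shape indexing returning Dim are taken from the description above.

    // In shape-inference code: indexing a Shape yields a Dim; pass it by value.
    Dim channels = input_shape[1];         // may be known or unknown
    // In kernel code: the shape is always fully known, so int64_t is fine.
    int64_t batch = shape_view.At(0);      // ShapeView::At returns int64_t
    Dim batch_dim = shape_view.DimAt(0);   // use DimAt only when a Dim is really needed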

daquexian and others added 19 commits February 19, 2023 16:38
Signed-off-by: daquexian <daquexian566@gmail.com>

bool operator==(const Dim& a, const Dim& b) {
  if (a.is_known() && b.is_known()) { return a.value_ == b.value_; }
  // reflexivity: Dim::Unknown() == Dim::Unknown()
Contributor Author
@daquexian commented Feb 24, 2023

For the various STL containers to work correctly, unknown == unknown must hold, even though that is not entirely natural. We may address this later by distinguishing between different unknown dims, similar to what ONNX/TVM/... do.
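
A small illustration of why the reflexivity matters; this is hypothetical usage built on the Dim sketch earlier in this page, not code from the PR.

    #include <vector>

    // If Dim::Unknown() != Dim::Unknown(), two shapes that are both (1, ?, 4)
    // would compare unequal, and container-based code like this would break:
    std::vector<Dim> a = {Dim(1), Dim::Unknown(), Dim(4)};
    std::vector<Dim> b = {Dim(1), Dim::Unknown(), Dim(4)};
    bool same = (a == b);  // expected to be true; relies on unknown == unknown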

Comment on lines 26 to 34
x = x * w1.to(flow.float32)
x = x.unsqueeze(0)
y = x.sum(dim=1)
if lazy_mode.is_enabled():
    # Shape inference works correctly even with the presence of
    # symbolic dimensions:
    assert x.shape == (1, flow.Dim.unknown(), 4)
    # y has a static shape even though x has a symbolic shape
    assert y.shape == (1, 4)
Contributor Author (@daquexian)

Symbolic shapes propagate automatically through shape inference: (unknown, 4) becomes (1, unknown, 4) after unsqueeze, and (1, unknown, 4) becomes the static shape (1, 4) after sum(dim=1).

@github-actions
Contributor

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

@github-actions
Contributor

CI failed when running job: Build cpu. PR label automerge has been removed

Signed-off-by: daquexian <daquexian566@gmail.com>
@github-actions
Contributor

CI failed when running job: Build cpu. PR label automerge has been removed

Signed-off-by: daquexian <daquexian566@gmail.com>
@github-actions
Contributor

Speed stats:

@github-actions
Contributor

CI failed when running job: cpu-misc. PR label automerge has been removed

@github-actions
Contributor

CI failed when running job: cuda-module. PR label automerge has been removed

@github-actions
Contributor

Speed stats:
GPU Name: GeForce GTX 1080 

❌ OneFlow resnet50 time: 141.0ms (= 14103.4ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 143.2ms (= 14324.9ms / 100, input_shape=[16, 3, 224, 224])
❌ Relative speed: 1.02 (= 143.2ms / 141.0ms)

OneFlow resnet50 time: 80.8ms (= 8076.0ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 85.8ms (= 8583.7ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.06 (= 85.8ms / 80.8ms)

OneFlow resnet50 time: 49.5ms (= 9898.9ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 57.8ms (= 11552.9ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.17 (= 57.8ms / 49.5ms)

OneFlow resnet50 time: 32.9ms (= 6579.4ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 42.7ms (= 8531.3ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.30 (= 42.7ms / 32.9ms)

OneFlow resnet50 time: 25.0ms (= 4991.3ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 41.7ms (= 8336.4ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.67 (= 41.7ms / 25.0ms)

❌ OneFlow resnet50 time: 154.9ms (= 15485.3ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 164.1ms (= 16409.4ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
❌ Relative speed: 1.06 (= 164.1ms / 154.9ms)

OneFlow resnet50 time: 92.5ms (= 9246.3ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 103.4ms (= 10337.7ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.12 (= 103.4ms / 92.5ms)

OneFlow resnet50 time: 60.5ms (= 12090.4ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 78.4ms (= 15673.7ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.30 (= 78.4ms / 60.5ms)

OneFlow resnet50 time: 43.4ms (= 8676.4ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 71.0ms (= 14202.2ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.64 (= 71.0ms / 43.4ms)

OneFlow resnet50 time: 37.6ms (= 7528.4ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 66.5ms (= 13292.4ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.77 (= 66.5ms / 37.6ms)

@github-actions
Contributor

CI failed when running job: cpu-module. PR label automerge has been removed

@github-actions
Contributor

CI failed when running job: cuda-misc. PR label automerge has been removed

@github-actions
Contributor

CI failed when running job: cuda-speed-test. PR label automerge has been removed
