[Frontend] Dynamic shape fx trace #294
Conversation
Essentially there was a bug with the norm op. It tries to achieve polymorphism (f32 vs f16) through class overloading (the fp16 task subclasses the fp32 task), but this produces incorrect behaviour when combined with the automatic mixed precision pass: the op was originally created in fp32 and gets re-forwarded by the pass, yet the implement_cuda schedule template still assumes the input is fp32. The result is an array of fp16 inputs reinterpreted as fp32; pointers in C++ cast silently. I think there are two ways to achieve type polymorphism in schedule templates right now.
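To make the symptom concrete, here is a small standalone illustration (using numpy rather than hidet internals) of what reinterpreting an fp16 buffer as fp32 looks like:

import numpy as np

# A float16 buffer of ones, viewed through a float32 "pointer": half as many
# elements come out, with garbage values, which is the symptom described above.
half = np.ones(4, dtype=np.float16)
print(half.view(np.float32))  # two small nonsense values instead of [1, 1, 1, 1]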
Hi @Aalanli, the second method is our current design. We have some base operators (matmul, conv2d) that support arbitrary data types and use the auto-scheduler to schedule. These base operators will be resolved to specialized ones with specialized templates, and we should check the special condition in the task definition (like in this case, we should assert that the input dtype is fp16).
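A minimal sketch of that convention; the function name below is hypothetical, and only the dtype guard mirrors what is described (it uses the same kind of check that appears in the resolve rule in this diff):

from hidet.graph import Tensor
from hidet.ir import dtypes

def make_fp16_norm_task(x: Tensor, dims):
    # hypothetical specialized-task constructor: the fp16 template only supports
    # float16, so fail loudly here instead of letting the buffer be reinterpreted
    assert x.dtype == dtypes.float16, 'fp16 normalize task expects a float16 input'
    ...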
Thanks @Aalanli. Overall looks good to me!
@xinli-git could you also have a look at this PR (especially about the normalization part).
# unfortunately, when dynamic=True in torch.compile, there may exist other non-tensor parameters
# in example inputs
For those dynamic shapes, I am wondering whether these scalar parameters act as the shapes of the input tensors. If that's the case, we can ignore those scalar parameters.
Say a torch model gives us
sample_inputs = [tensor(['m', 'n']), 'm', 'n']
We can declare the symbolic variables 'm' and 'n' when we define the symbolic tensor and ignore the 'm' and 'n' scalar parameters.
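A sketch of that idea, assuming hidet.symbol accepts named symbolic dimensions:

import hidet

# declare 'm' and 'n' as symbolic dimensions on the tensor itself; the trailing
# scalar 'm' and 'n' arguments produced by the dynamic trace can then be ignored
x = hidet.symbol(['m', 'n'], dtype='float32', device='cuda')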
Any clue on this?
@register_function(operator.iadd)
def iadd(x: Tensor, y: Tensor):
-    return ops.add(x, y)
+    return x + y
So the x and y could be DynInt?
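My reading of why the change to x + y matters here, shown with a trivial standalone example (not hidet code): with dynamic=True the traced graph can route plain or symbolic ints through this registered function, and plain Python addition still handles them.

def iadd(x, y):
    # same body as the registered function above, minus the Tensor type hints
    return x + y

print(iadd(3, 4))  # plain ints, e.g. sizes captured by dynamic tracing -> 7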
To be more specific, hidet tasks and their schedule templates should make sure that the schedule template strictly implements what the computation defines. We can take both ways you mentioned. For example, our …
Thanks for the changes in normalize. In principle, this is the right approach. I left two implementations initially so I could add vector load for the fp16 case in the future, but now that there is the vector data type that Yaoyao has recently introduced, keeping op and op_fp16 in a single place is the right way to go, and I intend to do the same for the reduce op.
check_module(model, [x], atol=1e-2, rtol=1e-2)
model = torch.hub.load('pytorch/vision:v0.6.0', 'resnet50', pretrained=True).cuda().eval()
x = torch.randn(*shape).cuda()
check_module(model, [x], atol=1e-2, rtol=1e-2, dynamic=dynamic)
Have we been using the CPU path before this change?
x: Tensor = op.inputs[0]
if not is_contiguous_norm(dims, len(x.shape)):
    return None
if x.dtype != dtypes.float16 or prod([x.shape[dd] for dd in dims]) % 2 != 0:
removing this is safe for now, but we might need to think about how to handle it when we decide to use 2xfp16 types and the norm size is odd.
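A hypothetical sketch of that future check (not code from this PR): with a packed 2xfp16 type, the reduction size must be even for pure vector loads, so odd sizes would need a scalar tail or a fallback to the generic schedule.

from math import prod

def can_use_float16x2(shape, dims) -> bool:
    # product of the reduced dimensions; odd sizes cannot be covered by 2-wide
    # fp16 vector loads alone, so the resolve rule would fall back or add an epilogue
    norm_size = prod(shape[d] for d in dims)
    return norm_size % 2 == 0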
@@ -32,15 +29,6 @@ class NormalizeResolveRule(ResolveRule):
2) resolve_generic: Default case, return the output of the regular f32 reduce schedule.
remove the resolve_fp16 comment above
@yaoyaoding the part about normalize is fine as long as the current CI can pass. Thanks for the notification :)
Reading this again, I think the problem is the reforward. Is my understanding correct that we should not sub-class operators? We should either write them as separate classes or have a generic Operator / Task that works for all input data types? Basically, this line is causing the problem: https://github.com/hidet-org/hidet/blob/main/python/hidet/graph/operator.py#L166 ?
That line is not the problem. The reforward will create the task again based on the new inputs and parameters. The problem is that the task did not check the data type. If the task only supports one data type, it should explicitly assert that its input has that data type. If it accepts the inputs, then its implement function SHOULD support them. We can sub-class operators like ElementwiseBinaryOp, UnaryElementwiseOp, etc.
The key convention here is: keep the task computation definition and the implement function consistent.
I am still not sure what the extra scalar parameters are; let's figure them out before merging this PR.
Thanks @Aalanli!
…. ) (#294)
[Ir][Primitives] add vectorized conversion instructions
[Ir][CuTe] add reduce primitives in cute (#295)
[Ir][CuTe] add mma primitives (#296)
[Ir][CuTe] add other primitives in cute (#297)
[Transforms][CuTe] add instruction selection pass (#298)
[Transforms][CuTe] add resolve bank conflict pass (#299)
[Transforms][CuTe] add resolve auto keywords pass (#300)
[Transforms][CuTe] add shared memory allocation pass (#301)
[Transforms][CuTe] add vectorize elementwise operation pass (#302)
[Transforms][CuTe] add analysis pass (#303)
[Transforms][CuTe] add canonicalization pass (#304)
[Transforms][CuTe] add deadcode elimination pass (#305)
[Transforms][CuTe] refactor cute lowering pass (#306)
[Graph][Ops] matmul cute (#307)
[Ir] cute miscs (#308)
[Tests] cute tests (#309)
[Chore] fix ci (#313)
---------
Co-authored-by: xiaocenxiaocen <xiao.zhang@centml.ai>
enable the option torch.compile(..., dynamic=True)
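A usage sketch of what this enables; it assumes the 'hidet' dynamo backend is available after importing hidet:

import torch
import hidet  # hidet's torch.compile backend (assumed to be registered on import)

model = torch.hub.load('pytorch/vision:v0.6.0', 'resnet50', pretrained=True).cuda().eval()
compiled = torch.compile(model, backend='hidet', dynamic=True)

with torch.no_grad():
    # with dynamic=True, different batch sizes should reuse the traced graph
    y1 = compiled(torch.randn(1, 3, 224, 224).cuda())
    y2 = compiled(torch.randn(8, 3, 224, 224).cuda())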