Output stride order #2548
Conversation
…oid permutation info lost in cache hit
Looks good
    std::vector<int64_t> stride_order) {
  FUSER_PERF_SCOPE("FusionDefinition.add_output (tensor)");
  TORCH_CHECK(
      !self.id().has_value(),
Off-topic to this PR, but self.completed() exists for these.
  TORCH_CHECK(
      duplicate_check == (1 << reverse_perm.size()) - 1,
      "duplicated elements in stride_order detected!");
  tv_output = permute(tv_output, reverse_perm);
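The duplicate check in the diff works by OR-ing a one-hot bit per entry of the permutation and comparing the accumulated mask against the full mask `(1 << n) - 1`. A standalone sketch of the same trick (function name is illustrative, not the actual PR code):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Returns true iff `perm` contains each of 0..perm.size()-1 exactly once.
// Mirrors the bitmask check in the diff: set bit perm[i] for each entry;
// duplicates set the same bit twice, leaving some other bit unset, so the
// mask equals (1 << n) - 1 exactly when `perm` is a valid permutation.
bool isValidPermutation(const std::vector<int64_t>& perm) {
  int64_t mask = 0;
  for (int64_t v : perm) {
    if (v < 0 || v >= static_cast<int64_t>(perm.size())) {
      return false; // out-of-range entry can never form a permutation
    }
    mask |= int64_t(1) << v;
  }
  return mask == (int64_t(1) << perm.size()) - 1;
}
```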
I could be wrong, but I think this call is the first time the length of the provided perm is checked to equal the ndim of the other argument. It might be nice to check that right up front.
Good point. I had it in python_bindings.cpp earlier and removed it as duplication. But I think it makes sense to throw an error up front; I'll add one down there.
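Such an up-front validation could look like the following sketch (hypothetical helper, not the code actually added in the PR):

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical sketch: reject a stride_order whose length does not match
// the output's ndim before the permutation is used anywhere downstream.
// An empty stride_order is treated as "not specified" and allowed through.
void checkStrideOrderLength(
    const std::vector<int64_t>& stride_order,
    size_t ndim) {
  if (!stride_order.empty() && stride_order.size() != ndim) {
    throw std::runtime_error(
        "stride_order needs to have " + std::to_string(ndim) +
        " entries, but got " + std::to_string(stride_order.size()));
  }
}
```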
LGTM. For serialization, we just need to add the stride_order field for OutputRecord.
Review comments have been addressed. Running CI locally and will merge the PR afterwards.
I am seeing failing CIs, but I don't think they are relevant. I'm merging this one.
@zasdfgbnm @jacobhinkle could we revisit this approach now that we have allocation domains?
Warning: this is the csarofeen/pytorch repo.
Added new python API fd.ops.add_output(tensor, stride_order), where stride_order means that output axis i is the stride_order[i]-th fastest dimension. E.g., if we want to specify the output to be in channels-last format, we should specify fd.ops.add_output(tensor_view, [0, 3, 1, 2]), where a given output with shape [N, C, H, W] will have stride [H*W*C, 1, W*C, C].
Implementation details:
It's currently done in a naive way. Since nvfuser doesn't support user-specified stride order yet, we fake it by: