
[Topi][Op][PyTorch][Vitas] Fix inconsistent kernel layout conventions for conv2d_transpose #9336

Merged: 36 commits, Nov 12, 2021

Conversation

@AndrewZhaoLuo (Contributor) commented Oct 21, 2021

The old kernel_layout convention for conv2d_transpose was incorrect: it mixed up I and O.

E.g. for a conv with 5 input channels, 10 output channels, and a 3x3 kernel, the weight declared as IOHW would have shape [10, 5, 3, 3] instead of [5, 10, 3, 3]. Likewise, for OIHW we would have [5, 10, 3, 3] instead of [10, 5, 3, 3]. This is extremely confusing, and several places in the codebase note that the "convention" for "O" and "I" is flipped.

I believe this is simply wrong: I and O should refer to the input and output channels of the convolution itself.

Why did the confusion arise in the first place? The actual TOPI kernels assume IOHW inputs, and we layout-transform to achieve this. For regular convs we are used to OIHW layouts, which is probably where the confusion came from. Indeed, in PyTorch conv2d_transpose weights follow the IOHW convention:

>>> nn.ConvTranspose2d(10, 30, 5, groups=2).state_dict()['weight'].shape
torch.Size([10, 15, 5, 5])

While conv2d weights follow OIHW:

>>> nn.Conv2d(10, 30, 5, groups=2).state_dict()['weight'].shape
torch.Size([30, 5, 5, 5])
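
(The grouped shapes follow from the conventions: ConvTranspose2d stores [in_channels, out_channels / groups, kH, kW], hence [10, 30/2 = 15, 5, 5], while Conv2d stores [out_channels, in_channels / groups, kH, kW], hence [30, 10/2 = 5, 5, 5]. Dimension 0 tracks input channels in one case and output channels in the other.)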

This has caused me a great deal of confusion. The current code seems to work only because people manually mix up the layout convention, e.g. a layout declared in the code as OIHW while the tensor's shape and the calculation itself follow IOHW.

This is my attempt at untying the knot and making things consistent.
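
To make the corrected convention concrete, here is a minimal Relay sketch (my illustration, not code from this PR): with kernel_layout="IOHW", the weight's first axis is the op's input channels and the second its output channels.

```python
import tvm
from tvm import relay

data = relay.var("data", shape=(1, 5, 32, 32))     # 5 input channels
weight = relay.var("weight", shape=(5, 10, 3, 3))  # IOHW: [I=5, O=10, kH=3, kW=3]
out = relay.nn.conv2d_transpose(
    data, weight, channels=10, kernel_size=(3, 3), kernel_layout="IOHW"
)
mod = tvm.IRModule.from_expr(relay.Function([data, weight], out))
# Type inference should report an output of shape (1, 10, 34, 34).
print(relay.transform.InferType()(mod))
```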

TODOs:
Make pyxir remove this line and re-enable the skipped test for vitas.py: https://github.com/Xilinx/pyxir/blob/master/python/pyxir/frontend/tvm/relay_tools/relay_l2_convolution.py#L380

@AndrewZhaoLuo AndrewZhaoLuo changed the title [WIP] Fix inconsistent kernel layout conventions for conv2d_transpose [Topi][Op][PyTorch][Vitas] Fix inconsistent kernel layout conventions for conv2d_transpose Oct 21, 2021
@AndrewZhaoLuo AndrewZhaoLuo marked this pull request as ready for review October 22, 2021 00:01
@mbrookhart (Contributor) commented Oct 22, 2021

@masahi Perhaps you or someone you know has a better sense of framework conventions here than I do? I'm not sure if there's a common representation here.

@AndrewZhaoLuo (Contributor, Author) commented Oct 22, 2021

I think in general, I = the dimension that changes when the input channels change, and O = the dimension that changes when the output channels change. In this operator, that is not the case (and it probably isn't for the 1D and 3D cases either, but I will change those if this gets traction).

That said, I can understand why this convention is so weird.

conv2d_transpose can be seen as the gradient of conv2d.

So 'I' and 'O' represent the input and output channels of the imaginary conv2d this operator is the gradient of. However, in conv2d_transpose we don't really have that context, and flipping the labels causes confusion.
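
A small PyTorch sketch of that gradient relationship (my own illustration, with made-up shapes):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 5, 8, 8, requires_grad=True)  # 5 input channels
w = torch.randn(10, 5, 3, 3)                     # conv2d weight, OIHW: [O=10, I=5, 3, 3]
y = F.conv2d(x, w)                               # y has 10 channels
y.backward(torch.ones_like(y))

# The gradient w.r.t. x is a conv2d_transpose using the *same* weight tensor.
# Viewed from the transpose op, that [10, 5, 3, 3] tensor is IOHW:
# 10 channels come in (y's channels), 5 channels go out (x's channels).
gx = F.conv_transpose2d(torch.ones_like(y), w)
assert torch.allclose(x.grad, gx)
```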

…aoLuo/tvm into aluo/qnn/conv2d-transpose-fixes

* 'aluo/qnn/conv2d-transpose-fixes' of github.com:AndrewZhaoLuo/tvm: (27 commits)
  fix keras
  jostle
  more frontend tests fixes
  remove f2qi for later pr
  remove formatting only change
  undo just formatting changes
  lint
  add todo
  fix fake quantization pass
  fix qnn conv2d transpose tests
  fix vitis tests
  change layouts for conv2d_transpose too
  fix bug with layout transform
  add test
  lint
  make pytorch tests pass
  fix a lot of initial tests
  Disable Hexagon TestConv2dPackedFilter test (apache#9344)
  BUG: Look through on_device annotations when looking for shape constants (apache#9345)
  [Community] @ganler -> Reviewer (apache#9346)
  ...
@comaniac (Contributor) left a comment

The clarification makes sense to me, so I approved first.
It would be better if @masahi could take a final look.

Comment on lines +243 to +249
@pytest.mark.skip(
reason="I and O used to be mixed up in kernel layouts in TVM."
"This is fixed, but vitis needs to adopt the new convention."
"To change, simply remove this line:"
"https://github.com/Xilinx/pyxir/blob/bef661d6d77adcdbd2cf4163f2cf3a1d31d40406/"
"python/pyxir/frontend/tvm/relay_tools/relay_l2_convolution.py#L380"
)

cc @jtuyls

@jtuyls (Contributor) commented Nov 11, 2021

Thanks, this kernel layout switch makes a lot of sense to me. I will fix this in the Vitis flow, and then we can remove the test skip.

@masahi (Member) commented Nov 11, 2021

Is PR #9465 related?

@masahi masahi self-assigned this Nov 11, 2021
@junrushao (Member) commented
@Lyken17 is this related to your fix? Would be great if you could take a look :-) Thanks a lot!

@Lyken17 (Contributor) commented Nov 11, 2021

@junrushao1994 Yes, definitely related. We both noticed that the conv2d_transpose weight layout should be IOHW rather than OIHW (the convention used in conv2d). This is indeed confusing and leads to several wrong shape checks. Will go through the code in detail later today.

@@ -1044,7 +1044,7 @@ bool Conv2DTransposeRel(const Array<Type>& types, int num_inputs, const Attrs& a
   if (data == nullptr) return false;
 
   static const Layout kNCHW("NCHW");
-  static const Layout kOIHW("OIHW");
+  static const Layout kIOHW("IOHW");

Does the layout variable affect the later calculation, or is it just an alias of the layout?

@Lyken17 (Contributor) left a comment

LGTM. This PR fixes the wrongly adopted OIHW in the current topi.nn.conv2d_transpose, as well as many related shape checks / definitions.

@junrushao junrushao merged commit 6159b8e into apache:main Nov 12, 2021
@junrushao (Member) commented
Thank you all for the huge effort this week!

AndrewZhaoLuo added a commit to AndrewZhaoLuo/tvm that referenced this pull request Nov 12, 2021
* main: (119 commits)
  [Topi][Op][PyTorch][Vitas] Fix inconsistent kernel layout conventions for conv2d_transpose (apache#9336)
  Fix repository URL in ubuntu_install_rocm.sh (apache#9425)
  Add LLVM-13 installation to Docker setup (apache#9498)
  [Relay] Use target_host determined at Relay level instead of recalculating it (apache#9499)
  Arm(R) Ethos(TM)-U NPU BinaryElementwise operators support (apache#9442)
  [COMMUNITY] Junru's and Wuwei's PGP key for ASF release (apache#9488)
  Add default for split op (apache#9489)
  [HOTFIX][TARGET] Change LOG in compilation config to DLOG (apache#9486)
  Fixed some warnings about lambda's closures that are bigger than necessary (apache#9481)
  [Support] Add libinfo into the runtime build (apache#9310)
  Change Call with TIRCallAttrs to call_lowered op (apache#9312)
  [ETHOSN] Streamline Ethos(TM)-N cross-compile rpc usage (apache#9477)
  [CMSIS-NN] Assert correct amount of CMSIS-NN artifacts in MLF (apache#9480)
  [MicroTVM][PyTest] Explicitly skip MicroTVM unittests. (apache#9335)
  [microNPU] Replace ICHECK with diagnostic context in type inference (apache#9470)
  Better host handling in CompilationConfig & debug printing (apache#9460)
  [AOT][Tests] Use pre-built libraries in Reference System tests (apache#9271)
  [TIR] Add type hint for TIR  (apache#9432)
  [TVMC] Add test for quantized pytorch model (apache#9467)
  [CMSIS-NN] Convert CMSIS-NN to use Target Hooks (apache#9397)
  ...
mehrdadh pushed a commit to mehrdadh/tvm that referenced this pull request Dec 1, 2021
… for conv2d_transpose (apache#9336)

* fix a lot of initial tests
* make pytorch tests pass
* lint
* add test
* fix bug with layout transform
* change layouts for conv2d_transpose too
* fix vitis tests
* fix qnn conv2d transpose tests
* fix fake quantization pass
* add todo
* lint
* undo just formatting changes
* remove formatting only change
* remove f2qi for later pr
* more frontend tests fixes
* jostle
* fix keras
* fix another frontend test
* fix things
* jostle ci
mehrdadh pushed a commit to mehrdadh/tvm that referenced this pull request Dec 1, 2021
… for conv2d_transpose (apache#9336)

ylc pushed a commit to ylc/tvm that referenced this pull request Jan 7, 2022
… for conv2d_transpose (apache#9336)

yangulei pushed a commit to yangulei/tvm that referenced this pull request Jan 11, 2022
… for conv2d_transpose (apache#9336)

ylc pushed a commit to ylc/tvm that referenced this pull request Jan 13, 2022
… for conv2d_transpose (apache#9336)

zotanika added a commit to zotanika/incubator-tvm that referenced this pull request Jan 14, 2022
masahi pushed a commit that referenced this pull request Jan 17, 2022
* [Caffe Frontend] supporting group > 1 cases for Deconv op

- Handling group > 1 cases, assuming group == output channels
- Simply decomposed into Relay split, conv2d_transposed, and multi-leveled concatenate ops
- Added some test cases

Signed-off-by: zotanika <zotanika@gmail.com>

* [Caffe Frontend] amending a test case for Deconv op

Signed-off-by: zotanika <zotanika@gmail.com>

* explicit importing tvm.testing

* changing split axis to 0, according to PR #9336
yuanfz98 pushed a commit to yuanfz98/tvm that referenced this pull request Jan 24, 2022
crazydemo pushed a commit to crazydemo/tvm that referenced this pull request Jan 27, 2022
ylc pushed a commit to ylc/tvm that referenced this pull request Feb 16, 2022