[Topi][Op][PyTorch][Vitas] Fix inconsistent kernel layout conventions for conv2d_transpose #9336
Conversation
@masahi Perhaps you or someone you know has a better sense of framework conventions here than I do? I'm not sure if there's a common representation here.
I think I can understand why this convention is so weird: conv2d_transpose can be seen as the gradient of conv2d, so 'I' and 'O' represent the input and output channels of the imaginary conv2d that this is the gradient of. However, in conv2d_transpose we don't really have that context, and flipping them around just causes confusion.
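As a minimal sketch of that relationship (PyTorch, arbitrary example shapes; not part of this PR):

```python
# conv2d_transpose computes the gradient of conv2d w.r.t. its input, so a weight
# that reads as OIHW from conv2d's point of view reads as IOHW for the
# transposed convolution.
import torch
import torch.nn.functional as F

x = torch.randn(1, 5, 8, 8, requires_grad=True)
conv = torch.nn.Conv2d(5, 10, kernel_size=3, bias=False)  # weight shape [10, 5, 3, 3] (OIHW)

y = conv(x)                                  # shape [1, 10, 6, 6]
(grad_x,) = torch.autograd.grad(y.sum(), x)  # gradient of conv2d w.r.t. its input

# The same gradient via conv_transpose2d with the *same* weight tensor; the
# transposed conv interprets that [10, 5, 3, 3] tensor as IOHW (in=10, out=5).
grad_x_via_transpose = F.conv_transpose2d(torch.ones_like(y), conv.weight)
assert torch.allclose(grad_x, grad_x_via_transpose, atol=1e-5)
```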
The clarification makes sense to me, so I approved first. It would be better if @masahi could take a final look.
@pytest.mark.skip(
    reason="I and O used to be mixed up in kernel layouts in TVM."
    "This is fixed, but vitis needs to adopt the new convention."
    "To change, simply remove this line:"
    "https://github.com/Xilinx/pyxir/blob/bef661d6d77adcdbd2cf4163f2cf3a1d31d40406/"
    "python/pyxir/frontend/tvm/relay_tools/relay_l2_convolution.py#L380"
)
cc @jtuyls
Thanks, this kernel layout switch makes a lot of sense to me. I will fix this in the Vitis flow and then we can remove the skip on this test.
Is the PR #9465 related?
@Lyken17 is this related to your fix? Would be great if you could take a look :-) Thanks a lot!
@junrushao1994 Yes, definitely related. We both noticed that the …
@@ -1044,7 +1044,7 @@ bool Conv2DTransposeRel(const Array<Type>& types, int num_inputs, const Attrs& a
   if (data == nullptr) return false;

   static const Layout kNCHW("NCHW");
-  static const Layout kOIHW("OIHW");
+  static const Layout kIOHW("IOHW");
Does this layout variable affect later calculations, or is it just an alias for the layout?
LGTM. This PR fixes the wrongly adopted OIHW in the current topi.nn.conv2d_transpose, as well as many related shape claims / definitions.
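As a minimal sketch of what the corrected convention means for shape inference (illustrative shapes, assuming the post-fix kernel_layout="IOHW" default for conv2d_transpose):

```python
import tvm
from tvm import relay

# data is NCHW with 5 input channels; the weight is IOHW = [in, out, kH, kW].
data = relay.var("data", shape=(1, 5, 32, 32), dtype="float32")
weight = relay.var("weight", shape=(5, 10, 3, 3), dtype="float32")

out = relay.nn.conv2d_transpose(
    data, weight, channels=10, kernel_size=(3, 3), kernel_layout="IOHW"
)
mod = relay.transform.InferType()(tvm.IRModule.from_expr(out))
print(mod["main"].ret_type)  # Tensor[(1, 10, 34, 34), float32] -- 10 output channels
```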
Thank you all for the huge effort this week!
… for conv2d_transpose (apache#9336)
* fix a lot of initial tests
* make pytorch tests pass
* lint
* add test
* fix bug with layout transform
* change layouts for conv2d_transpose too
* fix vitis tests
* fix qnn conv2d transpose tests
* fix fake quantization pass
* add todo
* lint
* undo just formatting changes
* remove formatting only change
* remove f2qi for later pr
* more frontend tests fixes
* jostle
* fix keras
* fix another frontend test
* fix things
* jostle ci
* [Caffe Frontend] supporting group > 1 cases for Deconv op
  - Handling group > 1 cases, assuming group == output channels
  - Simply decomposed into Relay split, conv2d_transposed, and multi-leveled concatenate ops
  - Added some test cases
  Signed-off-by: zotanika <zotanika@gmail.com>
* [Caffe Frontend] amending a test case for Deconv op
  Signed-off-by: zotanika <zotanika@gmail.com>
* explicit importing tvm.testing
* changing split axis to 0, according to PR #9336
The old kernel_layout convention was incorrect for conv2d_transpose: it mixed up `I` and `O`. E.g. if we had a conv with 5 input channels, 10 output channels, and a kernel height and width of 3, the weight for `IOHW` would be `[10, 5, 3, 3]` instead of `[5, 10, 3, 3]`. Likewise, for `OIHW` we would have `[5, 10, 3, 3]` instead of `[10, 5, 3, 3]`. This is extremely confusing, and in several places within the codebase people note that the "convention" for "O" and "I" is flipped. I believe this is simply wrong: `I` and `O` should refer to the input and output channels of the convolution itself.

Why was there confusion in the first place? The actual TOPI compute assumes IOHW kernel inputs and we layout-transform to achieve this. For regular convs we are used to OIHW layouts, which is probably where the mix-up came from. Indeed, in PyTorch, conv2d_transpose weights follow the IOHW convention, while conv2d follows OIHW:
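A quick check of the weight shapes in PyTorch, using the 5-in / 10-out / 3x3 example from above:

```python
import torch

deconv = torch.nn.ConvTranspose2d(in_channels=5, out_channels=10, kernel_size=3)
conv = torch.nn.Conv2d(in_channels=5, out_channels=10, kernel_size=3)

print(deconv.weight.shape)  # torch.Size([5, 10, 3, 3])  -> IOHW
print(conv.weight.shape)    # torch.Size([10, 5, 3, 3])  -> OIHW
```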
This has caused me a great deal of confusion. It seems like the current code only works because people mix up the layout convention manually: e.g. the layout in the code is said to be OIHW, but the shape of the tensor and the way the calculation is run follow IOHW.

This is my attempt at untying the knot and making things much more consistent.

TODOs:
* Have pyxir remove this line and re-enable the skipped Vitis test: https://github.com/Xilinx/pyxir/blob/master/python/pyxir/frontend/tvm/relay_tools/relay_l2_convolution.py#L380