
[DNNL][BYOC] Enable Altering Dense Weight Layout #11966

Open · wants to merge 1 commit into base: main
Conversation

billishyahao (Contributor):

The patch includes four parts:

  1. Enable nn.contrib_dense_pack to be partitioned and offloaded by dnnl byoc.
  2. Enable alter dense layout function during build relay model by introducing nn.contrib_dense_pack.
  3. Make some minor fixes in tensor_requisite.h.
  4. Add UT for pack dense.
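The blocking that a packed dense weight undergoes can be illustrated outside TVM. Below is a minimal numpy sketch of padding a plain [N, C] weight to a multiple of the block size and rearranging it into an outer-block/inner-block layout; the helper name `pack_dense_weight` is hypothetical and not a TVM or DNNL API, and the real relay pass rewrites `nn.dense` into `nn.contrib_dense_pack` rather than manipulating arrays directly.

```python
import numpy as np

def pack_dense_weight(weight, block_n=8):
    """Pack a plain [N, C] dense weight into an [N/block_n, C, block_n]
    blocked layout, zero-padding N up to a multiple of block_n.

    Illustrative sketch only; real packing is done by the relay pass.
    """
    n, c = weight.shape
    n_outer = -(-n // block_n)                    # ceil(n / block_n)
    padded = np.zeros((n_outer * block_n, c), dtype=weight.dtype)
    padded[:n, :] = weight                        # zero-pad output channels
    # [N_outer, block_n, C] -> move the block dimension innermost
    return padded.reshape(n_outer, block_n, c).transpose(0, 2, 1)

w = np.arange(170, dtype=np.float32).reshape(17, 10)
packed = pack_dense_weight(w, block_n=8)
print(packed.shape)  # (3, 10, 8)
```

With N=17 and a block of 8, two padded rows of zeros end up in the last block, which is exactly the padding discussed in the review comments below.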


@billishyahao billishyahao changed the title Add alter dense layout [DNNL][BYOC] Enable altering dense weight layout Jun 30, 2022
@billishyahao billishyahao changed the title [DNNL][BYOC] Enable altering dense weight layout [DNNL][BYOC] Enable Altering Dense Weight Layout Jun 30, 2022
billishyahao (Contributor, Author):
@apeskov @masahi @yangulei @crazydemo @linlifan @Qianshui-Jiang Please take a look :-)

Resolved review threads (outdated):
src/relay/op/nn/nn.cc (3 threads)
src/runtime/contrib/dnnl/dnnl_json_runtime.cc (1 thread)
@@ -94,6 +94,9 @@ def partition_for_dnnl(mod, params=None, alter_layout=True, prune_subgraphs=True
)
with tvm.transform.PassContext(opt_level=3):
mod = seq(mod)

mod = dnnl.rewrite_dense_bias_gelu_reshape_last(mod)
Contributor:
Could you please explain why we need this pass before the "AlterOp" transformation passes?

billishyahao (Contributor, Author):
Hi Peskov, the rewrite_dense_bias_gelu_reshape_last pass only matches the pattern "dense_bias_activation". If we want to put it after the "AlterOp" pass, then I need to extend the capability of the rewrite_dense_bias_gelu_reshape pass.

apeskov (Contributor) commented Jul 1, 2022:

Thanks @billishyahao, nice patch!

You've touched the common tvm::relay code a little to enhance layout support for the packed dense op. There is one delicate nuance here that I would like to highlight.

"weight_layout" is arbitrary string like "NC", "CN", "CN8c", "CN16n4c" and any others. It should match with regex: (NC|CN)([:digit:](c|n))*. Will be perfect to have support all of these possible cases.
Let's take a close look at the following example. Dense with these shapes: data_shape [128, 10], weight_shape [17, 10], weight_layout NC; the output_shape will be [128, 17]. Assume that we apply alter op layout and change the layout to NC8c. The weight shape will change to [3, 10, 8], with some additional padding. Unexpectedly, the output shape will also change to [128, 24]. A weight layout conversion that changes the output shape is very strange behaviour. I know the count attribute should keep the original size of the output channels, but it can be None.

So I recommend you take the count size into account in the "MakeDensePack" implementation and propagate the output shape correctly.
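The shape pitfall in this comment can be reproduced numerically. A minimal numpy sketch, assuming a block size of 8 and using a local variable `count` to stand in for the count attribute the comment mentions (the computation itself is an illustration, not the DNNL kernel):

```python
import numpy as np

# data [128, 10], weight [17, 10] in plain NC layout.
rng = np.random.default_rng(0)
data = rng.random((128, 10), dtype=np.float32)
weight = rng.random((17, 10), dtype=np.float32)

# Pad 17 output channels up to 24 and block into [3, 10, 8].
padded = np.zeros((24, 10), dtype=np.float32)
padded[:17] = weight
packed = padded.reshape(3, 8, 10).transpose(0, 2, 1)

# Dense computed per block and concatenated: the padding leaks into the output.
out_blocked = np.concatenate([data @ packed[i] for i in range(3)], axis=1)
print(out_blocked.shape)        # (128, 24), not the expected (128, 17)

count = 17                      # original number of output channels
out = out_blocked[:, :count]    # crop back to the logical output shape
print(out.shape)                # (128, 17)
```

Without the crop, downstream ops would see 24 output channels; the cropped result matches the plain `data @ weight.T` exactly.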

yangulei (Contributor) commented Jul 4, 2022:

Hi @apeskov, what you mentioned is a common issue with blocked layouts.

> Let's take a close look on next example. Dense with next shapes: data_shape [128, 10], weight_shape [17, 10], weight_layout NC, output_shape will be [128, 17]. Assume that we applies alter op layout and change layout to NC8c. Weight shape will be changed to [3, 10, 8], with some additional padding. Unexpectedly, output shape will also be changed to [128, 24]. Weight layout conversion changes output shape, that's very strange behavior.

If additional padding is applied when transforming from a plain layout to a blocked layout, a crop must also be applied when transforming back to the plain layout. A bijective transformation should ensure origin = backward(forward(origin)), but that is not guaranteed so far.

Padding is natural, while cropping needs extra information. We use the extra information from the definition of Conv to solve this problem for blocked weights, but that is a workaround rather than a general solution. I think we need both a logical shape and a concrete shape for a tensor, just like the dims and padded_dims in a DNNL memory descriptor.

Maybe we need a pass to infer the original logical shapes and save them as attributes for later use. Do you have any ideas about this?
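The dims vs. padded_dims idea from this comment can be sketched as a tiny descriptor that carries both shapes, so a backward transform always knows how much to crop. `BlockedTensorDesc` is a hypothetical illustration, not a DNNL or TVM type:

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class BlockedTensorDesc:
    """Carry both views of a blocked tensor, like dims vs. padded_dims
    in a DNNL memory descriptor. Hypothetical sketch, not a real API."""
    logical_shape: Tuple[int, ...]   # shape before padding, e.g. (17, 10)
    concrete_shape: Tuple[int, ...]  # blocked/padded shape, e.g. (3, 10, 8)

    def padded_units(self) -> int:
        # Output channels after padding: outer blocks * inner block size.
        return self.concrete_shape[0] * self.concrete_shape[-1]

desc = BlockedTensorDesc(logical_shape=(17, 10), concrete_shape=(3, 10, 8))
print(desc.logical_shape[0], desc.padded_units())  # 17 24
```

With both shapes attached to the tensor, origin = backward(forward(origin)) becomes enforceable: forward pads 17 up to 24, and backward crops 24 back to the recorded logical 17.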

@billishyahao billishyahao force-pushed the alter_dense_layout branch 7 times, most recently from 29dd2bd to 9c6a910 Compare July 19, 2022 07:12
@billishyahao billishyahao force-pushed the alter_dense_layout branch 2 times, most recently from 86fea87 to a81ca9c Compare August 1, 2022 03:30
billishyahao (Contributor, Author):
Hi @masahi, could you shed some light on the CI failure? I could not find a way to reproduce it in my local environment. Thanks!

masahi (Member) commented Aug 2, 2022:

Sorry, I couldn't tell what the error was either. cc @driazati @areusch

driazati (Member) commented Aug 2, 2022:

Sorry, it's unclear from the logs; we really should aggregate common error phrases automatically. If you search through the logs for Fatal Python error: Aborted you can see the failed test (e.g. in https://ci.tlcpack.ai/blue/rest/organizations/jenkins/pipelines/tvm/branches/PR-11966/runs/10/nodes/384/steps/1167/log/?start=0 it's

[2022-08-01T04:22:42.467Z] tests/python/frontend/pytorch/test_forward.py::test_convert_torch_script_with_input_types free(): invalid pointer
[2022-08-01T04:22:42.467Z] Fatal Python error: Aborted
[2022-08-01T04:22:42.467Z] 
[2022-08-01T04:22:42.467Z] Thread 0x00007fbdec1c2700 (most recent call first):
[2022-08-01T04:22:42.467Z]   File "/usr/lib/python3.7/threading.py", line 300 in wait
[2022-08-01T04:22:42.467Z]   File "/usr/lib/python3.7/threading.py", line 552 in wait
[2022-08-01T04:22:42.467Z]   File "/usr/local/lib/python3.7/dist-packages/tqdm/_monitor.py", line 60 in run
[2022-08-01T04:22:42.467Z]   File "/usr/lib/python3.7/threading.py", line 926 in _bootstrap_inner
[2022-08-01T04:22:42.467Z]   File "/usr/lib/python3.7/threading.py", line 890 in _bootstrap
[2022-08-01T04:22:42.467Z] 
[2022-08-01T04:22:42.467Z] Thread 0x00007fbe37727700 (most recent call first):
[2022-08-01T04:22:42.467Z]   File "/usr/lib/python3.7/socket.py", line 212 in accept
[2022-08-01T04:22:42.467Z]   File "/usr/local/lib/python3.7/dist-packages/pytest_rerunfailures.py", line 429 in run_server
[2022-08-01T04:22:42.467Z]   File "/usr/lib/python3.7/threading.py", line 870 in run
[2022-08-01T04:22:42.467Z]   File "/usr/lib/python3.7/threading.py", line 926 in _bootstrap_inner
[2022-08-01T04:22:42.467Z]   File "/usr/lib/python3.7/threading.py", line 890 in _bootstrap
[2022-08-01T04:22:42.467Z] 
[2022-08-01T04:22:42.467Z] Current thread 0x00007fbe6c265740 (most recent call first):
[2022-08-01T04:22:42.467Z]   File "/usr/local/lib/python3.7/dist-packages/torch/jit/_serialization.py", line 162 in load
[2022-08-01T04:22:42.467Z]   File "/workspace/tests/python/frontend/pytorch/test_forward.py", line 4077 in test_convert_torch_script_with_input_types
[2022-08-01T04:22:42.467Z]   File "/usr/local/lib/python3.7/dist-packages/_pytest/python.py", line 192 in pytest_pyfunc_call
[2022-08-01T04:22:42.467Z]   File "/usr/local/lib/python3.7/dist-packages/pluggy/_callers.py", line 39 in _multicall
[2022-08-01T04:22:42.467Z]   File "/usr/local/lib/python3.7/dist-packages/pluggy/_manager.py", line 80 in _hookexec
[2022-08-01T04:22:42.467Z]   File "/usr/local/lib/python3.7/dist-packages/pluggy/_hooks.py", line 265 in __call__
[2022-08-01T04:22:42.467Z]   File "/usr/local/lib/python3.7/dist-packages/_pytest/python.py", line 1761 in runtest
[2022-08-01T04:22:42.467Z]   File "/usr/local/lib/python3.7/dist-packages/_pytest/runner.py", line 166 in pytest_runtest_call
[2022-08-01T04:22:42.467Z]   File "/usr/local/lib/python3.7/dist-packages/pluggy/_callers.py", line 39 in _multicall
[2022-08-01T04:22:42.467Z]   File "/usr/local/lib/python3.7/dist-packages/pluggy/_manager.py", line 80 in _hookexec
[2022-08-01T04:22:42.467Z]   File "/usr/local/lib/python3.7/dist-packages/pluggy/_hooks.py", line 265 in __call__
[2022-08-01T04:22:42.467Z]   File "/usr/local/lib/python3.7/dist-packages/_pytest/runner.py", line 259 in <lambda>
[2022-08-01T04:22:42.467Z]   File "/usr/local/lib/python3.7/dist-packages/_pytest/runner.py", line 338 in from_call
[2022-08-01T04:22:42.467Z]   File "/usr/local/lib/python3.7/dist-packages/_pytest/runner.py", line 259 in call_runtest_hook
[2022-08-01T04:22:42.467Z]   File "/usr/local/lib/python3.7/dist-packages/_pytest/runner.py", line 219 in call_and_report
[2022-08-01T04:22:42.467Z]   File "/usr/local/lib/python3.7/dist-packages/_pytest/runner.py", line 130 in runtestprotocol
[2022-08-01T04:22:42.467Z]   File "/usr/local/lib/python3.7/dist-packages/pytest_rerunfailures.py", line 497 in pytest_runtest_protocol
[2022-08-01T04:22:42.467Z]   File "/usr/local/lib/python3.7/dist-packages/pluggy/_callers.py", line 39 in _multicall
[2022-08-01T04:22:42.467Z]   File "/usr/local/lib/python3.7/dist-packages/pluggy/_manager.py", line 80 in _hookexec
[2022-08-01T04:22:42.467Z]   File "/usr/local/lib/python3.7/dist-packages/pluggy/_hooks.py", line 265 in __call__
[2022-08-01T04:22:42.467Z]   File "/usr/local/lib/python3.7/dist-packages/_pytest/main.py", line 347 in pytest_runtestloop
[2022-08-01T04:22:42.467Z]   File "/usr/local/lib/python3.7/dist-packages/pluggy/_callers.py", line 39 in _multicall
[2022-08-01T04:22:42.467Z]   File "/usr/local/lib/python3.7/dist-packages/pluggy/_manager.py", line 80 in _hookexec
[2022-08-01T04:22:42.467Z]   File "/usr/local/lib/python3.7/dist-packages/pluggy/_hooks.py", line 265 in __call__
[2022-08-01T04:22:42.467Z]   File "/usr/local/lib/python3.7/dist-packages/_pytest/main.py", line 322 in _main
[2022-08-01T04:22:42.467Z]   File "/usr/local/lib/python3.7/dist-packages/_pytest/main.py", line 268 in wrap_session
[2022-08-01T04:22:42.467Z]   File "/usr/local/lib/python3.7/dist-packages/_pytest/main.py", line 315 in pytest_cmdline_main
[2022-08-01T04:22:42.467Z]   File "/usr/local/lib/python3.7/dist-packages/pluggy/_callers.py", line 39 in _multicall
[2022-08-01T04:22:42.467Z]   File "/usr/local/lib/python3.7/dist-packages/pluggy/_manager.py", line 80 in _hookexec
[2022-08-01T04:22:42.467Z]   File "/usr/local/lib/python3.7/dist-packages/pluggy/_hooks.py", line 265 in __call__
[2022-08-01T04:22:42.467Z]   File "/usr/local/lib/python3.7/dist-packages/_pytest/config/__init__.py", line 165 in main
[2022-08-01T04:22:42.467Z]   File "/usr/local/lib/python3.7/dist-packages/_pytest/config/__init__.py", line 187 in console_main
[2022-08-01T04:22:42.467Z]   File "/usr/local/lib/python3.7/dist-packages/pytest/__main__.py", line 5 in <module>
[2022-08-01T04:22:42.467Z]   File "/usr/lib/python3.7/runpy.py", line 85 in _run_code
[2022-08-01T04:22:42.467Z]   File "/usr/lib/python3.7/runpy.py", line 193 in _run_module_as_main
[2022-08-01T04:22:42.722Z] tests/scripts/setup-pytest-env.sh: line 49: 28733 Aborted                 TVM_FFI=${ffi_type} python3 -m pytest -o "junit_suite_name=${suite_name}" "--junit-xml=${TVM_PYTEST_RESULT_DIR}/${suite_name}.xml" "--junit-prefix=${ffi_type}" "${extra_args[@]}"
[2022-08-01T04:22:42.722Z] + exit_code=134

)

masahi (Member) commented Aug 3, 2022:

That surely looks unrelated to this PR (it fails in PyTorch). The same issue is reported in #12276. The error looks similar to the one in #9362, but I don't know why we are starting to get this now...

masahi (Member) commented Aug 3, 2022:

@tvm-bot rerun

billishyahao (Contributor, Author):
Hi @masahi, thanks for the quick response. I found another test case that failed in https://ci.tlcpack.ai/blue/rest/organizations/jenkins/pipelines/tvm/branches/PR-11966/runs/11/nodes/382/steps/1159/log/?start=0:
[2022-08-03T01:24:59.821Z] tests/python/frontend/pytorch/qnn_test.py::test_serialized_modules free(): invalid pointer
[2022-08-03T01:24:59.821Z] Fatal Python error: Aborted

Is it a random issue?

masahi (Member) commented Aug 3, 2022:

Yeah, that also looks unrelated and flaky. It's strange; I submitted a PR yesterday and hit none of these issues. #12263

billishyahao (Contributor, Author):
@tvm-bot rerun

areusch added and then removed the needs-triage label (PRs or issues that need to be investigated by maintainers to find the right assignees to address it) on Oct 19, 2022.