
merge to self branch #1

Merged: 154 commits into jiangjiajun:paddlepaddle on Aug 2, 2021

Conversation

jiangjiajun
Owner

Thanks for contributing to TVM! Please refer to the guideline https://tvm.apache.org/docs/contribute/ for useful information and tips. After the pull request is submitted, please request code reviews from Reviewers by @-mentioning them in the pull request thread.

Hzfengsy and others added 30 commits July 1, 2021 14:46
…8381)

After fix a66186b, the same fix is also needed for depthwise_conv2d on Intel
graphics. The removed code was never used and is identical to the code in
cuda/depthwise_conv2d.py, so the CUDA implementation can be used when it
becomes necessary.
* fix type relation for batch_matmul

* fix lint
* Fix np.int and np.float usage in the tree.

Newer versions of NumPy emit many warnings indicating that
np.int and np.float are deprecated. CI uses pytest, and these
warning logs consume memory during testing and slow it down.
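For reference, a minimal sketch of the substitution these commits apply (illustrative only, not taken from the patch):

```python
import numpy as np

# np.int and np.float are deprecated aliases of the Python builtins since
# NumPy 1.20 and now emit DeprecationWarning.
# old: x = np.zeros(4, dtype=np.int)
x = np.zeros(4, dtype=int)      # use the builtin int instead of np.int
y = np.zeros(4, dtype=float)    # use the builtin float instead of np.float
# Sized dtypes such as np.int32 / np.float32 are unaffected and remain valid.
z = np.zeros(4, dtype=np.int32)
```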

* Fix formatting
* rename _update_target and document its function

* make tvm.build return OperatorModule to return multiple outputs

* allow retrieving the var names used in TIR repr

* add Operator Model Library Format and test

* Add pathlib convenience functions to utils.TempDirectory.

* fix tests

* black format

* git-clang-format

* pylint fixes

* add asf header

* change memory map to make more sense, fix tests

* address giuseros comments

* align GetVarName with future TypedPackedFunc

* fix test

* clang-format

* rev model library format to v4 (bad merge)
Remove warning about macOS support from tutorial
* add stm32l4r5zi_nucleo

* add parameter for test qemu

* file type check

* fix test

* change order

* revert
* fix weight shape in torch.mm conversion

* Revert "fix weight shape in torch.mm conversion"

This reverts commit a1a8fd3.

* [Torch] remove unused conversion
* [Arith] Inverse affine map

* [Arith] Inverse affine map

* Update iter_affine_map.h

* Update iter_affine_map.h

* Update iter_affine_map.py

* Topology order visit

* doc

* fix

* address comments

* lint

* remove print
* Support test aten::flip

* Support aten::flip
* rename resize to resize2d

* refactor resize_2d

* Add resize1d op, normalize attribute names across ops

* normalize resize3d to match the API of 1D and 2D

* fix lint

* fix relay tests from API change

* refactor topi tests, docs

* fix method naming in framework frontends

fix more frontend issues

* refactor resize tests to reuse components, add more coordinate transform modes to tests

* add cubic resize reference kernel and tests, add relay tests for resize1d

* fix pylint

* fix test typo
* [fix] Broken link in apps for wasm-standalone

* [fix] Broken link in apps for wasm-standalone

* [CI] Manual trigger for CI
Co-authored-by: Jackson Hsieh <chengpi@amazon.com>
In a similar vein to previous pull requests
replacing deprecated use of np.bool and np.int from
numpy with bool and int.

https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
* [ONNX] Wrap 'If' if it has multiple outputs

Without this wrapper, an assertion in from_onnx() fails with the
error message "Number of output mismatch"

* [ONNX] Test If nodes with multiple output tensors

* Fix formatting issues
* Fix AttributeError when TEST_DATA_ROOT_PATH is set

Create a Path object from TEST_DATA_ROOT_PATH to fix the error:
AttributeError: 'str' object has no attribute 'mkdir'
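A minimal sketch of the fix, assuming TEST_DATA_ROOT_PATH is read from the environment (the fallback path and variable name below are illustrative, not from the patch):

```python
import os
from pathlib import Path

# os.environ values are plain strings, which have no .mkdir(); wrapping the
# value in pathlib.Path restores the expected interface before creating dirs.
test_data_root = Path(os.environ.get("TEST_DATA_ROOT_PATH", "/tmp/test_data"))
test_data_root.mkdir(parents=True, exist_ok=True)
```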

* [DOCS] Add docs for Pass Instrument

 - Add a tutorial about how to use pass instrument.
 - Add related sections in Pass Infrastructure documents.

* Fix ir.rst, the length of separator.

* Fix unused local name

* Fix linting errors

* Fix linting errors

* Fix linting errors

* Address code-review feedbacks

* Fix linting

* Fix the order of tutorial.

* Add exception handling. Address feedbacks.

* Fix CI error -- clearing instruments in global pass_ctx

* Clarify section hierarchy.

* Emphasize using the decorator instead of subclassing

* Add a sentence to explain Pass Instrument. Fix typo.

* Shrink python docs a little.

* Fix tag name.

* Address feedbacks.
Duplicate the CompileEngine interface.

Refactor the graph_runtime_codegen to invoke the new LowerTE pass

More changes

Things appear to be working

Some tracing to get Relay code to flow through too.

Disable some assertions as exp.

Tweak printing for now

Fix a few bugs: (#13)

1. Don't add relay main function to list of lowered TIR functions
2. Don't skip visiting call to relay function in graph runtime codegen

Remove debug prints.

Start refactoring

Split out shared data structures

Fix implicit duplicate decl of IsDynamic

Clean up handling of name + global prim fn

Clean up the code and debug issue introduced by previous hack

Clean up the debugging

Do C++ lint clean up

Update src/relay/backend/graph_executor_codegen.cc

Co-authored-by: Chris Sullivan <csullivan@octoml.ai>

Clean up handling of external functions

Add more error messages

More clean up

Update src/runtime/graph_executor/graph_executor.cc

Co-authored-by: Chris Sullivan <csullivan@octoml.ai>

Update src/runtime/graph_executor/graph_executor.cc

Co-authored-by: Chris Sullivan <csullivan@octoml.ai>

Update src/relay/backend/te_compiler.h

Co-authored-by: Haichen Shen <shenhaichen@gmail.com>

Update src/relay/backend/te_compiler.h

Co-authored-by: Haichen Shen <shenhaichen@gmail.com>

Fix

CR

More CR

Format

Fix lowering path for C++

Fix tests

Remove unnecessary change

Clean up a few more things

CI fix

Fix the default context

Fix

Fix broken test cases

Update

Fix

WIP

Clean up storage data structures

WIP

WIP

Fix build errors

Remove TVMLower

Fix lint

Lint again

fix black

Move UpdateMainWorkspaceSize into te_compiler.cc

Fix link errors

Formatting

Change UpdateMainWorkspaceSize to return Map<String, FunctionInfo>

Workaround for GCC 5 error caused by enums in maps (GCC 5 is on i386 CI)

Testing how functions should be named

Lint

Change how function metadata is updated

Attempt to update aot_executor_codegen to use new StaticMemoryPlan instead of storage_device_map

Pass memory plan through LowerTE into UpdateMainWorkspaceSize so that we don't need to run GraphPlanMemory an extra time

Fix return in UpdateMainWorkspaceSize

Lint

Try to fix UpdateMainWorkspaceSize

Fix construction of static memory plan

Clean up code while debugging

Adding UpdateWorkspaceSize back

Add closure + call to UpdateFunctionMetadata (WIP)

UpdateFunctionMetadata builds; weird error with device ctx map though. Not sure if it came from this change or something else

Add some debugging of UpdateMainWorkspaceSize

Starting to move UpdateFunctionMetadata call to use process_fn infra

What target should be passed to UpdateFunctionMetadata?

UpdateFunctionMetadata is not working yet

Added some comments about UpdateFunctionMetadata for Jared

Fix the creation of function metadata

Try another stab at cleaning up the information

Fix

Port StorageInfo and StaticMemoryPlan data structure (#8297)

Restoring reshape opt

Fix tests

Caught a nasty typo from Lily: Map::Set does not mutate

Format

Disable stupid Google style warning

Rebase cleanup

Formatting

Add docstring for storage info

Black

Post rebase fix

Remove prints

Disable assert that doesn't make sense for now

Fix lint

Add copying attrs from relay node to graph node; still need to figure out how to do this in the case of global vars

Work with Lily to fix graph attrs

Try to figure out where extra arguments are coming from; fix merge

passes the profiling test

Clean up

Fix profile test

Remove debugging

Add attributes for BYOC uTVM case

Format

Dumb typo

Another fix for byoc

Format

Fix last 3 failing tests

Format

Fix final two test cases

Format

Fix lint

Fix again

Fix

Fix auto scheduler code

Fix issue

Address CR comment

Format

Co-authored-by: Jared Roesch <roeschinc@gmail.com>
When dilation is larger than 1 in conv2d with NHWC
layout, the ordering of indices used when accessing the data array
in the convolution computation is incorrect.

'data_vec' is defined as

lambda n, oho, owo, kh, kw, ic, ohi, owi:

But accessed as

data_vec[n, oho, owo, kh, kw, ohi, owi, ic]

This patch fixes the order of indices and modifies the test
so that it is suitable for running on an AArch64 CPU.
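To make the index-order mismatch concrete, here is a small hedged illustration (axis names follow the commit message; the array itself is made up, and this is not the TOPI code):

```python
import numpy as np

# If the index order at the access site does not match the order used when
# the buffer was defined, the wrong element is read whenever the mismatched
# axes have different extents (as happens with dilation > 1).
data_vec = np.arange(2 * 3 * 4).reshape(2, 3, 4)  # declared axes: (ic, ohi, owi)
ic, ohi, owi = 1, 2, 3
correct = data_vec[ic, ohi, owi]     # matches the declaration order
# wrong = data_vec[ohi, owi, ic]     # swapped order: wrong element or out of range
```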
* [Relay] Add support of conv2d with NHWC for Mali

Added a template schedule for conv2d NHWC, reusing a strategy similar
to the one for the NCHW layout. The schedule is also added to the
corresponding test, which can be run to verify correctness.

* [Relay] Fix issue from pylint in conv2d for Mali
With either the ci_lint docker image, or the matched version of
pylint==2.4.4, I got two lint errors running locally that didn't show
up in the CI.  Fixing them.

Co-authored-by: Eric Lunderberg <elunderberg@octoml.ai>
- Some ops (e.g. view) call infer_value when converting a model into Relay IR.
- If LLVM is not enabled, this leads to a segmentation fault.

Co-authored-by: kueitang <kueitang@qti.qualcomm.com>
* [Bug] Fix x86 dense schedule extern ops

* more

* lint
AndrewZhaoLuo and others added 29 commits July 29, 2021 15:13
* convert to python list as expected

* test example

* jostle ci

Co-authored-by: Andrew Zhao Luo <andrewzhaoluo@system76-pc.localdomain>
* [AutoScheduler] Fix task extraction with TE compiler

* fix

* test

* Update python/tvm/auto_scheduler/relay_integration.py
…da (#8554)

* [TOPI][CUDA] minor change on assert statement

* [TOPI][CUDA] reformatting
…ests (#8541)

* Fix issue in 'vectorize' function for 1D and 3D tensors

* Add pooling tests for channel last layouts

* Add support for more general layouts in "poolnd" implementation

* Reformat with 'black'

* Fix lint issues
* This is a good practice to save storage space in
  the Docker images being created
* Also sort pip package lists alphabetically
)

* Add basic support for batch matmul transpose

* Update

* Lint fix & add tf convert support

* Update

Lint fix

* Bug fix for qnn.batch_matmul

* Bug fix for tensorflow test

* Add grad support for batch_matmul

* Lint fix

Re-trigger CI

Bug fix

Re-trigger CI

Re-trigger CI

Re-trigger CI
…xtFuncs (#8523)

* bug fix and add tensorarray with partition pass test case

* change test function location and address comments

* Update tests/python/relay/test_pass_partition_graph.py

Co-authored-by: Cody Yu <comaniac0422@gmail.com>

* trigger CI

Co-authored-by: Cody Yu <comaniac0422@gmail.com>
* handle upcasting case

* test upcasting tests for tir

* address comaniac comments

* formatting

* add negative tests

* fix failing test; now allows other things

Co-authored-by: Andrew Zhao Luo <andrewzhaoluo@system76-pc.localdomain>
* Remove all attr::storage_scope usage

* pyformat

* fixed VTA tests

* Update TIR text printer to print storage_scope on allocate

* print storage scope in AllocateNode ReprPrinter

* Fixed accidentally removed scope tag check

* remove unused function

Co-authored-by: masa <masa@pop-os.localdomain>
* Bug fix for numpy scalar input in vm

* Bug fix

* Re-trigger CI

* Update

* Update UT

* Re-trigger CI
* reduce testing time

* Lint issues were resolved. Weights for the test are always randomly generated.

Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
* hotfix check_grad perf regression: lift compile out of hot loop

* hoist interpreter creation out of python closure, fix weird conv2d bug on arm cpu

* lint

* try one more fix
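A generic sketch of the hoisting pattern described above (the function names are illustrative and not check_grad's actual internals):

```python
def slow_check(inputs, build_fn, run_fn):
    for x in inputs:
        mod = build_fn()   # expensive compile inside the hot loop
        run_fn(mod, x)

def fast_check(inputs, build_fn, run_fn):
    mod = build_fn()       # compile once, hoisted out of the loop
    for x in inputs:
        run_fn(mod, x)
```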
- Added device_type to the queried device information.

- Sort the Vulkan devices by device_type.  Priority is discrete >
  integrated > virtual > cpu > other.

Co-authored-by: Eric Lunderberg <elunderberg@octoml.ai>
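A minimal sketch of the stated ordering (the structure of the device records is assumed for illustration; this is not the actual TVM Vulkan query code):

```python
# Priority from the commit message: discrete > integrated > virtual > cpu > other.
DEVICE_TYPE_PRIORITY = {"discrete": 0, "integrated": 1, "virtual": 2, "cpu": 3, "other": 4}

def sort_devices(devices):
    """Order devices so the most preferred device_type comes first."""
    return sorted(
        devices,
        key=lambda d: DEVICE_TYPE_PRIORITY.get(d["device_type"], len(DEVICE_TYPE_PRIORITY)),
    )

devices = [
    {"name": "iGPU", "device_type": "integrated"},
    {"name": "dGPU", "device_type": "discrete"},
]
print([d["name"] for d in sort_devices(devices)])  # ['dGPU', 'iGPU']
```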
* Otherwise, stale pytest-results could appear in builds.
* Fix storage_access not visiting else branch

* fix conflict with #8516 in the test

* update thread sync test following #8516 update
* add flag

* fix and test

* format

* fix memory_align function

* fix and address comments

* format

* fix crt aot test

* comments

* fix test

* trigger

* trigger

* trigger

* trigger

* trigger

Co-authored-by: Mehrdad Hessar <mhessar@ip-172-31-20-199.us-west-2.compute.internal>
* [Vulkan] Rewrote PointerValueTypeRewrite transform

In C-style codegen, pointer types can be freely cast between scalar
and vectorized types (e.g. `float16x4* <-> float16*`).  In SPIR-V,
these are separate types, and no such casting is allowed.  This was
previously handled by having a special-case for `Ramp(base, stride=1,
lanes)` in the codegen.  That method didn't cover all possible cases,
including Broadcast nodes used as indices.

PointerValueTypeRewrite previously re-wrote the AllocateNode and
parameter pointer types, but didn't update the Load/Store node.  This
change tracks which variables can be updated to a vectorized type, and
then updates all references to those.  This includes removing the
`RampNode`, as the vectorization is then included as part of the
variable type.

* [StorageRewrite] Updates as recommended in review.

- Added explicit TODO(Lunderberg) for follow-ups

- Pass `checker.info_map_` instead of `checker` to
  `VectorTypeRewriter`

* [Vulkan] Allow for pointer rewrites that change base type.

A single memory allocation may have more than one type of data stored
within it.  This allows the PointerTypeRewrite pass to recognize if a
function only uses the pointer as a particular base type.  This wasn't
an issue in C-based codegen, but is required for Vulkan.  Since Vulkan
shaders do not permit type-casting, the cast must be done when passing
the pointer argument into the shader.

Co-authored-by: Eric Lunderberg <elunderberg@octoml.ai>
Co-authored-by: Junru Shao <junrushao1994@gmail.com>
Co-authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn>
Co-authored-by: Bohan Hou <32121147+spectrometerHBH@users.noreply.github.com>
Co-authored-by: Hongyi Jin <3231950289@qq.com>
Co-authored-by: Wuwei Lin <wuwei@apache.org>
Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com>
* [TOPI][CUDA] Improve the performance of scatter_nd by:

1. Split into 2 kernels, one does the "Init" and another does the "Update".
   Thus they can have different Grid/Block configurations to better utilize
   SMs.
2. Use atomic_add instead of direct assignment, which avoids the race
   condition when multiple indices point to the same location of the output
   tensor. With this modification, it is now safe to use more CUDA threads
   to gain more parallelism.
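A hedged illustration (plain NumPy, not the CUDA kernel) of why an accumulating update matters when indices repeat:

```python
import numpy as np

# With duplicate indices, plain assignment keeps only the last write, while an
# accumulating (atomic-style) update keeps every contribution.
indices = np.array([1, 1, 2])
updates = np.array([10.0, 20.0, 5.0])

out_assign = np.zeros(4)
out_assign[indices] = updates           # last write wins: out_assign[1] == 20.0

out_accum = np.zeros(4)
np.add.at(out_accum, indices, updates)  # accumulates: out_accum[1] == 30.0
```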

* Fix python code format.

* FIX: [TOPI][CUDA] Improve the performance of scatter_nd #8479

- Split ScatterND kernel into 2 sub-kernels using ib.new_scope()

- Replace ib.for_range() with blockIdx.y

- Using atomic_add when mode == "add"

- Keep threadIdx.x less than max_threads of GPU

* Comment added

* Add fallback implementation when "mode=add" meets int64

- Atomic_add from CUDA doesn't support int64 data type
- Change "ind{i}" to "ind%d"%i, where names of relay.var could correctly display

* Python format

* Fix line too long

* CI pass

* Empty, for CI pass

* Empty, for CI pass

* Empty, for CI pass

* Empty, for CI pass

* Empty, for CI pass

* Exchange blockIdx.x and blockIdx.y

* check for Vulkan or metal

* Fallback to previous algorithm when mode==update

* Update python/tvm/topi/cuda/scatter.py

Co-authored-by: Tristan Konolige <tristan.konolige@gmail.com>

* Assign TODO

* Swapping then and else block

Co-authored-by: wenxizhu <wenxizhu@tencent.com>
Co-authored-by: CaptainDuke <captainduke328@gmail.com>
Co-authored-by: Tristan Konolige <tristan.konolige@gmail.com>
* ccache

* ccache

Fix formatting

Add comment about nvcc

Change default to AUTO

More progress

Add auto as a mode

Disable ccache in CI

add-cache-to-cmake

Fix typo

* Fix rebase

* flaky test
@jiangjiajun jiangjiajun merged commit 28bb1de into jiangjiajun:paddlepaddle Aug 2, 2021