forked from apache/tvm
merge to self branch #1
Merged
Conversation
* fix type relation for batch_matmul * fix lint
* Fix np.int and np.float usage in the tree. Newer versions of numpy emit many warnings indicating that np.int and np.float are deprecated; CI uses pytest, and these warning logs clog memory during testing and slow it down (see the example below).
* Fix formatting
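A small illustration of the replacement this commit performs; the array values are made up, and the exact call sites in the tree will of course differ:

```python
# With numpy >= 1.20 the aliases np.int / np.float emit DeprecationWarnings
# (and are removed in later releases), so code switches to the builtin
# int / float, or to explicit sized dtypes such as np.int32 / np.float64.
import numpy as np

# Deprecated (warns on newer numpy):
#   arr = np.array([1, 2, 3], dtype=np.int)
#   val = np.float(3.5)

# Preferred replacements:
arr = np.array([1, 2, 3], dtype=int)         # builtin int works as a dtype
val = float(3.5)
sized = np.array([1, 2, 3], dtype=np.int32)  # or pick an explicit width
```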
* rename _update_target and document its function
* make tvm.build return OperatorModule to return multiple outputs
* allow retrieving the var names used in TIR repr
* add Operator Model Library Format and test
* Add pathlib convenience functions to utils.TempDirectory.
* fix tests
* black format
* git-clang-format
* pylint fixes
* add asf header
* change memory map to make more sense, fix tests
* address giuseros comments
* align GetVarName with future TypedPackedFunc
* fix test
* clang-format
* rev model library format to v4 (bad merge)
Remove warning about macOS support from tutorial
* add stm32l4r5zi_nucleo * add parameter for test qemu * file type check * fix test * change order * revert
* fix weight shape in torch.mm conversion * Revert "fix weight shape in torch.mm conversion" This reverts commit a1a8fd3. * [Torch] remove unused conversion
* [Arith] Inverse affine map * [Arith] Inverse affine map * Update iter_affine_map.h * Update iter_affine_map.h * Update iter_affine_map.py * Topology order visit * doc * fix * address comments * lint * remove print
* Support test aten::flip * Support aten::flip
* rename resize to resize2d
* refactor resize_2d
* Add resize1d op, normalize attribute names across ops
* normalize resize3d to match the API of 1D and 2D
* fix lint
* fix relay tests from API change
* refactor topi tests, docs
* fix method naming in framework frontends; fix more frontend issues
* refactor resize tests to reuse components, add more coordinate transform modes to tests
* add cubic resize reference kernel and tests, add relay tests for resize1d
* fix pylint
* fix test typo
* [fix] Broken link in apps for wasm-standalone * [fix] Broken link in apps for wasm-standalone * [CI] Manual trigger for CI
Co-authored-by: Jackson Hsieh <chengpi@amazon.com>
In a similar vein to previous pull requests, replace deprecated uses of np.bool and np.int from numpy with the builtin bool and int. https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
* [ONNX] Wrap 'If' if it has multiple outputs. Without this wrapper, an assertion in from_onnx() fails with the error message "Number of output mismatch".
* [ONNX] Test If nodes with multiple output tensors
* Fix formatting issues
* Fix AttributeError when TEST_DATA_ROOT_PATH is set. Initiate a Path object from TEST_DATA_ROOT_PATH to fix the error: AttributeError: 'str' object has no attribute 'mkdir' (see the sketch after this list).
* [DOCS] Add docs for Pass Instrument
  - Add a tutorial about how to use pass instrument.
  - Add related sections in Pass Infrastructure documents.
* Fix ir.rst, the length of separator.
* Fix unused local name
* Fix linting errors
* Fix linting errors
* Fix linting errors
* Address code-review feedback
* Fix linting
* Fix the order of tutorial.
* Add exception handling. Address feedback.
* Fix CI error -- clearing instruments in global pass_ctx
* Clarify section hierarchy.
* Emphasize using the decorator instead of subclassing
* Add a sentence to explain Pass Instrument. Fix typo.
* Shrink python docs a little.
* Fix tag name.
* Address feedback.
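A minimal sketch of the string-vs-Path issue fixed above; reading TEST_DATA_ROOT_PATH from an environment variable and the fallback path are assumptions for illustration, not the exact code in the test utilities:

```python
import os
from pathlib import Path

# TEST_DATA_ROOT_PATH arrives as a plain str (here: from an environment variable).
test_data_root = os.environ.get("TEST_DATA_ROOT_PATH", "/tmp/tvm-test-data")

# Before the fix, calling mkdir() on the raw string raises:
#   AttributeError: 'str' object has no attribute 'mkdir'
# After the fix, the string is wrapped in a pathlib.Path first.
root = Path(test_data_root)
root.mkdir(parents=True, exist_ok=True)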
* Duplicate the CompileEngine interface.
* Refactor the graph_runtime_codegen to invoke the new LowerTE pass
* More changes
* Things appear to be working
* Some tracing to get Relay code to flow through too.
* Disable some assertions as exp.
* Tweak printing for now
* Fix a few bugs (#13):
  1. Don't add relay main function to list of lowered TIR functions
  2. Don't skip visiting call to relay function in graph runtime codegen
* Remove debug prints.
* Start refactoring
* Split out shared data structures
* Fix implicit duplicate decl of IsDynamic
* Clean up handling of name + global prim fn
* Clean up the code and debug issue introduced by previous hack
* Clean up the debugging
* Do C++ lint clean up
* Update src/relay/backend/graph_executor_codegen.cc Co-authored-by: Chris Sullivan <csullivan@octoml.ai>
* Clean up handling of external functions
* Add more error messages
* More clean up
* Update src/runtime/graph_executor/graph_executor.cc Co-authored-by: Chris Sullivan <csullivan@octoml.ai>
* Update src/runtime/graph_executor/graph_executor.cc Co-authored-by: Chris Sullivan <csullivan@octoml.ai>
* Update src/relay/backend/te_compiler.h Co-authored-by: Haichen Shen <shenhaichen@gmail.com>
* Update src/relay/backend/te_compiler.h Co-authored-by: Haichen Shen <shenhaichen@gmail.com>
* Fix CR
* More CR
* Format
* Fix lowering path for C++
* Fix tests
* Remove unnecessary change
* Clean up a few more things
* CI fix
* Fix the default context
* Fix
* Fix broken test cases
* Update
* Fix
* WIP
* Clean up storage data structures
* WIP
* WIP
* Fix build errors
* Remove TVMLower
* Fix lint
* Lint again
* fix black
* Move UpdateMainWorkspaceSize into te_compiler.cc
* Fix link errors
* Formatting
* Change UpdateMainWorkspaceSize to return Map<String, FunctionInfo>
* Workaround for GCC 5 error caused by enums in maps (GCC 5 is on i386 CI)
* Testing how functions should be named
* Lint
* Change how function metadata is updated
* Attempt to update aot_executor_codegen to use new StaticMemoryPlan instead of storage_device_map
* Pass memory plan through LowerTE into UpdateMainWorkspaceSize so that we don't need to run GraphPlanMemory an extra time
* Fix return in UpdateMainWorkspaceSize
* Lint
* Try to fix UpdateMainWorkspaceSize
* Fix construction of static memory plan
* Clean up code while debugging
* Adding UpdateWorkspaceSize back
* Add closure + call to UpdateFunctionMetadata (WIP)
* UpdateFunctionMetadata builds; weird error with device ctx map though. Not sure if it came from this change or something else
* Add some debugging of UpdateMainWorkspaceSize
* Starting to move UpdateFunctionMetadata call to use process_fn infra
* What target should be passed to UpdateFunctionMetadata?
* UpdateFunctionMetadata is not working
* Added some comments about UpdateFunctionMetadata for Jared
* Fix the creation of function metadata
* Try another stab at cleaning up the information
* Fix
* Port StorageInfo and StaticMemoryPlan data structure (#8297)
* Restoring reshape opt
* Fix tests
* Caught a nasty typo from Lily, Map::Set does not mutate
* Format
* Disable stupid Google style warning
* Rebase cleanup
* Formatting
* Add docstring for storage info
* Black
* Post rebase fix
* Remove prints
* Disable assert that doesn't make sense for now
* Fix lint
* Add copying attrs from relay node to graph node; still need to figure out how to do this in the case of global vars
* Work with Lily to fix graph attrs
* Try to figure out where extra arguments are coming from; fix merge passes the profiling test
* Clean up
* Fix profile test
* Remove debugging
* Add attributes for BYOC uTVM case
* Format
* Dumb typo
* Another fix for byoc
* Format
* Fix last 3 failing tests
* Format
* Fix final two test cases
* Format
* Fix lint
* Fix again
* Fix
* Fix auto scheduler code
* Fix issue
* Address CR comment
* Format
Co-authored-by: Jared Roesch <roeschinc@gmail.com>
When dilation is larger than 1 in conv2d with the NHWC layout, the ordering of indices used to access the data array in the convolution computation is incorrect: `data_vec` is defined as `lambda n, oho, owo, kh, kw, ic, ohi, owi` but is accessed as `data_vec[n, oho, owo, kh, kw, ohi, owi, ic]`. This patch fixes the index order and modifies the test so that it is suitable for running on an AArch64 CPU.
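A minimal sketch of the index-order rule this patch restores; the extents and the standalone compute below are made up for illustration and are not the actual topi/ARM schedule:

```python
from tvm import te

shape = (1, 2, 2, 3, 3, 4, 2, 2)  # (n, oho, owo, kh, kw, ic, ohi, owi) -- made-up extents
data = te.placeholder(shape, name="data")

# The argument order of the compute lambda defines the layout of data_vec:
data_vec = te.compute(
    shape,
    lambda n, oho, owo, kh, kw, ic, ohi, owi: data[n, oho, owo, kh, kw, ic, ohi, owi],
    name="data_vec",
)

# A consumer must therefore read data_vec[n, oho, owo, kh, kw, ic, ohi, owi];
# reading data_vec[n, oho, owo, kh, kw, ohi, owi, ic] mixes up the ic/ohi/owi
# axes, which is the ordering bug this patch fixes (observed when dilation > 1).
```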
* fix testing * trigger
* [Relay] Add support for conv2d with NHWC for Mali. Added a template schedule for conv2d NHWC, reusing a strategy similar to the NCHW layout. The schedule is also added to the corresponding test, which can be run to verify correctness.
* [Relay] Fix issue from pylint in conv2d for Mali
With either the ci_lint docker image, or the matched version of pylint==2.4.4, I got two lint errors running locally that didn't show up in the CI. Fixing them. Co-authored-by: Eric Lunderberg <elunderberg@octoml.ai>
- Some ops (e.g. view) call infer_value when converting a model into Relay IR.
- If LLVM is not enabled, this leads to a segmentation fault.
Co-authored-by: kueitang <kueitang@qti.qualcomm.com>
* [Bug] Fix x86 dense schedule extern ops * more * lint
* convert to python list as expected * test example * jostle ci Co-authored-by: Andrew Zhao Luo <andrewzhaoluo@system76-pc.localdomain>
* [AutoScheduler] Fix task extraction with TE compiler * fix * test * Update python/tvm/auto_scheduler/relay_integration.py
…da (#8554) * [TOPI][CUDA] minor change on assert statement * [TOPI][CUDA] reformatting
…ests (#8541) * Fix issue in 'vectorize' function for 1D and 3D tensors * Add pooling tests for channel last layouts * Add support for more general layouts in "poolnd" implementation * Reformat with 'black' * Fix lint issues
* This is a good practice to save storage space in the Docker images being created * Also sort pip package lists alphabetically
…xtFuncs (#8523) * bug fix and add tensorarray with partition pass test case * change test function location and address comments * Update tests/python/relay/test_pass_partition_graph.py Co-authored-by: Cody Yu <comaniac0422@gmail.com> * trigger CI Co-authored-by: Cody Yu <comaniac0422@gmail.com>
* handle upcasting case * add upcasting tests for tir * address comaniac comments * formatting * add negative tests * fix failing test; now allow other things Co-authored-by: Andrew Zhao Luo <andrewzhaoluo@system76-pc.localdomain>
* Remove all attr::storage_scope usage * pyformat * fixed VTA tests * Update TIR text printer to print storage_scope on allocate * print storage scope in AllocateNode ReprPrinter * Fixed accidentally removed scope tag check * remove unused function Co-authored-by: masa <masa@pop-os.localdomain>
* Bug fix for numpy scalar input in vm * Bug fix * Re-trigger CI * Update * Update UT * Re-trigger CI
* reduce testing time * lint issues resolved; weights for tests are always randomly generated Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
* hotfix check_grad perf regression: lift compile out of hot loop * hoist interpreter creation out of python closure, fix weird conv2d bug on arm cpu * lint * try one more fix
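A generic sketch of the pattern behind this hotfix; expensive_compile, check_grad_slow, and check_grad_fast are hypothetical stand-ins, not TVM's actual check_grad internals:

```python
import time

def expensive_compile(expr):
    """Stand-in for relay build / interpreter creation (the costly step)."""
    time.sleep(0.01)
    return lambda x: x * x

def check_grad_slow(expr, test_inputs):
    # Regression: recompiles inside the hot loop, once per input.
    for x in test_inputs:
        f = expensive_compile(expr)
        f(x)

def check_grad_fast(expr, test_inputs):
    # Fix: compile once, then reuse the compiled function for every input.
    f = expensive_compile(expr)
    for x in test_inputs:
        f(x)
```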
- Added device_type to the device-queried information.
- Sort the vulkan devices by device_type. Priority is discrete > integrated > virtual > cpu > other.
Co-authored-by: Eric Lunderberg <elunderberg@octoml.ai>
* Otherwise, stale pytest-results could appear in builds.
* add flag * fix and test * format * fix memory memory_align function * fix and address comments * format * fix crt aot test * comments * fix test * trigger * trigger * trigger * trigger * trigger Co-authored-by: Mehrdad Hessar <mhessar@ip-172-31-20-199.us-west-2.compute.internal>
* [Vulkan] Rewrote PointerValueTypeRewrite transform
  In C-style codegen, pointer types can be freely cast between scalar and vectorized types (e.g. `float16x4* <-> float16*`). In SPIR-V, these are separate types, and no such casting is allowed. This was previously handled by a special case for `Ramp(base, stride=1, lanes)` in the codegen. That method didn't cover all possible cases, including Broadcast nodes used as indices. PointerValueTypeRewrite previously rewrote the AllocateNode and parameter pointer types, but didn't update the Load/Store nodes. This change tracks which variables can be updated to a vectorized type, and then updates all references to them. This includes removing the `RampNode`, as the vectorization is then included as part of the variable type.
* [StorageRewrite] Updates as recommended in review.
  - Added explicit TODO(Lunderberg) for follow-ups
  - Pass `checker.info_map_` instead of `checker` to `VectorTypeRewriter`
* [Vulkan] Allow for pointer rewrites that change the base type.
  A single memory allocation may have more than one type of data stored within it. This allows the PointerTypeRewrite pass to recognize if a function only uses the pointer as a particular base type. This wasn't an issue in C-based codegen, but is required for Vulkan: since Vulkan shaders do not permit type-casting, the cast must be done when passing the pointer argument into the shader.
Co-authored-by: Eric Lunderberg <elunderberg@octoml.ai>
Co-authored-by: Junru Shao <junrushao1994@gmail.com> Co-authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn> Co-authored-by: Bohan Hou <32121147+spectrometerHBH@users.noreply.github.com> Co-authored-by: Hongyi Jin <3231950289@qq.com> Co-authored-by: Wuwei Lin <wuwei@apache.org>
Co-authored-by: Junru Shao <junrushao1994@gmail.com> Co-authored-by: Bohan Hou <32121147+spectrometerHBH@users.noreply.github.com> Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com> Co-authored-by: Hongyi Jin <3231950289@qq.com> Co-authored-by: Wuwei Lin <wuwei@apache.org>
* [TOPI][CUDA] Improve the performance of scatter_nd by:
  1. Splitting it into 2 kernels, one that does the "Init" and another that does the "Update", so they can have different grid/block configurations to better utilize the SMs.
  2. Using atomic_add instead of direct assignment, which avoids the race condition when multiple indices point to the same location of the output tensor. With this modification, it is now safe to use more CUDA threads to gain more parallelism (see the numpy analogy after this list).
* Fix python code format.
* FIX: [TOPI][CUDA] Improve the performance of scatter_nd #8479
  - Split the ScatterND kernel into 2 sub-kernels using ib.new_scope()
  - Replace ib.for_range() with blockIdx.y
  - Use atomic_add when mode == "add"
  - Keep threadIdx.x less than max_threads of the GPU
* Comment added
* Add fallback implementation when "mode=add" meets int64
  - atomic_add from CUDA doesn't support the int64 data type
  - Change "ind{i}" to "ind%d" % i so that the names of relay.var display correctly
* Python format
* Fix line too long
* CI pass
* Empty, for CI pass (×5)
* Exchange blockIdx.x and blockIdx.y
* Check for Vulkan or Metal
* Fall back to the previous algorithm when mode == update
* Update python/tvm/topi/cuda/scatter.py Co-authored-by: Tristan Konolige <tristan.konolige@gmail.com>
* Assign TODO
* Swap the then and else blocks
Co-authored-by: wenxizhu <wenxizhu@tencent.com> Co-authored-by: CaptainDuke <captainduke328@gmail.com> Co-authored-by: Tristan Konolige <tristan.konolige@gmail.com>
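A small numpy analogy (not the CUDA kernel itself) for why the "add" mode needs atomic accumulation when several indices target the same output element; the values below are made up:

```python
import numpy as np

out = np.zeros(4)
indices = np.array([1, 1, 2])          # two updates target index 1
updates = np.array([10.0, 20.0, 5.0])

# Assignment-style update: only one of the writes to index 1 survives,
# analogous to non-atomic stores racing on the GPU.
racy = out.copy()
racy[indices] += updates               # buffered: racy[1] ends up 20.0, not 30.0

# Accumulating update: every contribution lands, matching atomic_add semantics.
safe = out.copy()
np.add.at(safe, indices, updates)      # safe[1] ends up 10.0 + 20.0 = 30.0
```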
* ccache * ccache Fix formatting Add comment about nvcc Change default to AUTO More progress Add auto as a mode Disable ccache in CI add-cache-to-cmake Fix typo * Fix rebase * flaky test
Thanks for contributing to TVM! Please refer to the guideline https://tvm.apache.org/docs/contribute/ for useful information and tips. After the pull request is submitted, please request code reviews from Reviewers by @-mentioning them in the pull request thread.