Torchdynamo tuning script #9

yelite · 2022-09-23T17:30:41Z

Thanks for contributing to TVM! Please refer to guideline https://tvm.apache.org/docs/contribute/ for useful information and tips. After the pull request is submitted, please request code reviews from Reviewers by @ them in the pull request thread.

* [TVMScript] IRBuilder, IRBuilderFrame base class This PR introduces basic data structures of the generic IRBuilder across the codebase. IRBuilder is a general-purpose IRBuilder that can be used in TIR, Relax and any other vendor-specific dialects; IRBuilderFrame is where contextual information is stored in the IRBuilder. * fix linter * Update include/tvm/script/ir_builder/base.h Co-authored-by: Junru Shao <junrushao1994@gmail.com>

…che#12474) * [Target] Only append default keys if target doesn't have any yet This allows target parsers to provide their own target keys. Without this change, the default keys would always be appended, which may or may not be desirable. * Add "cpu" to ARM CPU keys * Add "cpu" to the keys in the mprofile target parser * Restore the mprofile cpptest, since the "cpu" key is back * So the -device attribute is actually needed...
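The key-handling rule described in the first bullet can be sketched in plain Python. This is only an illustration of the rule, not TVM's implementation; `finalize_keys` and its arguments are hypothetical names.

```python
def finalize_keys(parsed_keys, default_keys):
    """Return the target keys, adding defaults only when none were supplied."""
    if parsed_keys:
        # A target parser already chose keys; leave them untouched.
        return list(parsed_keys)
    # No keys yet: fall back to the defaults for this target kind.
    return list(default_keys)


print(finalize_keys(["arm_cpu", "cpu"], ["cpu"]))
print(finalize_keys([], ["cpu"]))
```

Under this rule a parser that wants "cpu" among its keys has to add it itself, which is consistent with the later bullets that add "cpu" to the ARM CPU and mprofile keys.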

To figure out a user's association with the repo, this code previously searched the associations in the repo filtered by the relevant username. However, GitHub doesn't return only the exact match, so we instead have to collect many results and search through all of them. Co-authored-by: driazati <driazati@users.noreply.github.com>
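A minimal sketch of the exact-match search described above. The result shape (`login` and `association` fields) and the function name are assumptions for illustration and do not reflect the actual GitHub response format or the CI script.

```python
def find_association(results, username):
    """Return the association for an exact username match, or None."""
    for item in results:
        # A filtered search may also return partial matches (for example
        # "alice2" when searching for "alice"), so compare the full login.
        if item["login"] == username:
            return item["association"]
    return None


# Hypothetical search results collected across all pages.
results = [
    {"login": "alice2", "association": "NONE"},
    {"login": "alice", "association": "CONTRIBUTOR"},
]
print(find_association(results, "alice"))
```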

…apache#12497) * [Relay][Layout] FInferCorrectLayout for L2 norm layout change. * [Relay][Layout] Test for L2 norm layout transform. * [Relay][Layout] Re-edit test to add multi-dimensional axis list. * Fix cpplint errors * Use clang-format-10 rules. * replace uint with size_t.

Following apache#12197, this PR introduces `Schedule.show()`, which improves the user experience in the following two aspects: - Python syntax highlighting - Outputs a schedule function instead of standalone instructions so that it's easier to follow. To demonstrate this change: - Before `Schedule.show()` is introduced: <img width="555" alt="image" src="https://user-images.githubusercontent.com/22515877/185713487-03722566-1df7-45c7-a034-c1460d399681.png"> - After this change: <img width="583" alt="image" src="https://user-images.githubusercontent.com/22515877/185713564-c54f3a9d-cd52-4709-a8b8-d8a61361e611.png">

This PR migrates the existing MemoryDatabase, which is currently implemented in Python, to C++. The original intent of having an in-memory database that does not persist on disk was merely testing, but as time went on, we found it useful in production workflows, and thus decided to migrate it to C++ for potentially better performance.

This PR: - Adds an entry point for the TVMScript Unified Printer - Adds a helper object class `RootNodeContainer` to provide an injection point for the actual printer implementation to add specialized logic on the root node to print. Tracking issue: apache#11912

…e#12347) Removes support for driver stack versions older than 22.05 (semantic 3.0.1). Additionally, changes the integration to make version checks using semantic versioning rather than the previous year.month versioning method.
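A version check based on semantic versioning might look like the sketch below. The function names are hypothetical and the code is not taken from the integration; only the minimum version (3.0.1, corresponding to release 22.05) comes from the description above.

```python
MIN_SUPPORTED = (3, 0, 1)  # semantic version corresponding to release 22.05


def parse_semver(text):
    """Parse a plain "MAJOR.MINOR.PATCH" string into a tuple of ints."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def is_supported(version_text):
    # Tuples compare element by element, which matches semantic-version
    # ordering for plain numeric versions (no pre-release tags).
    return parse_semver(version_text) >= MIN_SUPPORTED


print(is_supported("3.1.0"), is_supported("3.0.0"))
```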

…#12489) * [TIR] Support AllocConstantNode in CreatePrimFunc * Handle AllocConstantNode in LeafBlockRemovalPlan * Properly handle AllocConstNode in BufferAllocationLocator * handle AllocateConst in EstimateFlops * remove NDArray printing * doc update * add test * cpplint * Removed dependency on link-params attribute from target * Restored NDArray printing to unbreak test

Recently virtual environments were introduced in the Docker images, which was a great contribution for localizing errors: apache#12663. In this fix, the link to Caffe is created inside this virtual environment instead of adding it to Python's system path. This fix also removes imports of the request package where they are not needed. Fixes apache#12663

This was missing a repo checkout and failing as in https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/main/4302/pipeline. This also adds in the changes from apache#12719: Fixes apache#12600. The original solution there doesn't actually fix the issue, there would need to be some job queue that could make sure to reject old pushes. Since this case is pretty rare, generally the next commit that comes along and builds will fix everything up so we can ignore failures that happen on `push`es.

This would post the comment that the tests bot and the docs comment bot uses straightaway when a PR is posted. This will contain links to generic info about posting PRs (and obviate the `.github/PULL_REQUEST_TEMPLATE.md`) as well as dynamic info about the specific PR (filled in later by the respective bots). This would make things like the auto-cc bot more transparent since it would have a link to the relevant issue. Tested live here: driazati#21 (comment)

…ache#12778) * [Testing] Add decorator tvm.testing.requires_cuda_compute_version Previously, individual unit tests would call `tvm.contrib.nvcc.get_target_compute_version` and return early. This was repeated boilerplate in many tests, and incorrectly reported a test as `PASSED` if the required infrastructure wasn't present. This commit introduces `tvm.testing.requires_cuda_compute_version`, a decorator that checks the CUDA compute version and applies `pytest.mark.skipif`. If required infrastructure isn't present, a test will be reported as `SKIPPED`. * requires_cuda_compute_version skips test when no GPU is present
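The difference between returning early and marking a test as skipped can be illustrated with a small standalone sketch. It uses the standard library's `unittest.skipIf` as a stand-in for `pytest.mark.skipif`; `detect_compute_version` and `requires_compute_version` are hypothetical names, not the TVM implementation.

```python
import unittest


def detect_compute_version():
    """Hypothetical probe: (major, minor) of the GPU, or None if absent."""
    return None  # pretend no GPU is available on this machine


def requires_compute_version(major, minor=0):
    """Skip the decorated test unless the detected version is high enough."""
    found = detect_compute_version()
    if found is None:
        return unittest.skip("no GPU present")
    return unittest.skipIf(
        found < (major, minor),
        f"requires compute version >= {major}.{minor}",
    )


class ExampleTest(unittest.TestCase):
    @requires_compute_version(8)
    def test_needs_newer_gpu(self):
        self.fail("should not run without a suitable GPU")


result = unittest.TestResult()
unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTest).run(result)
print(len(result.skipped), len(result.failures))
```

Because the decision is made by the decorator, the test body never runs and the runner records a skip instead of a pass.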

* First pass at improving runtime resource management * Add unit test * Fix lint and clang format errors * Disable resource reset for simulator * Moved acquire/release calls to session object, separate buffer managers for non-runtime (static) and runtime (dynamic). * Fix lint errors * Fix lint errors * Improve robustness of session shutdown * Fix lint * Address feedback * Only allow call to Acquire in a clean state * Use a pointer to indicate the "active" manager

Prior to this PR, the LCA detector of buffers in TIR didn't take buffer memory scopes and the GPU hierarchy into consideration. A consequent issue is that, when an intermediate buffer is in global memory, TIR's lowering passes don't necessarily allocate the intermediate buffer outside all `blockIdx`. As a result, the global intermediate buffer is allocated under a GPU thread block, which is illegal. This PR fixes this issue by making the LCA detector aware of buffer memory scopes and the GPU hierarchy. With this fix, the global intermediate buffers are all allocated outside `blockIdx`.

…e#12825) This PR relaxes the conditions of the Meta-Schedule schedule rule CrossThreadReduction. The rule was previously a bit too strict, and some workloads with a small reduction loop length could not be optimized by cross-thread reduction automatically. In this PR, we relax the rule so that such workloads can be optimized.

…trs["force_suppress"] (apache#12593) * [Frontend][TFLite]fix detection_postprocess's non_max_suppression_attrs["force_suppress"] TVM only supports the detection_postprocess operator with use_regular_nms set to false. In that mode, TFLite's NMS suppresses boxes that exceed the threshold regardless of class, so force_suppress must be set to True for the TVM and TFLite results to be consistent. * [Frontend][TFLite]fix detection_postprocess's non_max_suppression_attrs[force_suppress] Added a test case that reproduces the inconsistent results between TVM and TFLite when force_suppress is false; setting force_suppress to true gives the correct result.

Implementation of API in `tvm.tir.schedule` for layout transformations with padding, as part of apache#12261, item "Insert pad value into generated TIR, using `tir::if_then_else`, `builtin::assume`, and `builtin::undef`". Following the RFC discussion in apache/tvm-rfcs#77 (comment) and apache/tvm-rfcs#77 (comment), this commit preferentially rewrites the loops that surround a padded transformation where possible, in order to express padding in terms of `tir::if_then_else`.

* Relax pretty printer initial prototype * call into TVMScriptPrinter for PrimFuncs * most round-trip tests pass * address comments * implement relax.output syntax for dataflow block outputs * remove leftover comments * fix Var constructor on ShapeExpr annotation * fix DataflowVar as well

* Introduce match cast, and code changes along * add match_cast parser support (#9) * Match cast support for VMShapeLower CanonicalizeBinding * Remove `match_shape` (#12) * Refactor ExprVisitor/Mutator to consider Expr in StructInfo. Co-authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn>

AndrewZhaoLuo and others added 30 commits August 17, 2022 21:00

[MetaSchedule] Handle deserializing empty string RVs in trace (apache…

fb07351

…#12481) * trace.cc * add tests * remove assert * add proper test * lint * lint

[HEXAGON][TOPI] This PR adjusts schedules so >64 length vector loads/…

436c17f

…stores are not generated at LLVM level. This is a workaround for an instruction selection issue in current version of llvm for hexagon (apache#12471)

[COMMUNITY] Adam Straw -> Reviewer (apache#12480)

e140a27

[TIR] Disallow vectorization with strides in VerifyGPUCode (apache#12477

aa97f4a

)

Fix memset of memory pool in PageMemoryManagerCreate (apache#12437)

da7675c

Add RISC-V build/test pipeline to Jenkins. (apache#12441)

a96bda4

[HEXAGON] Auto-vectorization (fp16) for v68 (apache#12397)

88928a4

* Auto-vectorization (fp16) for v68 * use tvm.testing.main in fp16 test of tanh_slice op

[TIR] [bfloat16] add bfloat16 promotion for CallNode (apache#12370)

efd7c45

* add bfloat16 promotion for CallNode * add softmax to bfloat16 build test

[CMSIS-NN] Re-use CPU Target Parser (apache#12320)

d1e6f39

Previously `CMSISNNFlags` was derived using logic specific to the external code generator, this converts the external code generator options into a `Target`.

[COMMUNITY] MichaelJKlaiber -> reviewer (apache#12501)

5d17e24

Fix test_autotune to support schedules with no tuning space (apache#1…

c0d440d

…2484)

[microTVM] Add config space to dense_dsp schedule (apache#12444)

8b3401c

* add config space * lint * lint

[TOPI]fix scatterND large shape problem (apache#12200)

41be1b4

* fix scatterND large shape problem * fix thread pool alloca * add scatternd unit test * update with comment * Empty Co-authored-by: wrongtest <wrongtest0@gmail.com>

fix group_conv3d caculate error (apache#12500)

9d6039b

[Fix] Fix some typos (apache#11503)

bdcfa01

Fix some typos in src/. Co-authored-by: driazati <driazati@users.noreply.github.com>

fix pytest (apache#12483)

c83ee08

fix temp array object reference in manifest_shared_memory_local_stage (…

eb31123

…apache#12516)

[MetaSchedule] Implement ScheduleFn as a C++ class (apache#12513)

125c9ca

[COMMUNITY] An Wang -> Reviewer (apache#12517)

92355f2

[TVMScript] Printer: add boolean operators to OperationDoc (apache#12518

2629065

) This PR adds boolean operators to OperationDoc. This is needed by the TIR expression printing because it has `tir::And` and `tir::Or`. Tracking issue: apache#11912

fix group conv3d pack kernel shape error (apache#12523)

e9aad35

cyx-6 and others added 27 commits September 15, 2022 23:29

[TVMScript] IRBuilder methods for Axis (apache#12808)

c0d2734

This PR introduces remaining IRBuilder methods for `Axis`. Co-authored-by: yongwww <yongcale@gmail.com>

[ci][docker] Fix nightly Docker tests (apache#12804)

9b17f34

These were broken due to this missing guard: https://ci.tlcpack.ai/job/docker-images-ci/job/docker-image-run-tests/223/console Co-authored-by: driazati <driazati@users.noreply.github.com>

[MetaSchedule][Minor]Fix Random State Fork in TuneContext Clone Funct…

6b3be49

…ion (apache#12811) Fix random state fork in TuneContext Clone function.

[Hexagon] Reduce the number of tests run for VTCM testing in order to… (

43d9a3b

apache#12783) [Hexagon] Reduce the number of tests run for VTCM testing in order to speed up CI.

[Hexagon] [runtime] Protect access to global HexagonBufferManager map (…

7c96e25

…apache#12807) * Protect access to global buffer manager map * Fix lint

[Hexagon] Add debug option to hexagon pytest (apache#12795)

bb80f19

* add debug option to hexagon pytest * address comment

[TVMScript] IRBuilder methods for Block (apache#12815)

41b65a3

This PR introduces remaining IRBuilder methods for `Block`. Co-authored-by: yongwww <yongcale@gmail.com>

[TIR] Support pattern matching argmax/argmin generated by TOPI (apach…

2cae905

…e#12827) This PR introduces two reducers to the TIR reduction part, so that rfactor and cross-thread reduction can be applied to functions that contain argmax/argmin computation generated by TOPI.
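Why a dedicated reducer makes rfactor and cross-thread reduction applicable can be illustrated in plain Python: argmax over (index, value) pairs is an associative combiner, so partial results can be computed independently and merged. This is only a sketch of the idea, not the TIR reducer itself; the tie-breaking rule (the smaller index wins) is an assumption made here for determinism.

```python
from functools import reduce


def argmax_combine(a, b):
    """Combine two (index, value) pairs, keeping the larger value.

    Ties keep the smaller index (an assumption for this sketch).
    """
    if b[1] > a[1] or (b[1] == a[1] and b[0] < a[0]):
        return b
    return a


values = [3.0, 9.0, 1.0, 9.0, 4.0, 7.0]
pairs = list(enumerate(values))

# Sequential reduction over all elements.
sequential = reduce(argmax_combine, pairs)

# Two-stage reduction: reduce each chunk, then merge the partial results,
# which mirrors the shape of rfactor or a cross-thread reduction.
chunks = [pairs[i:i + 2] for i in range(0, len(pairs), 2)]
partials = [reduce(argmax_combine, chunk) for chunk in chunks]
two_stage = reduce(argmax_combine, partials)

print(sequential == two_stage)
```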

[TIR] Construct the inverse in SuggestIndexMap (apache#12797)

91cce56

Computing the inverse mapping requires arithmetic analysis which is not guaranteed to cover all cases. We provide the pre-defined inverse index map instead.
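As a small illustration of supplying an inverse by hand rather than deriving it, consider a hypothetical tiling map. The tile size and function names below are assumptions for this sketch and are not taken from `SuggestIndexMap`.

```python
TILE = 4  # hypothetical tile size


def forward(i):
    """Map a flat index to (outer, inner) tile coordinates."""
    return (i // TILE, i % TILE)


def inverse(outer, inner):
    # Written by hand alongside the forward map instead of being
    # derived from it by arithmetic analysis.
    return outer * TILE + inner


# Check that the hand-written inverse really undoes the forward map.
round_trip_ok = all(inverse(*forward(i)) == i for i in range(64))
print(round_trip_ok)
```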

[TVMScript] Add more helper functions to the printer infra (apache#12829

1ecf084

) This PR is split from apache#12492, to make the necessary updates to the printer infra for future PRs of TIR printer. Tracking issue: apache#11912 Co-authored-by: Greg Bonik <gbonik@octoml.ai>

[TVMScript] IRBuilder methods for Stmt (apache#12830)

b2c5add

This PR introduces IRBuilder methods for `Assert`, `Let`, `Realize`, `Evaluate`, `LaunchThread`, `EnvThread`. Co-authored-by: yongwww <yongcale@gmail.com>

[TVMScript] IRBuilder methods for Stmt (apache#12831)

052e702

This PR introduces IRBuilder methods for `allocate`, `Let`, `allocate_const`, `attr`, `While`, `If/Then/Else`, `decl_buffer`, `buffer_store`, `prefetch`. Co-authored-by: yongwww <yongcale@gmail.com>

Add a script to run torchbench with TVM

f417555

Add missing args

0123e1a

Add todo

be0a16c

Move to the metaschedule folder

23a6658

Finish the local version of benchmark runner

058b8ee

Add rpc config and some documentation

03d630f

yelite closed this Sep 23, 2022
