Torchdynamo tuning script #9

Closed
wants to merge 698 commits into from

yelite (Owner) commented Sep 23, 2022

Thanks for contributing to TVM! Please refer to the guidelines at https://tvm.apache.org/docs/contribute/ for useful information and tips. After the pull request is submitted, please request code reviews from Reviewers by @-mentioning them in the pull request thread.

AndrewZhaoLuo and others added 30 commits August 17, 2022 21:00
…#12481)

* trace.cc

* add tests

* remove assert

* add proper test

* lint

* lint
…stores are not generated at the LLVM level. This is a workaround for an instruction selection issue in the current version of LLVM for Hexagon (apache#12471)
* [TVMScript] IRBuilder, IRBuilderFrame base class

This PR introduces basic data structures of the generic IRBuilder
across the codebase.

IRBuilder is a general-purpose IRBuilder that can be used in TIR, Relax
and any other vendor-specific dialects; IRBuilderFrame is where contextual
information is stored in the IRBuilder.

* fix linter

* Update include/tvm/script/ir_builder/base.h

Co-authored-by: Junru Shao <junrushao1994@gmail.com>
* Auto-vectorization (fp16) for v68

* use tvm.testing.main in fp16 test of tanh_slice op
* add bfloat16 promotion for CallNode

* add softmax to bfloat16 build test
Previously, `CMSISNNFlags` was derived using logic specific to the external code generator; this converts the external code generator options into a `Target`.
…che#12474)

* [Target] Only append default keys if target doesn't have any yet

This allows target parsers to provide their own target keys. Without this
change, the default keys would always be appended, which may or may not
be desirable.

* Add "cpu" to ARM CPU keys

* Add "cpu" to the keys in the mprofile target parser

* Restore the mprofile cpptest, since the "cpu" key is back

* So the -device attribute is actually needed...
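The key-merging rule above (append default keys only when the parser supplied none) can be sketched in plain Python. This is a minimal illustration assuming a target is a plain dict; `finalize_keys` and `DEFAULT_KEYS` are hypothetical names, not TVM's `Target` API.

```python
from typing import List, Optional

# Hypothetical default keys; TVM's real defaults depend on the target kind.
DEFAULT_KEYS: List[str] = ["cpu"]


def finalize_keys(target: dict, default_keys: Optional[List[str]] = None) -> dict:
    """Return a copy of `target` whose keys fall back to defaults only if empty."""
    defaults = DEFAULT_KEYS if default_keys is None else default_keys
    keys = list(target.get("keys", []))
    if not keys:
        # Defaults act as a fallback instead of always being appended.
        keys.extend(defaults)
    return {**target, "keys": keys}


# A parser that provides its own keys keeps them unchanged.
print(finalize_keys({"kind": "c", "keys": ["arm_cpu", "cpu"]})["keys"])
# A target without keys gets the defaults.
print(finalize_keys({"kind": "llvm"})["keys"])
```

Copying the key list before extending it keeps the caller's dict and the shared default list unmodified.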
To figure out a user's association with the repo, this code previously
searched the associations in the repo filtered by the relevant username.
GitHub doesn't return only the exact match, though, so we instead have to
collect many results and search through all of them.

Co-authored-by: driazati <driazati@users.noreply.github.com>
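The fix described above (collect the fuzzy search results and pick the exact login match) might look roughly like this. It is a sketch assuming each result is a dict with hypothetical `login` and `association` fields; it does not call the GitHub API.

```python
from typing import Iterable, Optional


def find_association(results: Iterable[dict], username: str) -> Optional[str]:
    """Return the association for the entry whose login exactly matches `username`.

    `results` stands in for the collected (possibly fuzzy) search results;
    the field names are assumptions for this sketch.
    """
    for entry in results:
        if entry["login"] == username:
            return entry["association"]
    return None


# Hypothetical results: a fuzzy search for "alice" also returns "alice-bot".
results = [
    {"login": "alice-bot", "association": "NONE"},
    {"login": "alice", "association": "CONTRIBUTOR"},
]
print(find_association(results, "alice"))  # CONTRIBUTOR
```

Taking the first result without checking the login would return `NONE` here, which is the failure mode the commit describes.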
* fix scatterND large shape problem

* fix thread pool alloca

* add scatternd unit test

* update with comment

* Empty

Co-authored-by: wrongtest <wrongtest0@gmail.com>
Fix some typos in src/.

Co-authored-by: driazati <driazati@users.noreply.github.com>
…apache#12497)

* [Relay][Layout] FInferCorrectLayout for L2 norm layout change.

* [Relay][Layout] Test for L2 norm layout transform.

* [Relay][Layout] Re-edit test to add multi-dimensional axis list.

* Fix cpplint errors

* Use clang-format-10 rules.

* replace uint with size_t.
Following apache#12197, this PR introduces
`Schedule.show()`, which improves the user experience in the following
two aspects:
- Python syntax highlighting
- Outputs a schedule function instead of standalone instructions, so that
it's easier to follow.

To demonstrate this change:
- Before `Schedule.show()` was introduced:
<img width="555" alt="image" src="https://user-images.githubusercontent.com/22515877/185713487-03722566-1df7-45c7-a034-c1460d399681.png">

- After this change:
<img width="583" alt="image" src="https://user-images.githubusercontent.com/22515877/185713564-c54f3a9d-cd52-4709-a8b8-d8a61361e611.png">
This PR migrates the existing MemoryDatabase, which is currently implemented
in Python, to C++. The original intent of having an in-memory
database that does not persist on disk was merely testing, but as
time went on, we found it useful in production workflows, and thus decided
to migrate it to C++ for potentially better performance.
This PR:

- Adds an entry point for the TVMScript Unified Printer
- Adds a helper object class `RootNodeContainer` to provide an injection point for the actual printer implementation to add specialized logic on the root node to print.

Tracking issue: apache#11912
)

This PR adds boolean operators to OperationDoc. This is needed by the TIR expression printing because it has `tir::And` and `tir::Or`.

Tracking issue: apache#11912
…e#12347)

Removes support for driver stack versions older than 22.05
(semantic 3.0.1). Additionally, changes the integration to make
version checks using semantic versioning rather than the previous
year.month versioning method.
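The switch to semantic versioning for the minimum-version check can be sketched as follows. Versions are compared as integer tuples, because plain string comparison would rank "3.10.0" below "3.9.0". The helper names are hypothetical, and the 3.0.1 threshold comes from the commit message above.

```python
from typing import Tuple


def parse_semver(version: str) -> Tuple[int, int, int]:
    """Split 'MAJOR.MINOR.PATCH' into integers so tuples compare numerically."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


# Semantic version 3.0.1 corresponds to the 22.05 driver stack.
MIN_SUPPORTED = parse_semver("3.0.1")


def is_supported(version: str) -> bool:
    return parse_semver(version) >= MIN_SUPPORTED


print(is_supported("3.0.1"))   # True
print(is_supported("2.1.0"))   # False
print(is_supported("3.10.0"))  # True
```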
…#12489)

* [TIR] Support AllocConstantNode in CreatePrimFunc

* Handle AllocConstantNode in LeafBlockRemovalPlan

* Properly handle AllocConstNode in BufferAllocationLocator

* handle AllocateConst in EstimateFlops

* remove NDArray printing

* doc update

* add test

* cpplint

* Removed dependency on link-params attribute from target

* Restored NDArray printing to unbreak test
cyx-6 and others added 27 commits September 15, 2022 23:29
This PR introduces remaining IRBuilder methods for `Axis`.

Co-authored-by: yongwww <yongcale@gmail.com>
These were broken due to this missing guard:
https://ci.tlcpack.ai/job/docker-images-ci/job/docker-image-run-tests/223/console

Co-authored-by: driazati <driazati@users.noreply.github.com>
…ion (apache#12811)

Fix random state fork in TuneContext Clone function.
Recently, virtual environments were introduced in the
docker images, which helped a great deal to
localize errors: apache#12663. In this fix, the link to Caffe is
created inside this virtual env instead of being added
to the system path of Python. This fix also removes the
import of the request package where it is not needed.

Fixes apache#12663
apache#12783)

[Hexagon] Reduce the number of tests run for VTCM testing in order to speed up CI.
…apache#12807)

* Protect access to global buffer manager map

* Fix lint
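Protecting a global map with a lock can be illustrated by a small Python analogue. The real change is in the Hexagon C++ runtime; `BufferManagerRegistry` and `get_or_create` are hypothetical names, and the sketch only shows the check-then-insert step happening under one lock.

```python
import threading
from typing import Dict


class BufferManagerRegistry:
    """Hypothetical global map whose lookups and inserts share one lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._managers: Dict[str, object] = {}

    def get_or_create(self, key: str) -> object:
        # Check and insert under the lock so two threads cannot both create a manager.
        with self._lock:
            if key not in self._managers:
                self._managers[key] = object()
            return self._managers[key]


registry = BufferManagerRegistry()
seen = []
threads = [
    threading.Thread(target=lambda: seen.append(registry.get_or_create("vtcm")))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(all(m is seen[0] for m in seen))  # True
```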
This was missing a repo checkout and failing as in
https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/main/4302/pipeline.
This also adds in the changes from apache#12719:

Fixes apache#12600. The original solution there doesn't actually fix the
issue; that would require a job queue that rejects old pushes. Since
this case is pretty rare, and the next commit that comes along and
builds generally fixes everything up, we can ignore failures that
happen on `push`es.
This would post the comment that the tests bot and the docs comment bot
use as soon as a PR is opened. The comment would contain links to
generic info about posting PRs (obviating the
`.github/PULL_REQUEST_TEMPLATE.md`) as well as dynamic info about the
specific PR (filled in later by the respective bots). This would make
things like the auto-cc bot more transparent, since the comment would
link to the relevant issue.

Tested live here: driazati#21 (comment)
…ache#12778)

* [Testing] Add decorator tvm.testing.requires_cuda_compute_version

Previously, individual unit tests would call
`tvm.contrib.nvcc.get_target_compute_version` and return early.  This
was repeated boilerplate in many tests, and incorrectly reported a
test as `PASSED` if the required infrastructure wasn't present.

This commit introduces `tvm.testing.requires_cuda_compute_version`, a
decorator that checks the CUDA compute version and applies
`pytest.mark.skipif`.  If required infrastructure isn't present, a
test will be reported as `SKIPPED`.

* requires_cuda_compute_version skips test when no GPU is present
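The decorator idea above (skip instead of returning early, so the test reports `SKIPPED` rather than `PASSED`) can be sketched with the standard library. The commit says TVM applies `pytest.mark.skipif`. This sketch uses `unittest.skip`/`unittest.skipIf` so it stays self-contained, and `get_compute_version` is a hypothetical probe standing in for the real GPU query.

```python
import unittest
from typing import Callable, Optional, Tuple

Version = Tuple[int, int]


def get_compute_version() -> Optional[Version]:
    """Hypothetical probe; returns None when no CUDA GPU is present."""
    return None


def requires_compute_version(major: int, minor: int = 0,
                             probe: Callable[[], Optional[Version]] = get_compute_version):
    """Skip the test (instead of passing silently) if the GPU is absent or too old."""
    version = probe()  # evaluated once, when the decorator is applied
    if version is None:
        return unittest.skip("no CUDA GPU present")
    return unittest.skipIf(
        version < (major, minor),
        f"requires compute version >= {major}.{minor}, found {version[0]}.{version[1]}",
    )


class ExampleTest(unittest.TestCase):
    @requires_compute_version(8, 0)
    def test_needs_ampere(self):
        self.assertTrue(True)
```

Because the check happens at decoration time, a missing GPU shows up in the test report as a skip with a reason instead of a misleading pass.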
* add debug option to hexagon pytest

* address comment
* First pass at improving runtime resource management

* Add unit test

* Fix lint and clang format errors

* Disable resource reset for simulator

* Moved acquire/release calls to session object, separate buffer managers for non-runtime (static) and runtime (dynamic).

* Fix lint errors

* Fix lint errors

* Improve robustness of session shutdown

* Fix lint

* Address feedback

* Only allow call to Acquire in a clean state

* Use a pointer to indicate the "active" manager
This PR introduces remaining IRBuilder methods for `Block`.

Co-authored-by: yongwww <yongcale@gmail.com>
…e#12827)

This PR introduces two reducers to the TIR reduction part, so that rfactor and cross-thread reduction can be applied to functions that contain argmax/argmin computations generated by TOPI.
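rfactor and cross-thread reduction split a reduction into partial results and then combine them. For argmax this only works if the reduction is written as an associative, commutative combiner over (index, value) pairs. The plain-Python sketch below shows that algebra and is not TVM's reducer definition. The tie-breaking rule (smaller index wins) is an assumption.

```python
import sys
from functools import reduce
from typing import Tuple

Pair = Tuple[int, float]  # (index, value)


def argmax_combine(a: Pair, b: Pair) -> Pair:
    """Keep the larger value; on ties keep the smaller index (assumed rule)."""
    (ia, va), (ib, vb) = a, b
    if va > vb or (va == vb and ia < ib):
        return a
    return b


# Identity element: loses to every real pair, including ties at -inf.
IDENTITY: Pair = (sys.maxsize, float("-inf"))

data = [3.0, 7.5, 1.2, 7.5, 4.4, 0.3]
pairs = list(enumerate(data))

whole = reduce(argmax_combine, pairs, IDENTITY)

# rfactor-style: reduce each chunk independently, then combine the partials.
chunks = [pairs[0:2], pairs[2:4], pairs[4:6]]
partials = [reduce(argmax_combine, chunk, IDENTITY) for chunk in chunks]
split = reduce(argmax_combine, partials, IDENTITY)

print(whole, split)  # (1, 7.5) (1, 7.5)
```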
Computing the inverse mapping requires arithmetic analysis which is not guaranteed to cover all cases. We provide the pre-defined inverse index map instead.
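The commit does not say which index map is involved. As a hypothetical illustration of supplying a hand-written inverse instead of deriving one by arithmetic analysis, here is a split of a flat index into (outer, inner) with block size 4, plus a bijection check on a 16-element domain. This is plain Python, not TVM's `IndexMap` API.

```python
from typing import Tuple

BLOCK = 4  # hypothetical split factor


def forward(i: int) -> Tuple[int, int]:
    """Layout transform: flat index -> (outer, inner)."""
    return (i // BLOCK, i % BLOCK)


def inverse(outer: int, inner: int) -> int:
    """Pre-defined inverse, written by hand rather than derived."""
    return outer * BLOCK + inner


# Check that the pair is a bijection on the domain [0, 16).
assert all(inverse(*forward(i)) == i for i in range(16))
assert all(forward(inverse(o, k)) == (o, k) for o in range(4) for k in range(BLOCK))
```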
Prior to this PR, the LCA detector of buffers in TIR didn't take buffer memory scopes and GPU hierarchy into consideration. A consequent issue is that, when an intermediate buffer is in global memory, TIR's lowering passes don't necessarily allocate the intermediate buffer outside all `blockIdx`. As a result, the global intermediate buffer may be allocated under a GPU thread block, which is illegal.

This PR fixes this issue by making the LCA detector aware of buffer memory scopes and GPU hierarchy. With this fix, global intermediate buffers are all allocated outside `blockIdx`.
)

This PR is split from apache#12492, to make the necessary updates to the printer infra for future PRs of TIR printer.

Tracking issue: apache#11912

Co-authored-by: Greg Bonik <gbonik@octoml.ai>
…e#12825)

This PR relaxes the conditions of the Meta-Schedule schedule rule CrossThreadReduction. The rules were previously a bit over-strict, so some workloads with short reduction loops could not be optimized by cross-thread reduction automatically. In this PR, we relax the rules so that such workloads can be optimized.
This PR introduces IRBuilder methods for `Assert`, `Let`, `Realize`, `Evaluate`, `LaunchThread`, `EnvThread`.

Co-authored-by: yongwww <yongcale@gmail.com>
This PR introduces IRBuilder methods for
`allocate`, `Let`, `allocate_const`, `attr`, `While`, `If/Then/Else`, `decl_buffer`, `buffer_store`, `prefetch`.

Co-authored-by: yongwww <yongcale@gmail.com>
…trs["force_suppress"] (apache#12593)

* [Frontend][TFLite]fix detection_postprocess's non_max_suppression_attrs["force_suppress"]

Since TVM only supports the detection_postprocess operator with use_regular_nms
set to false, in which case TFLite's NMS suppresses boxes that exceed the
threshold regardless of their class, we need to set force_suppress to
True for the results of TVM and TFLite to be consistent.

* [Frontend][TFLite]fix detection_postprocess's non_max_suppression_attrs[force_suppress]

Added a test case that reproduces inconsistent results between TVM and TFLite
when force_suppress is false; the results match once force_suppress is set to true.
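To show what `force_suppress` changes, here is a small greedy NMS in plain Python. This is an illustration of the class-agnostic versus per-class suppression semantics described above, not TVM's or TFLite's implementation. The box format `(x1, y1, x2, y2)` and the function names are assumptions.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], classes: List[int],
        iou_threshold: float, force_suppress: bool) -> List[int]:
    """Greedy NMS returning kept indices, highest score first.

    force_suppress=True: overlapping boxes suppress each other regardless of class.
    force_suppress=False: only boxes of the same class suppress each other.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        suppressed = any(
            (force_suppress or classes[i] == classes[k])
            and iou(boxes[i], boxes[k]) > iou_threshold
            for k in keep
        )
        if not suppressed:
            keep.append(i)
    return keep


# Two heavily overlapping boxes (IoU = 0.81) with different classes.
boxes = [(0.0, 0.0, 10.0, 10.0), (1.0, 1.0, 10.0, 10.0)]
scores = [0.9, 0.8]
classes = [0, 1]
print(nms(boxes, scores, classes, 0.5, force_suppress=False))  # [0, 1]
print(nms(boxes, scores, classes, 0.5, force_suppress=True))   # [0]
```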
Implementation of API in `tvm.tir.schedule` for layout transformations
with padding, as part of apache#12261,
item "Insert pad value into generated TIR, using `tir::if_then_else`,
`builtin::assume`, and `builtin::undef`".

Following the RFC discussion in
apache/tvm-rfcs#77 (comment) and
apache/tvm-rfcs#77 (comment),
this commit preferentially rewrites the loops that surround a padded
transformation where possible, in order to express padding in terms of
`tir::if_then_else`.
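Expressing padding through a conditional, in the spirit of `tir::if_then_else`, can be mimicked in plain Python. A flat buffer is reshaped into rows of `block` elements, and out-of-range positions take the pad value. `transform_with_padding` is a hypothetical name, and this is not the `tvm.tir.schedule` API.

```python
from typing import List


def transform_with_padding(a: List[float], block: int, pad_value: float) -> List[List[float]]:
    """Reshape a flat buffer into rows of `block` elements, padding the tail.

    The conditional plays the role of if_then_else(i < n, A[i], pad_value).
    """
    n = len(a)
    rows = -(-n // block)  # ceil(n / block)
    return [
        [a[r * block + c] if r * block + c < n else pad_value for c in range(block)]
        for r in range(rows)
    ]


print(transform_with_padding([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 4, 0.0))
# [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 0.0, 0.0]]
```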
@yelite closed this Sep 23, 2022
yelite pushed a commit that referenced this pull request Feb 17, 2023
* Relax pretty printer initial prototype

* call into TVMScriptPrinter for PrimFuncs

* most round-trip tests pass

* address comments

* implement relax.output syntax for dataflow block outputs

* remove leftover comments

* fix Var constructor on ShapeExpr annotation

* fix DataflowVar as well
yelite pushed a commit that referenced this pull request Feb 17, 2023
* Introduce match cast, and code changes along

* add match_cast parser support (#9)

* Match cast support for VMShapeLower CanonicalizeBinding

* Remove `match_shape` (#12)

* Refactor ExprVisitor/Mutator to consider Expr in StructInfo.

Co-authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn>