Add documentation for Adreno deployment #22

adding Jyotsna to reviewers list

This reverts commit e9e8c4b.

…o reduce tuning time (apache#13259) * [MetaSchedule] Swap the order of RewriteTensorize and VerifyGPUCode to reduce tuning time * add comment

See issue apache#13227. Co-authored-by: driazati <9407960+driazati@users.noreply.github.com>

This commit ensures that constant folding is applied when a desired layout is selected during compilation. It ensures that `layout_transform` operations are removed where possible so that pattern matching for BYOC backends can work effectively. A test has been added to check this regression.

…3252) This commit applies additional write permission to the "tvm-venv" group virtual environment. Currently, after entering a container from a newly built image, it doesn't seem possible to install or update Python packages. For example, updating pip gives errors such as:

```
$ pip install --upgrade pip
ERROR: Could not install packages due to an OSError: [Errno 13] Permission denied: '/venv/apache-tvm-py3.7/bin/pip'
Check the permissions.
```

Enabling write access for this group fixes this as long as the current user is a member of the "tvm-venv" group.

* [Hexagon] Tests pylint * fix error * Fix buffer name

- Fix clang 15.0.3 '-Wunused-but-set-variable' and '-Wunused-lambda-capture' warnings by removing / commenting-out code.

…ing function_def_to_graph_def (apache#13260) [TF2] Import graph_def to default graph before calling function_def_to_graph_def

apache#13247) There is a local variable referenced before assignment in the `convert_interpolate` function. The variable `size` appears to be the one that was meant to be referenced.

…he#13274) This reverts commit 5acf3f9. Reverting since this is causing some spam from the ASF Infra bot related to https://issues.apache.org/jira/browse/INFRA-23834. As in that issue the protections have been applied manually by ASF Infra so this revert shouldn't have any real effect

Minimal dependencies for Fedora/CentOS. This commit describes how to install a minimal set of dependencies for building Apache TVM on Fedora and CentOS. It supplements the existing information for Ubuntu and macOS.

Fix occurrences of clang's `-Wdocumentation-unknown-command` warning.

Fix code to address a valid `-Wredundant-move` clang warning.

* [ETHOSN] Inline non-compute-intensive partitions

  Adds a pass that analyzes functions partitioned for the NPU and inlines those that are deemed "non-compute-intensive" back to the main function so that they can be considered for other backends. The current heuristic for deciding that a function is non-compute-intensive is to check that none of the operations in the function are multiply-accumulate operations. This heuristic is not optimal; optimization is left for future exploration. This pass is inspired by the "IsComputeIntensiveGraph" pass in the TensorRT integration.

  Change-Id: I20c197702f5252f102cfc1e4b4635ab836aa7835
* Address comments
  * 'inline_non_compute_intensive_partitions' -> 'is_inline_non_compute_intensive_partitions_enabled'.
  * remove no MAC operations.
  * fix network test.

  Change-Id: Ie1015b27f37e47544bed6f0aff819ee4649de579
* Fix failing unit tests due to optimization

  Change-Id: I0ee0af071dc77c91e0ef0f6753506cb40d1d1859
* Add future exploration suggestions

  Change-Id: Ie918d7f1059f032282f1f5eeffda38f4febcd59c

* [ETHOSN] Throw error message when inference fails

  Previously the runtime would silently skip inference failures and return random values as the result. This can make spotting inference failures challenging. The runtime now throws a fatal error when inference does not complete successfully, along with an error message that gives some details about the error that occurred.

  Change-Id: Iadb6da04ad1c906e3ec49959eb3da0978295aebf
* Address comments
  * clarify test file brief
  * add test case for running status
  * add driver stack reference to WaitStatus class

  Change-Id: I792742892b761534904816135ae2ffcb3f028b2c

This PR introduces a new argument for EvolutionarySearch that limits the number of failures (defined as rounds in which no new candidate is generated) in the `SampleInitPopulation` stage. This prevents the task from hanging forever in special cases, e.g., when some postproc always fails. This should fix apache#12330.
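
The failure cap can be illustrated with a minimal sketch in plain Python. This is not the MetaSchedule implementation: the names `sample_init_population`, `generate`, and `max_fail_count` are hypothetical, and candidates are modeled as plain integers.

```python
from typing import Callable, List


def sample_init_population(
    generate: Callable[[], List[int]],
    target_size: int,
    max_fail_count: int,
) -> List[int]:
    """Collect unique candidates until target_size is reached.

    A round counts as a failure when it produces no new candidate.
    Sampling stops once max_fail_count failures have accumulated,
    so a generator that never succeeds cannot hang the loop.
    (All names here are hypothetical, chosen for illustration.)
    """
    population: List[int] = []
    seen = set()
    fail_count = 0
    while len(population) < target_size and fail_count < max_fail_count:
        added = 0
        for candidate in generate():
            if candidate not in seen:
                seen.add(candidate)
                population.append(candidate)
                added += 1
        if added == 0:
            fail_count += 1
    return population[:target_size]
```

Without the `fail_count` bound, a `generate` callback that always returns an empty list would keep the loop running indefinitely.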

…he#13269) The current type checker for TIR schedule had an issue with typing on Python 3.9. This simple patch fixes the problem.

…marking (apache#13255) This PR adds features to `python/tvm/meta_schedule/testing/torchbench/run.py`:

- Integrate with the TVM PyTorch integration to handle boolean tensors and unaligned memory.
- Deduplicate collected tuning tasks to prevent thousands of tasks being created by hundreds of subgraphs with similar structure.
- Add an option to cast the model to float32, which is more stable numerically than float16 and prevents inaccurate results from many models.
- Add an option to choose the search strategy in MetaSchedule.
- Inspect the output error if the actual output doesn't match the expectation. Also save the actual output and expected output for further analysis if needed.
- Save subgraphs and their example inputs for debugging purposes.
- Print MetaSchedule profiling information at the end of execution.
- Detach PyTorch tensors before exporting to dlpack.
- Fix the sys path to avoid a conflict with the `benchmarks` package installed by a TorchBench dependency.
- Trim all command line args passed in, to avoid breaking TorchBench models that depend on args.
- Empty the CUDA cache before starting the actual benchmark.

Add tensor rank check for `nn.instance_norm`.

Convert add(%1, %1) to multiply(%1, 2f); enhance fold_scale_axis to fold multiply(%1, 2f) into conv. Signed-off-by: Lei Wen <wenlei03@qiyi.com> Co-authored-by: Lei Wen <wenlei03@qiyi.com>
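
A minimal sketch of the self-add rewrite, using nested tuples as a stand-in for the IR. The tuple encoding and the function name `rewrite_self_add` are assumptions made for illustration; the actual pass operates on Relay expressions.

```python
def rewrite_self_add(expr):
    """Rewrite add(x, x) into multiply(x, 2.0), bottom-up.

    Expressions are modeled as tuples of the form (op, arg0, arg1, ...);
    anything that is not a tuple is treated as a leaf (a variable name
    or a constant). This encoding is hypothetical.
    """
    if not isinstance(expr, tuple):
        return expr
    op, *args = expr
    args = [rewrite_self_add(arg) for arg in args]
    if op == "add" and len(args) == 2 and args[0] == args[1]:
        return ("multiply", args[0], 2.0)
    return (op, *args)
```

Once the addition is expressed as a multiplication by a constant, a scale-folding pass has a constant factor that it can try to fold into an adjacent convolution.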

[COMMUNITY] New committer Ashutosh Parkhi

…e#13024) Prior to this commit, the result of TryCompare would only be used if it could definitively prove a conditional to be either true or false. For example, if it is known that `0 <= i`, a conditional of `i <= 0` would be left as-is. This commit introduces rewrite rules to preferentially simplify into more restrictive conditions. Using the same example, if it is known that `0 <= i`, a conditional of `i <= 0` would be simplified into `i == 0`. Similarly, if it is known that `0 <= i`, a conditional of `i != 0` would be simplified into `0 < i`. Because this change does not introduce significant overhead, as the results of `RewriteSimplifier::Impl::TryCompare` are already available, this change is enabled for all use cases and does not require a call to `RewriteSimplifier::SetEnabledExtensions`.
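
The two examples above can be sketched in plain Python. This is an illustration of the idea only, not the `RewriteSimplifier` code: the function name `tighten` and the string-based representation are hypothetical, and only the `<=` and `!=` cases against a known lower bound are handled.

```python
def tighten(known_lower: int, op: str, rhs: int) -> str:
    """Simplify the condition `i <op> rhs` given that `known_lower <= i`.

    Returns a string form of the simplified condition. Conditions that
    can be proven outright become "True" or "False"; conditions that
    touch the known bound are rewritten into a more restrictive form.
    """
    if op == "<=":
        if rhs < known_lower:
            return "False"        # i can never be that small
        if rhs == known_lower:
            return f"i == {rhs}"  # bound from both sides
    elif op == "!=":
        if rhs < known_lower:
            return "True"         # i can never equal rhs
        if rhs == known_lower:
            return f"{rhs} < i"   # excluding the bound itself
    return f"i {op} {rhs}"        # nothing to tighten
```

For instance, `tighten(0, "<=", 0)` yields `i == 0` and `tighten(0, "!=", 0)` yields `0 < i`, matching the two rewrites described in the commit message.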

Remove unused member variable in the `SimulatorRPCChannel` class. Fixes a clang warning.

…re used nested (apache#13278) The PatternGroup does not check whether the FunctionPattern is matched while processing the FunctionPattern. When a FunctionPattern is nested within an AltPattern, the FunctionPattern may not be matched, resulting in a crash when looking up matched nodes. This commit adds a check when handling FunctionPattern to fix this crash.

- Address a (valid) warning from clang-15.0.3 regarding the `tvm::tir::DataTypeRewriter` class. - Make some class methods `protected` rather than `public` to better reflect authors' intent.

…es easier (apache#13285) * [TIR][Tensorize] Add error logs to IR comparator to display what caused tensorization to fail * lint issues

* Hexagon test lint part 2 * fix import * fix global variable * fix import issue * fix import * fix exception error * address comments

…ble to avoid crash (apache#13297) * make elem_offset of the buffers created by te.extern a variable Co-authored-by: Eric Lunderberg <elunderberg@octoml.ai> * add test * fix te extern create_prim_func test Co-authored-by: Eric Lunderberg <elunderberg@octoml.ai>

…he#13298)

Preserve loop annotations when the loop is partitioned. Also bind the loop region info to the analyzer, since in some cases a partition condition could not be solved due to an unknown (but trivial) loop region.

Unify the two inner dimensions in the type checker so if one is unknown it will be filled in.
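
A minimal sketch of what unifying two dimensions means, assuming an unknown dimension is modeled as `None`. The function name `unify_dim` is hypothetical, and the commit message does not say which operator's type relation is affected.

```python
from typing import Optional


def unify_dim(lhs: Optional[int], rhs: Optional[int]) -> Optional[int]:
    """Unify two dimensions that must be equal.

    If one side is unknown (None), it is filled in from the other side.
    If both are known, they must agree. If both are unknown, the result
    stays unknown.
    """
    if lhs is None:
        return rhs
    if rhs is None:
        return lhs
    if lhs != rhs:
        raise ValueError(f"dimension mismatch: {lhs} vs {rhs}")
    return lhs
```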

Update tvmc tutorial code to use correct argument for reusing tuning records. Specifically, current code uses tuning_records, which is meant for saving the generated tuning results, not reusing prior results. We should use prior_records instead.

When building the TVM runtime with Hexagon, we hit the error below if USE_HEXAGON_EXTERNAL_LIBS is not defined. This happens because USE_HEXAGON_EXTERNAL_LIBS=OFF is defined as the default in CMakeLists.txt. The modified condition checks for all cases, including an undefined variable, an empty string, and OFF.

    CMake Error at cmake/modules/Hexagon.cmake:203 (message):
      Invalid use of USE_HEXAGON_EXTERNAL_LIBS=OFF; USE_HEXAGON_EXTERNAL_LIBS only supports absolute paths and git repository urls
    Call Stack (most recent call first):
      CMakeLists.txt:477 (include)

Add support for large index fp16 mean and var.

* fix sched_setaffinity error on Android * fix sched_setaffinity error on Android * fix sched_setaffinity error on Android * clang format * add ndk api version macro * clang format

* [Torch] Fix advanced indexing with boolean mask * add comment

change

…c… (apache#13277) Add test case for interpolate op convert function apache#13247

apache#13311) Currently one version of `tvm::LowerSchedule` doesn't pass along the input `simple_mode` flag, which causes it to default back to `false`. This commit fixes it by passing along the input flag.

Currently, the RPC session on the C/C++ side does not know whether the session was closed on the Python side, which causes extra reads and writes on the transport after the session is already closed. This commit reuses the Hexagon approach in microTVM to shut down the RPC session.

…e#13318) Move lock/unlock to HexagonHtp temporarily

…pache#13314) This PR updates `src/tir/transforms/thread_storage_sync.cc` to insert a storage sync if the access index doesn't depend on the innermost thread index, i.e., it is constant with respect to the innermost thread id. This fixes an accuracy problem on the model https://github.com/pytorch/benchmark/tree/main/torchbenchmark/models/timm_efficientdet

* [ETHOSN] Consolidate target string usage Removes support for a deprecated target string. The deprecation warning has been around for a couple of releases now so it should be safe to remove. The target to use moving forward is: `ethos-n -variant=n78 ...` Refactored direct use of a driver stack target string in the testing infrastructure to use the same string we expect users to provide. This simplified some of the code in codegen and hopefully avoids confusion in the future.

* [Adreno][Textures] Fix static memory planner Fix memory reuse in static memory planner. * Move token allocators to separate file * Add test on TokenAllocator2d * Apply comments and fix CI

Add clang-format disable for the header to prevent reordering. The Torch header file needs to be put at the end since torch's dlpack is a little different from tvm's. Signed-off-by: Lei Wen <wenlei03@qiyi.com> Co-authored-by: Lei Wen <wenlei03@qiyi.com>

Because the majority of TIR PrimFuncs operate on buffers, write their outputs to an output parameter, and do not return a value, the `-> None` in the function signature becomes visual noise. This commit removes printing of the return type in cases where the PrimFunc has no return value.

* [OpenCL][unit tests] Fix opencl cpp unit tests

  After some changes in Hexagon, running the cpp opencl tests leads to the following error:

  ```
  pluggy.manager.PluginValidationError: unknown hook 'pytest_configure_node' in plugin <module 'tvm.contrib.hexagon.pytest_plugin'
  ```

  Added `pytest_plugin` for OpenCL CPP tests to avoid this error and to process gtest arguments.
* Fix failure when the `gtest_args` option was already added
* Move `gtest_args` definition to the main testing plugin

* Add memory size as project option * cleanup * address comments * address comments

* [TIR] Remove redundant add in vnni intrin * Update arm intrin Co-authored-by: Ubuntu <ubuntu@ubuntu.com>

…#13321)

AOT requires the ExecutorCodegenMetadata object to be populated containing various pieces of information about the compiled module. This commit adds a separate analysis pass to create the metadata + some tests for the new pass. In order to collect the device information correctly, AOTLowerMain is extended to attach the device info as a function attribute.

…pache#13324) This commit adds a tutorial to compile and run a PyTorch model using microTVM, the AOT host-driven executor, and C runtime (CRT).

…3333) Update Jenkins readme to match new directory structure

…ngAnchorTrace` (apache#13329) * index on concat-fusion-fix: 3ffe5b1 fix te extern create_prim_func test * Apply AutoInline to the last block after all other blocks are processed * Do not require CanReverseComputeInline to be true when CanComputeInline is false * add comment * add test * cpplint

* Add validation scripts. * Fix testing script. * Fix lint. * Fix lint. * Fix inputs. * Fix lint. * Fix lint. * Add timer func. * Fix ci. * Address comments. * Add total time statistics. * Fix lint.

…s of input tensors (apache#13322) * QLinearMatMul was extended for all ranks of a and b * CI test for QLinearMatMul was implemented (onnx front-end) * fix after black check * numpy type fix * fix weight scale and zero point, output type * fix after pylint * resolve different input types in tests * skip resolved TODO * update covering of QLinearMatMul by tests * pylint fixes * skip test of QLinearMatMul on CUDA Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>

* [TIR] Disallow reverse inline into a producer with non-trivial predicate * add test * Allow cases where the producer predicate can be implied by the new predicate of the inlined block * remove unused variable * update comment in test to reflect the change in ReverseComputeInline

* [TOPI] Fix conv2d transpose for small channel * black

…#13339) We moved most of the IR definitions into the corresponding testing methods. Co-authored-by: Yaxing Cai <caiyaxing666@gmail.com>

* cpp_rpc build failure for Android devices with NDK version < 23 * * Make environment variable ANDROID_NDK_MAJOR optional. Co-authored-by: Siva Rama Krishna Reddy B <sivb@blr-ubuntu-ripper.qualcomm.com>

…3336) Make 'allocate_hexagon_array' a hexagon contrib API

This PR fixes the bug in MergeConstants pass on striped networks on Ethos-U NPU. The issue was caused by _DivideConstants_ pass which is introducing new mod parameters and changing their order. So ethosu_write parameter in some cases is moved from the end of the list to the middle. E.g. from: `[ethos-u_0_i0, p1, p2, p3, p4, p5, p6, ethosu_write]` To: `[ethos-u_0_i0, p1, p2, ethosu_write, placeholder, placeholder, placeholder, placeholder, placeholder, placeholder, placeholder, placeholder]` Updated version of the _GetArgsToMergeWithoutArgsNotInConstDict_ and _MakeNewConstDict_ methods in passes.cc can now correctly modify const_dict according to the new parameter list.

* [TVMC] Global pass context for compile and tune

  Comes as a followup from conversations in apache#13216. By making the pass context a global value for both the `compile` and `tune` commands, we can ensure the pass context is exactly as the user expected and also test components such as `convert_graph_layout` under a pass context suitable for testing (e.g. add instruments). With this change, it becomes the user's responsibility to ensure the PassContext they select is suitable for the passes that will be run. By default, `opt_level` remains 3, so current workflows that do not alter the pass context from the command line / TVMC Python API should not be affected.

  Change-Id: I7a601daf6fbe664f77bce1b45efeb7ca29f621b3
* fix vitis-ai test and typo

  Change-Id: I04f5bd031ae4717825f42e373bcb0e1e2c1c9d90

apache#13301) * [TIR] Update ReductionIterNotIndexOutputBuffer to check BlockRealizeNodes match_buffer statements when validating writes * Add test to verify that tensorized blocks are properly validated * update to take into account all match buffer regions. * lint

This PR refactors timezone setup into a separate script used by docker/install/ubuntu_install_core.sh. It also adds a script to install NRF, which is reused in both the cortexm docker image and the RVM installation path.

Fix denominator checking in `TryConstFold`.

…he#13353) * Fix typo. * Add regression test.

…el workload (apache#13334) * [MetaSchedule] Add a new schedule rule to inline all scalar constants * add doc * reorg * identify constant block by its structure, not by name

…che#13354) This PR introduces a check that prevents records with a run time of zero from entering the training data of the cost model. When working on microTVM, there are cases where the run time of certain successful runs is so small that it is recorded as zero. A run time of 0 would break the XGBoost model because it implies an infinite running speed in GFLOPs. A regression test was also added.
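
A minimal sketch of the zero-run-time filter, assuming each record is a `(flop_count, run_secs)` pair. The names `usable_records` and `gflops` are hypothetical and do not reflect the MetaSchedule cost model API.

```python
from typing import List, Tuple

Record = Tuple[float, float]  # (flop_count, run_secs); hypothetical layout


def usable_records(records: List[Record]) -> List[Record]:
    """Keep only records whose measured run time is strictly positive."""
    return [(flops, secs) for flops, secs in records if secs > 0.0]


def gflops(flop_count: float, run_secs: float) -> float:
    """Throughput in GFLOPs; only meaningful for run_secs > 0."""
    return flop_count / run_secs / 1e9
```

Filtering before computing throughput keeps a division by zero out of the training targets.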

It seems there is some inconsistency across Python versions that makes PR apache#13269 fail on Python 3.10. This patch fixes the issue. Co-authored-by: Junru Shao <junrushao1994@gmail.com>

…unc is not found (apache#13346)

…e#13356)

…tiLevelTilingTensorCore` (apache#13357) * Fuse shared to global store loops in MultiLevelTilingTensorCore * update test

…etConsumers() (apache#13344) Currently there are two versions of the `GetConsumers()` and `GetProducers()` implementations. Make them consistent to avoid a possible bug when there are WAR dependencies.

…pache#13343)

As part of the effort toward more formal TIR semantics, we want to more explicitly differentiate TIR AST nodes (defined in `tir/expr.h`) from TIR ops (defined in `tir/op.h`). The naming convention is:

- A lowercased method, for example `tvm.tir.mul`, is a TIR op, which is eagerly constant-folded, i.e. `mul(1, 2)` returns `2` immediately rather than creating an AST node.
- A capitalized callable, for example `Mul`, creates an AST node without constant folding.

This PR makes this behavior more explicit by printing `T.Mul(a, b)` directly when `a` and `b` are both constants, rather than sugaring it into `mul(a, b)` or `a * b`, so that the difference between an op and an AST node is clear. Co-authored-by: Yaxing Cai <caiyaxing666@gmail.com>
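
The distinction between an eagerly folded op and an AST node can be sketched in plain Python. The classes below are stand-ins written for illustration and are not the TVM classes; only the names `mul` and `Mul` mirror the convention described above.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:
    """A symbolic variable (illustrative stand-in)."""
    name: str


@dataclass(frozen=True)
class Mul:
    """AST node: always records the multiplication, never folds."""
    a: "Expr"
    b: "Expr"


Expr = Union[int, Var, Mul]


def mul(a: Expr, b: Expr) -> Expr:
    """Op: folds eagerly when both operands are integer constants."""
    if isinstance(a, int) and isinstance(b, int):
        return a * b
    return Mul(a, b)
```

Here `mul(1, 2)` evaluates to the constant `2`, while `Mul(1, 2)` remains a node holding both operands, which is why printing the two forms differently avoids ambiguity.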

[FQ2I] Add cast back to output data type after AvgPool2d. This commit fixes the following issue: for the sequence qnn.dequantize -> avg_pool2d -> conv2d -> qnn.quantize, the FQ2I pass unconditionally inserts qnn.requantize (or cast) to int32 before AvgPool2d. As a result, the fake quantized qnn.conv2d gets an int32 input, which qnn.conv2d forbids (it supports only uint8/int8/int16). This commit adds a cast back to the output data type after AvgPool2d, which preserves input dtype == output dtype for this op.

This PR adds all common TIR intrinsics like `T.int32x4`, `T.floatx4`. Co-authored-by: Yaxing Cai <caiyaxing666@gmail.com>

apache#13345) Fix 2 issues of cache related primitives: * Fix region_cover checking for cache related primitives * Fix CacheLocDetector for nested SeqStmt Co-authored-by: Min Chen <chen.min@intellif.com>

This PR introduces some minor restructuring of the `python/tvm/script` folder structure to make it more convenient for future upstreaming. Co-authored-by: Yaxing Cai <caiyaxing666@gmail.com>

In this PR, the skipped tests script will also check if tests in the `required_tests_to_run.json` have not been skipped. If there are skipped tests, they will be added to the returned comment. I am not entirely sure where it's best to place the `required_tests_to_run` file, so I left it in `tvm/ci/scripts/`. I am happy to take suggestions. Aims to prevent situations such as apache#12529

…13368) This PR is a duplicate of apache#12940 and apache#12941. For some reason, I am unable to reopen apache#12940.

…pache#13326) Previously, the block SREF reuse only included a single step of changes, and would have an incorrect mapping if multiple sequential changes to the TIR block occurred. This could happen if a `BufferStore` was updated, followed by replacement of `Block` iter vars/values. This commit tracks the Block replacements across each usage, to ensure the SREF instances remain valid.

Merging apache#13368 caused CI to pass but run more than it needed to due to some failures in determination. This fixes the interpolation to use `"` which should correctly pass through the variables Co-authored-by: driazati <driazati@users.noreply.github.com>

This PR does not merge `main` if CI is running already on `main`. It aims to avoid a case where a race happens between two subsequent commits, and one of them merges the other. Fixes apache#12392.

…he#13383)

) This enables int64 biases for quantized fully connected, requantize and transpose convolution in TFLite networks. It goes on top of existing int16 support for TFLite frontend. Add a test case using DS_CNN int16 quantized.

Commits on Nov 1, 2022

Commits on Nov 2, 2022

Commits on Nov 3, 2022

Commits on Nov 4, 2022

Commits on Nov 5, 2022

Commits on Nov 6, 2022

Commits on Nov 7, 2022

Commits on Nov 8, 2022

Commits on Nov 9, 2022

Commits on Nov 10, 2022

Commits on Nov 11, 2022

Commits on Nov 12, 2022

Commits on Nov 13, 2022

Commits on Nov 14, 2022

Commits on Nov 15, 2022
