Add documentation for Adreno deployment #22

Open

wants to merge 99 commits into base: dbarinov/main
Conversation

dsbarinov1
Copy link

@echuraev @elvin-n review please :)

Comment on lines 258 to 259
.. |High-level overview of the Adreno A5x architecture for OpenCL| image:: https://i.ibb.co/yXm6CkG/2022-10-21-14-39-08.png
.. |Android deployment pipeline| image:: https://i.ibb.co/xMQrgLn/Untitled-Frame-2.jpg


Could you please try to find and check where the image files should be located if you use them in the documentation? I believe they should be somewhere in the TVM repository.

Introduction
------------

Adreno is a series of graphicfdgs processing unit (GPU) semiconductor


Suggested change
Adreno is a series of graphicfdgs processing unit (GPU) semiconductor
Adreno is a series of graphics processing unit (GPU) semiconductor


The Adreno GPU accelerates the rendering of complex geometries to
deliver high-performance graphics and a rich user experience with low
power consumption


Suggested change
power consumption
power consumption.

deliver high-performance graphics and a rich user experience with low
power consumption

This guide will demonstrate the benefits of using textures with Adreno,


the moment, TVM is able to benefit from this by having texture support
for Adreno. The graph below shows the Adreno A5x architecture.

|High-level overview of the Adreno A5x architecture for OpenCL| Fig. 1


Move the description to the next line. And probably make it italic.

Comment on lines 51 to 60
# .. code-block:: bash
#
# find ~/Android/sdk/ndk/ -name libc++_shared.so
#
# (Linux)
# .. code-block:: bash
#
# find ~/Library/Android/sdk/ndk/ -name libc++_shared.so
#
# (MacOS)


Suggested change
# .. code-block:: bash
#
# find ~/Android/sdk/ndk/ -name libc++_shared.so
#
# (Linux)
# .. code-block:: bash
#
# find ~/Library/Android/sdk/ndk/ -name libc++_shared.so
#
# (MacOS)
# .. code-block:: bash
#
# find ${ANDROID_NDK_HOME} -name libc++_shared.so
#

Comment on lines 110 to 116
import os
import numpy as np
import mxnet.gluon as gluon
import tvm
from tvm import relay, rpc
from tvm.contrib import utils, ndk
from tvm.contrib import graph_executor


Suggested change
import os
import numpy as np
import mxnet.gluon as gluon
import tvm
from tvm import relay, rpc
from tvm.contrib import utils, ndk
from tvm.contrib import graph_executor
import os
import numpy as np
import mxnet.gluon as gluon
import tvm
from tvm import relay, rpc
from tvm.contrib import utils, ndk
from tvm.contrib import graph_executor

Comment on lines 201 to 202
target="opencl -device=adreno"
target_host="llvm -mtriple=arm64-linux-android"


Should we add a description about target_host?


BTW, this is a deprecated usage of target/target_host. Currently you have to create a Python Target object and initialize it with the appropriate target_host.

######################################################################
# Load a test image
# -----------------
# As an example we would use classical cat image from ImageNet


Display this image in the document.

# Evaluate inference time cost...
# Execution time summary:
# mean (ms) median (ms) max (ms) min (ms) std (ms)
# 76.6101 76.6053 77.6530 75.5720 0.6789


Did you try to convert this script into a web page?


It must not be converted manually. It is done automatically when the documentation is built.


Yes, I know, but it would probably be better to convert it locally and make sure that everything is fine in the final document.


The main value is the .py script, which can potentially be executed by the user. Why do we care about the doc? It should be converted automatically...


Because the web page will be automatically generated from this .py script and presented in the official TVM documentation, and I want to be sure that everything looks correct. This is why I asked. I don't think it's a problem to build this page locally and just check that everything is fine.

@dsbarinov1
Author

@echuraev @elvin-n some fixes done

docs/how_to/deploy/adreno.rst (outdated, resolved)
docs/how_to/deploy/adreno.rst (outdated, resolved)
docs/how_to/deploy/adreno.rst (resolved)
gallery/how_to/deploy_models/deploy_model_on_adreno.py (outdated, resolved)
Comment on lines 201 to 202
target="opencl -device=adreno"
target_host="llvm -mtriple=arm64-linux-android"

BTW, this is a deprecated usage of target/target_host. Currently you have to create a Python Target object and initialize it with the appropriate target_host.

docs/how_to/deploy/adreno.rst (outdated, resolved)
@dsbarinov1 dsbarinov1 changed the base branch from main to dbarinov/main November 1, 2022 20:23
@dsbarinov1
Author

Please review "Fix docs" commit @elvin-n @echuraev

tmoreau89 and others added 4 commits November 2, 2022 06:10
…o reduce tuning time (apache#13259)

* [MetaSchedule] Swap the order of RewriteTensorize and VerifyGPUCode to
reduce tuning time

* add comment
See issue apache#13227.

Co-authored-by: driazati <9407960+driazati@users.noreply.github.com>

Since the process of building TVM for Adreno is exactly the same as the
process of building TVM for Android, please refer to these instructions:
`TVM RPC
Server <https://github.com/apache/tvm/tree/main/apps/cpp_rpc>`__
Server <https://github.com/apache/tvm/tree/main/apps/cpp_rpc>`_.


Do we need the trailing _ at the end of the line?

)

.. |High-level overview of the Adreno A5x architecture for OpenCL| image:: images/adreno_architecture.png
.. |Android deployment pipeline| image:: images/android_deployment_pipeline.jpg

Probably make the boxes for "Pre-trained model" and "Compiled library (.so)" the same size. Also, the top and bottom arrows in GraphExecutor are in different positions: the bottom arrow is further left than the top one. It would probably be better if they were on the same vertical.

mod, target_host=target_host, target=target, params=params
)

.. |High-level overview of the Adreno A5x architecture for OpenCL| image:: images/adreno_architecture.png

Probably add a reference to where you took this image from (I mean the Qualcomm guide).


+1

Comment on lines 19 to 20
enabled. It will also provide :ref:`example code<Build and deploy model for Adreno>` to better understand the differences with compiling and deploying models
on Adreno devices.

I'm not sure, but I think this sounds better:

Suggested change
enabled. It will also provide :ref:`example code<Build and deploy model for Adreno>` to better understand the differences with compiling and deploying models
on Adreno devices.
enabled. It will also provide :ref:`example code<Build and deploy model for Adreno>` to better understand the differences in compiling and deploying models
for Adreno devices.

Advantages of the Textures
--------------------------

One of the advantages of Adreno is its clever handling of textures. At

Suggested change
One of the advantages of Adreno is its clever handling of textures. At
One of the Adreno's advantages is clever handling of textures. At

---------------------------------

In this section we will focus on target, needed to compile and deploy models for Adreno, demonstrate
the generation of kernels with and without textures and, in addition, the

Suggested change
the generation of kernels with and without textures and, in addition, the
the differences in generated kernels with and without textures and, in addition, the

Comment on lines +107 to +98
|Android deployment pipeline|

*Fig.2 Deployment pipeline on Adreno devices*

Will we describe what the reader can see in this picture?


__kernel void tvmgen_default_fused_nn_conv2d_kernel0(__write_only image2d_t pad_temp_global_texture, __read_only image2d_t p0) {
// body..


I would add some more details about the differences, e.g.:

image2d_t is a built-in OpenCL type that represents a two-dimensional image object and provides several additional functions. Also, when we use image2d_t we read 4 elements at a time, which helps utilize the hardware more efficiently.

"""


#################################################################

Why did you use """ for comments before, but # now? Probably use one comment style for large multiline comments?

Comment on lines +78 to +85
# export TVM_TRACKER_HOST=0.0.0.0
# export TVM_TRACKER_PORT=9190
#
# check that the tracker is running and the device is available
#
# .. code-block:: bash
#
# python -m tvm.exec.query_rpc_tracker --port 9190

IMHO, using environment variables is not the most convenient way to configure the tracker. You can also pass all of these settings through the command line.
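For example (a sketch in the tutorial's own rst code-block style; the rpc_tracker module and its --host/--port flags are assumed from TVM's RPC tooling), the tracker can be configured directly on the command line:

```rst
.. code-block:: bash

   # start the tracker, binding the host and port explicitly
   python3 -m tvm.exec.rpc_tracker --host=0.0.0.0 --port=9190

   # check that the tracker is running and the device is available
   python3 -m tvm.exec.query_rpc_tracker --port 9190
```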

lhutton1 and others added 8 commits November 2, 2022 14:15
This commit ensures that constant folding is applied when a desired
layout is selected during compilation. It ensures that
`layout_transform` operations are removed where possible so that
pattern matching for BYOC backends can work effectively.

A test has been added to check this regression.
…3252)

This commit applies additional write permission to the "tvm-venv"
group virtual environment. Currently after entering a container from
a newly built image it doesn't seem possible to install/update Python
packages. E.g. updating pip will give errors such as:
```
$ pip install --upgrade pip
ERROR: Could not install packages due to an OSError: [Errno 13]
Permission denied: '/venv/apache-tvm-py3.7/bin/pip' Check the
permissions.
```

Enabling write access for this group fixes this as long as the
current user is a member of the "tvm-venv" group.
* [Hexagon] Tests pylint

* fix error

* Fix buffer name
- Fix clang 15.0.3 '-Wunused-but-set-variable' and '-Wunused-lambda-capture' warnings by removing / commenting-out code.
…ing function_def_to_graph_def (apache#13260)

[TF2] Import graph_def to default graph before calling function_def_to_graph_def
apache#13247)

There is a local variable referenced before assignment in the convert_interpolate function. I think the variable 'size' is the one actually meant to be referenced.
…he#13274)

This reverts commit 5acf3f9.

Reverting since this is causing some spam from the ASF Infra bot related
to https://issues.apache.org/jira/browse/INFRA-23834. As in that issue
the protections have been applied manually by ASF Infra so this revert
shouldn't have any real effect
@dsbarinov1
Author

@echuraev small fixes done

bkmgit and others added 9 commits November 3, 2022 20:55
Minimal dependencies for Fedora/CentOS

This commit indicates how to install minimal set of
dependencies for building Apache TVM on Fedora and
CentOS. It supplements existing information for
Ubuntu and MacOS.
Fix occurrences of clang's `-Wdocumentation-unknown-command` warning.
Fix code to address a valid `-Wredundant-move` clang warning.
* [ETHOSN] Inline non-compute-intensive partitions

Adds a pass that analyzes functions partitioned for the NPU and inlines
those that are deemed "non-compute-intensive" back to the main function
so that they can be considered for other backends. The current heuristic
for deciding that a function is non-compute-intensive is to check that
the operations in the function collectively contain no multiply-accumulate
operations. This heuristic is not optimal; optimization is left for
future exploration.

This pass is inspired by the "IsComputeIntensiveGraph" pass in the
TensorRT integration.

Change-Id: I20c197702f5252f102cfc1e4b4635ab836aa7835

* Address comments

* 'inline_non_compute_intensive_partitions' -> 'is_inline_non_compute
_intensive_partitions_enabled'.
* remove no MAC operations.
* fix network test.

Change-Id: Ie1015b27f37e47544bed6f0aff819ee4649de579

* Fix failing unit tests due to optimization

Change-Id: I0ee0af071dc77c91e0ef0f6753506cb40d1d1859

* Add future exploration suggestions

Change-Id: Ie918d7f1059f032282f1f5eeffda38f4febcd59c
* [ETHOSN] Throw error message when inference fails

Previously the runtime would silently skip inference failures and return
random values as the result. This can make spotting inference failures
challenging. The runtime now throws a fatal error when inference did not
complete successfully along with an error message that gives some
details about the error that occurred.

Change-Id: Iadb6da04ad1c906e3ec49959eb3da0978295aebf

* Address comments

* clarify test file brief
* add test case for running status
* add driver stack reference to WaitStatus class

Change-Id: I792742892b761534904816135ae2ffcb3f028b2c
This PR introduces a new argument for EvolutionarySearch that limits the failures (defined as rounds with no new generated candidate) in the `SampleInitPopulation` stage. In this way we can avoid the task hanging forever in special cases, e.g., when some postproc always fails. This should fix apache#12330.
…he#13269)

The current type checker for the TIR schedule had an issue with typing on Python 3.9.
This simple patch fixes the problem.
…marking (apache#13255)

This PR adds features to the `python/tvm/meta_schedule/testing/torchbench/run.py`.

- Integrate with the TVM PyTorch integration to handle boolean tensor and unaligned memory.
- Deduplicate collected tuning tasks to prevent thousands of tasks created by hundreds of subgraphs with similar structure.
- Add option to cast the model to float32, which is more numerically stable than float16 and prevents inaccurate results from many models.
- Add option to choose search strategy in MetaSchedule.
- Inspect output error if the actual output doesn't match the expectation. Also save the actual output and expected output for further analysis if needed.
- Save subgraphs and their example input for debug purpose.
- Print MetaSchedule profiling information at the end of execution.
- Detach PyTorch tensor before exporting to dlpack.
- Fix the sys path to avoid conflict with the `benchmarks` package installed by TorchBench dependency.
- Trim all command line args passed in, in order to prevent breaking some TorchBench model that depends on args.
- Empty cuda cache before starting the actual benchmark.
Add tensor rank check for `nn.instance_norm`.
masahi and others added 10 commits November 11, 2022 09:08
…el workload (apache#13334)

* [MetaSchedule] Add a new schedule rule to inline all scalar constants

* add doc

* reorg

* identify constant block by its structure, not by name
…che#13354)

This PR introduces a check to prevent records with a run time of zero from entering the training data of the cost model. This is because when working on microTVM there are cases where the run time of certain successful runs is so tiny that it gets recorded as zero. In such cases, a runtime of 0 would break the XGBoost model because it introduces infinite running speed in GFLOPs. A regression test was also added.
It seems like there is some inconsistency across Python versions, which makes PR apache#13269 fail on Python 3.10.
This patch fixes the issue.

Co-authored-by: Junru Shao <junrushao1994@gmail.com>
…tiLevelTilingTensorCore` (apache#13357)

* Fuse shared to global store loops in MultiLevelTilingTensorCore

* update test
…etConsumers() (apache#13344)

Currently there are two versions of the `GetConsumers()` and `GetProducers()` implementations. Make them consistent to avoid possible bugs when there are WAR dependencies.
As part of effort of more formal TIR semantics, we want to more
explicitly differentiate TIR AST nodes (defined in `tir/expr.h`)
and TIR ops (defined in `tir/op.h`).

A naming convention is that:
- Lowercased methods, for example, `tvm.tir.mul`, means an TIR op, which
  will be eagerly constant-folded, i.e. `mul(1, 2)` returns `3`
  immediately rather than creating an AST node.
- Capitalized callable, for example, `Mul`, means creating an AST node
  without constant folding.

This PR makes this behavior more explicit by printing `T.Mul(a, b)`
directly when `a` and `b` are both constants, rather than sugaring it
into `mul(a, b)` or `a * b`, so that the difference between an op and
an AST node is clarified.

Co-authored-by: Yaxing Cai <caiyaxing666@gmail.com>

Co-authored-by: Yaxing Cai <caiyaxing666@gmail.com>
[FQ2I] Add cast back to output data type after AvgPool2d

This commit fixes the following issue:
For the sequence of qnn.dequantize -> avg_pool2d -> conv2d ->
qnn.quantize FQ2I pass inserts qnn.requantize (or cast) to int32
unconditionally before AvgPool2d. As a result fake quantized qnn.conv2d
gets input as int32 dtype, but it is forbidden for qnn.conv2d (supports
only uint8/int8/int16).

This commit adds the following:
Add cast back to output data type after AvgPool2d. This preserve input
dtype == output dtype for this op.
@dsbarinov1
Author

@elvin-n please, review final adjustments :)

junrushao and others added 18 commits November 11, 2022 11:40
This PR adds all common TIR intrinsics like `T.int32x4`, `T.floatx4`.

Co-authored-by: Yaxing Cai <caiyaxing666@gmail.com>
apache#13345)

Fix 2 issues of cache related primitives:
*  Fix region_cover checking for cache related primitives
*  Fix CacheLocDetector for nested SeqStmt

Co-authored-by: Min Chen <chen.min@intellif.com>
This PR introduces some minor restructuring of the `python/tvm/script`
folder structure to make it more convenient for future upstreaming.

Co-authored-by: Yaxing Cai <caiyaxing666@gmail.com>
In this PR, the skipped tests script will also check if tests in the `required_tests_to_run.json` have not been skipped. If there are skipped tests, they will be added to the returned comment. 

I am not entirely sure where it's best to place the `required_tests_to_run` file, so I left it in `tvm/ci/scripts/`. I am happy to take suggestions.

Aims to prevent situations such as apache#12529
…pache#13326)

Preivously, the block SREF reuse only included a single step of
changes, and would have an incorrect mapping if multiple sequential
changes to the TIR block occurred.  This could happen if a
`BufferStore` was updated, followed by replacement of `Block` iter
vars/values.  This commit tracks the Block replacements across each
usage, to ensure the SREF instances remain valid.
Merging apache#13368 caused CI to pass but run more than it needed to due to
some failures in determination. This fixes the interpolation to use `"`
which should correctly pass through the variables

Co-authored-by: driazati <driazati@users.noreply.github.com>
This PR does not merge `main` if CI is running already on `main`. It aims to avoid a case where a race happens between two subsequent commits, and one of them merges the other.

Fixes apache#12392.
)

This enables int64 biases for quantized fully connected, requantize
and transpose convolution in TFLite networks. It goes on top of existing
int16 support for TFLite frontend.

Add a test case using DS_CNN int16 quantized.