
merge to self branch #1

Merged: 154 commits into jiangjiajun:paddlepaddle on Aug 2, 2021

Conversation

jiangjiajun
Owner

Thanks for contributing to TVM! Please refer to the guideline https://tvm.apache.org/docs/contribute/ for useful information and tips. After the pull request is submitted, please request code reviews from Reviewers by @-mentioning them in the pull request thread.

Hzfengsy and others added 30 commits July 1, 2021 14:46
…8381)

After fix a66186b, the same fix is also needed for depthwise_conv2d on Intel
graphics. The removed code was never used and is identical to the code in
cuda/depthwise_conv2d.py, so the CUDA implementation can be used when it
becomes necessary.
* fix type relation for batch_matmul

* fix lint
* Fix np.int and np.float usage in the tree.

Newer versions of NumPy emit many warnings indicating that
np.int and np.float are deprecated. CI uses pytest, and these
warning logs consume memory during testing and slow it down.
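For reference, a minimal sketch of the substitution these commits apply (illustrative only, not taken from the patch):

```python
import numpy as np

# np.int and np.float are deprecated aliases of the Python builtins since
# NumPy 1.20 and now emit DeprecationWarning.
# old: x = np.zeros(4, dtype=np.int)
x = np.zeros(4, dtype=int)      # use the builtin int instead of np.int
y = np.zeros(4, dtype=float)    # use the builtin float instead of np.float
# Sized dtypes such as np.int32 / np.float32 are unaffected and remain valid.
z = np.zeros(4, dtype=np.int32)
```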

* Fix formatting
* rename _update_target and document its function

* make tvm.build return OperatorModule to return multiple outputs

* allow retrieving the var names used in TIR repr

* add Operator Model Library Format and test

* Add pathlib convenience functions to utils.TempDirectory.

* fix tests

* black format

* git-clang-format

* pylint fixes

* add asf header

* change memory map to make more sense, fix tests

* address giuseros comments

* align GetVarName with future TypedPackedFunc

* fix test

* clang-format

* rev model library format to v4 (bad merge)
Remove warning about macOS support from tutorial
* add stm32l4r5zi_nucleo

* add parameter for test qemu

* file type check

* fix test

* change order

* revert
* fix weight shape in torch.mm conversion

* Revert "fix weight shape in torch.mm conversion"

This reverts commit a1a8fd3.

* [Torch] remove unused conversion
* [Arith] Inverse affine map

* [Arith] Inverse affine map

* Update iter_affine_map.h

* Update iter_affine_map.h

* Update iter_affine_map.py

* Topology order visit

* doc

* fix

* address comments

* lint

* remove print
* Support test aten::flip

* Support aten::flip
* rename resize to resize2d

* refactor resize_2d

* Add resize1d op, normalize attribute names across ops

* normalize resize3d to match the API of 1D and 2D

* fix lint

* fix relay tests from API change

* refactor topi tests, docs

* fix method naming in framework frontends

fix more frontend issues

* refactor resize tests to reuse components, add more coordinate transform modes to tests

* add cubic resize reference kernel and tests, add relay tests for resize1d

* fix pylint

* fix test typo
* [fix] Broken link in apps for wasm-standalone

* [fix] Broken link in apps for wasm-standalone

* [CI] Manual trigger for CI
Co-authored-by: Jackson Hsieh <chengpi@amazon.com>
In a similar vein to previous pull requests
replacing deprecated use of np.bool and np.int from
numpy with bool and int.

https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
* [ONNX] Wrap 'If' if it has multiple outputs

Without this wrapper, an assertion in from_onnx() fails with the
error message "Number of output mismatch"

* [ONNX] Test If nodes with multiple output tensors

* Fix formatting issues
* Fix AttributeError when TEST_DATA_ROOT_PATH is set

Create a Path object from TEST_DATA_ROOT_PATH to fix the error:
AttributeError: 'str' object has no attribute 'mkdir'
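A minimal sketch of the fix, assuming TEST_DATA_ROOT_PATH is read from the environment (the fallback path and variable name below are illustrative, not from the patch):

```python
import os
from pathlib import Path

# os.environ values are plain strings, which have no .mkdir(); wrapping the
# value in pathlib.Path restores the expected interface before creating dirs.
test_data_root = Path(os.environ.get("TEST_DATA_ROOT_PATH", "/tmp/test_data"))
test_data_root.mkdir(parents=True, exist_ok=True)
```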

* [DOCS] Add docs for Pass Instrument

 - Add a tutorial about how to use pass instrument.
 - Add related sections in Pass Infrastructure documents.

* Fix ir.rst, the length of separator.

* Fix unused local name

* Fix linting errors

* Fix linting errors

* Fix linting errors

* Address code-review feedbacks

* Fix linting

* Fix the order of tutorial.

* Add exception handling. Address feedbacks.

* Fix CI error -- clearing instruments in global pass_ctx

* Clarify section hierarchy.

* Emphasize using the decorator instead of subclassing

* Add a sentence to explain Pass Instrument. Fix typo.

* Shrink python docs a little.

* Fix tag name.

* Address feedbacks.
Duplicate the CompileEngine interface.

Refactor the graph_runtime_codegen to invoke the new LowerTE pass

More changes

Things appear to be working

Some tracing to get Relay code to flow through too.

Disable some assertions as exp.

Tweak printing for now

Fix a few bugs: (#13)

1. Don't add relay main function to list of lowered TIR functions
2. Don't skip visiting call to relay function in graph runtime codegen

Remove debug prints.

Start refactoring

Split out shared data structures

Fix implicit duplicate decl of IsDynamic

Clean up handling of name + global prim fn

Clean up the code and debug issue introduced by previous hack

Clean up the debugging

Do C++ lint clean up

Update src/relay/backend/graph_executor_codegen.cc

Co-authored-by: Chris Sullivan <csullivan@octoml.ai>

Clean up handling of external functions

Add more error messages

More clean up

Update src/runtime/graph_executor/graph_executor.cc

Co-authored-by: Chris Sullivan <csullivan@octoml.ai>

Update src/runtime/graph_executor/graph_executor.cc

Co-authored-by: Chris Sullivan <csullivan@octoml.ai>

Update src/relay/backend/te_compiler.h

Co-authored-by: Haichen Shen <shenhaichen@gmail.com>

Update src/relay/backend/te_compiler.h

Co-authored-by: Haichen Shen <shenhaichen@gmail.com>

Fix

CR

More CR

Format

Fix lowering path for C++

Fix tests

Remove unnecessary change

Clean up a few more things

CI fix

Fix the default context

Fix

Fix broken test cases

Update

Fix

WIP

Clean up storage data structures

WIP

WIP

Fix build errors

Remove TVMLower

Fix lint

Lint again

fix black

Move UpdateMainWorkspaceSize into te_compiler.cc

Fix link errors

Formatting

Change UpdateMainWorkspaceSize to return Map<String, FunctionInfo>

Workaround for GCC 5 error caused by enums in maps (GCC 5 is on i386 CI)

Testing how functions should be named

Lint

Change how function metadata is updated

Attempt to update aot_executor_codegen to use new StaticMemoryPlan instead of storage_device_map

Pass memory plan through LowerTE into UpdateMainWorkspaceSize so that we don't need to run GraphPlanMemory an extra time

Fix return in UpdateMainWorkspaceSize

Lint

Try to fix UpdateMainWorkspaceSize

Fix construction of static memory plan

Clean up code while debugging

Adding UpdateWorkspaceSize back

Add closure + call to UpdateFunctionMetadata (WIP)

UpdateFunctionMetadata builds; weird error with device ctx map though. Not sure if it came from this change or something else

Add some debugging of UpdateMainWorkspaceSize

Starting to move UpdateFunctionMetadata call to use process_fn infra

What target should be passed to UpdateFunctionMetadata?

UpdateFunctionMetadata is not working yet

Added some comments about UpdateFunctionMetadata for Jared

Fix the creation of function metadata

Try another stab at cleaning up the information

Fix

Port StorageInfo and StaticMemoryPlan data structure (#8297)

Restoring reshape opt

Fix tests

Caught a nasty typo from Lily: Map::Set does not mutate

Format

Disable stupid Google style warning

Rebase cleanup

Formatting

Add docstring for storage info

Black

Post rebase fix

Remove prints

Disable assert that doesn't make sense for now

Fix lint

Add copying attrs from relay node to graph node; still need to figure out how to do this in the case of global vars

Work with Lily to fix graph attrs

Try to figure out where extra arguments are coming from; fix merge

passes the profiling test

Clean up

Fix profile test

Remove debugging

Add attributes for BYOC uTVM case

Format

Dumb typo

Another fix for byoc

Format

Fix last 3 failing tests

Format

Fix final two test cases

Format

Fix lint

Fix again

Fix

Fix auto scheduler code

Fix issue

Address CR comment

Format

Co-authored-by: Jared Roesch <roeschinc@gmail.com>
When dilation is larger than 1 in conv2d with NHWC
layout, the ordering of indices used when accessing the data array
in the convolution computation is incorrect.

'data_vec' is defined as

lambda n, oho, owo, kh, kw, ic, ohi, owi:

But accessed as

data_vec[n, oho, owo, kh, kw, ohi, owi, ic]

This patch fixes the order of indices and modifies the test
so that it is suitable for running on an AArch64 CPU.
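To make the index-order mismatch concrete, here is a small hedged illustration (axis names follow the commit message; the array itself is made up, and this is not the TOPI code):

```python
import numpy as np

# If the index order at the access site does not match the order used when
# the buffer was defined, the wrong element is read whenever the mismatched
# axes have different extents (as happens with dilation > 1).
data_vec = np.arange(2 * 3 * 4).reshape(2, 3, 4)  # declared axes: (ic, ohi, owi)
ic, ohi, owi = 1, 2, 3
correct = data_vec[ic, ohi, owi]     # matches the declaration order
# wrong = data_vec[ohi, owi, ic]     # swapped order: wrong element or out of range
```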
* [Relay] Add support of conv2d with NHWC for Mali

Added a template schedule for conv2d NHWC, reusing a strategy similar
to the one for the NCHW layout. The schedule is also added to the
corresponding test, which can be run to verify correctness.

* [Relay] Fix issue from pylint in conv2d for Mali
With either the ci_lint docker image, or the matched version of
pylint==2.4.4, I got two lint errors running locally that didn't show
up in the CI.  Fixing them.

Co-authored-by: Eric Lunderberg <elunderberg@octoml.ai>
- Some ops (e.g. view) call infer_value when converting a model into Relay IR.
- If LLVM is not enabled, this leads to a segmentation fault.

Co-authored-by: kueitang <kueitang@qti.qualcomm.com>
* [Bug] Fix x86 dense schedule extern ops

* more

* lint
AndrewZhaoLuo and others added 29 commits July 29, 2021 15:13
* convert to python list as expected

* test example

* jostle ci

Co-authored-by: Andrew Zhao Luo <andrewzhaoluo@system76-pc.localdomain>
* [AutoScheduler] Fix task extraction with TE compiler

* fix

* test

* Update python/tvm/auto_scheduler/relay_integration.py
…da (#8554)

* [TOPI][CUDA] minor change on assert statement

* [TOPI][CUDA] reformatting
…ests (#8541)

* Fix issue in 'vectorize' function for 1D and 3D tensors

* Add pooling tests for channel last layouts

* Add support for more general layouts in "poolnd" implementation

* Reformat with 'black'

* Fix lint issues
* This is a good practice to save storage space in
  the Docker images being created
* Also sort pip package lists alphabetically
)

* Add basic support for batch matmul transpose

* Update

* Lint fix & add tf convert support

* Update

Lint fix

* Bug fix for qnn.batch_matmul

* Bug fix for tensorflow test

* Add grad support for batch_matmul

* Lint fix

Re-trigger CI

Bug fix

Re-trigger CI

Re-trigger CI

Re-trigger CI
…xtFuncs (#8523)

* bug fix and add tensorarray with partition pass test case

* change test function location and address comments

* Update tests/python/relay/test_pass_partition_graph.py

Co-authored-by: Cody Yu <comaniac0422@gmail.com>

* trigger CI

Co-authored-by: Cody Yu <comaniac0422@gmail.com>
* handle upcasting case

* test upcasting tests for tir

* address comaniac comments

* formatting

* add negative tests

* fix failing test; now allows other things

Co-authored-by: Andrew Zhao Luo <andrewzhaoluo@system76-pc.localdomain>
* Remove all attr::storage_scope usage

* pyformat

* fixed VTA tests

* Update TIR text printer to print storage_scope on allocate

* print storage scope in AllocateNode ReprPrinter

* Fixed accidentally removed scope tag check

* remove unused function

Co-authored-by: masa <masa@pop-os.localdomain>
* Bug fix for numpy scalar input in vm

* Bug fix

* Re-trigger CI

* Update

* Update UT

* Re-trigger CI
* reduce testing time

* Lint issues were resolved. Weights for the test are always randomly generated.

Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
* hotfix check_grad perf regression: lift compile out of hot loop

* hoist interpreter creation out of python closure, fix weird conv2d bug on arm cpu

* lint

* try one more fix
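A generic sketch of the hoisting pattern described above (the function names are illustrative and not check_grad's actual internals):

```python
def slow_check(inputs, build_fn, run_fn):
    for x in inputs:
        mod = build_fn()   # expensive compile inside the hot loop
        run_fn(mod, x)

def fast_check(inputs, build_fn, run_fn):
    mod = build_fn()       # compile once, hoisted out of the loop
    for x in inputs:
        run_fn(mod, x)
```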
- Added device_type to the queried device information.

- Sort the Vulkan devices by device_type.  Priority is discrete >
  integrated > virtual > cpu > other.

Co-authored-by: Eric Lunderberg <elunderberg@octoml.ai>
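A minimal sketch of the stated ordering (the structure of the device records is assumed for illustration; this is not the actual TVM Vulkan query code):

```python
# Priority from the commit message: discrete > integrated > virtual > cpu > other.
DEVICE_TYPE_PRIORITY = {"discrete": 0, "integrated": 1, "virtual": 2, "cpu": 3, "other": 4}

def sort_devices(devices):
    """Order devices so the most preferred device_type comes first."""
    return sorted(
        devices,
        key=lambda d: DEVICE_TYPE_PRIORITY.get(d["device_type"], len(DEVICE_TYPE_PRIORITY)),
    )

devices = [
    {"name": "iGPU", "device_type": "integrated"},
    {"name": "dGPU", "device_type": "discrete"},
]
print([d["name"] for d in sort_devices(devices)])  # ['dGPU', 'iGPU']
```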
* Otherwise, stale pytest-results could appear in builds.
* Fix storage_access not visiting else branch

* fix conflict with #8516 in the test

* update thread sync test following #8516 update
* add flag

* fix and test

* format

* fix memory_align function

* fix and address comments

* format

* fix crt aot test

* comments

* fix test

* trigger

* trigger

* trigger

* trigger

* trigger

Co-authored-by: Mehrdad Hessar <mhessar@ip-172-31-20-199.us-west-2.compute.internal>
* [Vulkan] Rewrote PointerValueTypeRewrite transform

In C-style codegen, pointer types can be freely cast between scalar
and vectorized types (e.g. `float16x4* <-> float16*`).  In SPIR-V,
these are separate types, and no such casting is allowed.  This was
previously handled by having a special-case for `Ramp(base, stride=1,
lanes)` in the codegen.  That method didn't cover all possible cases,
including Broadcast nodes used as indices.

PointerValueTypeRewrite previously re-wrote the AllocateNode and
parameter pointer types, but didn't update the Load/Store node.  This
change tracks which variables can be updated to a vectorized type, and
then updates all references to those.  This includes removing the
`RampNode`, as the vectorization is then included as part of the
variable type.

* [StorageRewrite] Updates as recommended in review.

- Added explicit TODO(Lunderberg) for follow-ups

- Pass `checker.info_map_` instead of `checker` to
  `VectorTypeRewriter`

* [Vulkan] Allow for pointer rewrites that change base type.

A single memory allocation may have more than one type of data stored
within it.  This allows the PointerTypeRewrite pass to recognize if a
function only uses the pointer as a particular base type.  This wasn't
an issue in C-based codegen, but is required for Vulkan.  Since Vulkan
shaders do not permit type-casting, the cast must be done when passing
the pointer argument into the shader.

Co-authored-by: Eric Lunderberg <elunderberg@octoml.ai>
Co-authored-by: Junru Shao <junrushao1994@gmail.com>
Co-authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn>
Co-authored-by: Bohan Hou <32121147+spectrometerHBH@users.noreply.github.com>
Co-authored-by: Hongyi Jin <3231950289@qq.com>
Co-authored-by: Wuwei Lin <wuwei@apache.org>
Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com>
* [TOPI][CUDA] Improve the performance of scatter_nd by:

1. Split into 2 kernels, one does the "Init" and another does the "Update".
   Thus they can have different Grid/Block configurations to better utilize
   SMs.
2. Use atomic_add instead of direct assignment, which avoids the race
   condition when multiple indices point to the same location of the output
   tensor. With this modification, it is now safe to use more CUDA threads
   to gain more parallelism.
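A hedged illustration (plain NumPy, not the CUDA kernel) of why an accumulating update matters when indices repeat:

```python
import numpy as np

# With duplicate indices, plain assignment keeps only the last write, while an
# accumulating (atomic-style) update keeps every contribution.
indices = np.array([1, 1, 2])
updates = np.array([10.0, 20.0, 5.0])

out_assign = np.zeros(4)
out_assign[indices] = updates           # last write wins: out_assign[1] == 20.0

out_accum = np.zeros(4)
np.add.at(out_accum, indices, updates)  # accumulates: out_accum[1] == 30.0
```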

* Fix python code format.

* FIX: [TOPI][CUDA] Improve the performance of scatter_nd #8479

- Split ScatterND kernel into 2 sub-kernels using ib.new_scope()

- Replace ib.for_range() with blockIdx.y

- Using atomic_add when mode == "add"

- Keep threadIdx.x less than max_threads of GPU

* Comment added

* Add fallback implementation when "mode=add" meets int64

- Atomic_add from CUDA doesn't support int64 data type
- Change "ind{i}" to "ind%d"%i, where names of relay.var could correctly display

* Python format

* Fix line too long

* CI pass

* Empty, for CI pass

* Empty, for CI pass

* Empty, for CI pass

* Empty, for CI pass

* Empty, for CI pass

* Exchange blockIdx.x and blockIdx.y

* check for Vulkan or metal

* Fallback to previous algorithm when mode==update

* Update python/tvm/topi/cuda/scatter.py

Co-authored-by: Tristan Konolige <tristan.konolige@gmail.com>

* Assign TODO

* Swapping then and else block

Co-authored-by: wenxizhu <wenxizhu@tencent.com>
Co-authored-by: CaptainDuke <captainduke328@gmail.com>
Co-authored-by: Tristan Konolige <tristan.konolige@gmail.com>
* ccache

* ccache

Fix formatting

Add comment about nvcc

Change default to AUTO

More progress

Add auto as a mode

Disable ccache in CI

add-cache-to-cmake

Fix typo

* Fix rebase

* flaky test
@jiangjiajun jiangjiajun merged commit 28bb1de into jiangjiajun:paddlepaddle Aug 2, 2021