
update #8

Merged
merged 69 commits into from
Aug 13, 2021

jiangjiajun
Owner

Thanks for contributing to TVM! Please refer to the guideline https://tvm.apache.org/docs/contribute/ for useful information and tips. After the pull request is submitted, please request code reviews from Reviewers by @-mentioning them in the pull request thread.

Matthew Brookhart and others added 30 commits August 3, 2021 09:28
* [Refactor] Avoid Overriding Generic Op Strategy in "hls.py"

* Fix The Broken CI Test Cases
Set the number of cores for scripts and builds that run inside the RVM
based on the specified number of cores for the VM.

Currently Vagrant does not set the environment variable TVM_CI_NUM_CORES
to the number of cores available in the VM it creates. As a consequence,
the scripts and builds that run inside the VM after it is created (like
the ones used to build TVM and QEMU) use the default of only 2 CPUs, and
so do not use all the CPU resources available when the VM has more than
2 cores.

This commit sets TVM_CI_NUM_CORES equal to the number of cores available
in the VM created by Vagrant so the builds (which use that environment
variable to find out the number of CPUs that must be used for the
builds) can use all the CPUs available, speeding up the builds.

Signed-off-by: Gustavo Romero <gustavo.romero@linaro.org>
- Move "from_device" argument definition from "vulkan" target to all
  targets.

- Add device querying to TargetInternal::FromConfig, using
  "from_device" argument.  If present, these have lower priority than
  explicitly-specified attributes, but higher priority than the
  default attribute values.

- Add default no-op DeviceAPI::GetTargetProperty.

Co-authored-by: Eric Lunderberg <elunderberg@octoml.ai>
* [Runtime] Add graph_executor get_input_index API.

In the graph_executor use case, the user can call set_input with an
input index to set an input parameter, but there is no straightforward
way to get the correct index from an input name. This commit provides
a get_input_index API to do that.

* Update python/tvm/contrib/graph_executor.py

Co-authored-by: Cody Yu <comaniac0422@gmail.com>

* Update python/tvm/contrib/graph_executor.py

Co-authored-by: Cody Yu <comaniac0422@gmail.com>

* Update src/runtime/graph_executor/graph_executor.cc

Co-authored-by: Cody Yu <comaniac0422@gmail.com>

* Update python/tvm/contrib/graph_executor.py

Co-authored-by: Cody Yu <comaniac0422@gmail.com>

Co-authored-by: Cody Yu <comaniac0422@gmail.com>
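The name-to-index lookup that get_input_index adds can be pictured with a toy model. `ToyGraphExecutor` below is hypothetical and is not TVM's GraphModule; it assumes the executor keeps a map from input names to positions in graph order, and that an unknown name yields -1 rather than raising, which is an assumption here, not a documented contract.

```python
class ToyGraphExecutor:
    """Hypothetical stand-in for a graph executor; not TVM's GraphModule."""

    def __init__(self, input_names):
        # Map each graph input name to its positional index, in graph order.
        self._input_map = {name: i for i, name in enumerate(input_names)}
        self._inputs = [None] * len(input_names)

    def get_input_index(self, name):
        # Assumption: unknown names yield -1 instead of raising.
        return self._input_map.get(name, -1)

    def set_input(self, key, value):
        # Accept either an integer index or an input name.
        index = key if isinstance(key, int) else self.get_input_index(key)
        if index < 0:
            raise KeyError(f"unknown input: {key!r}")
        self._inputs[index] = value


executor = ToyGraphExecutor(["data", "weight", "bias"])
idx = executor.get_input_index("weight")
executor.set_input(idx, [1.0, 2.0])
```

With the index resolved once from the name, later calls can use the cheaper integer form of set_input.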
* [Target] Allow for spaces in target attributes.

Some target parameters, such as the device_name on vulkan, have spaces
in them.  This prevented round-trips between string and Target
objects, which can occur in some cases.

* [Vulkan] Fixed "device_name" property querying.

* [Target] Switched from escaped spaces to quoted spaces.

Instead of -attr=value\ with\ spaces, will instead be written as
-attr='value with spaces'.

Co-authored-by: Eric Lunderberg <elunderberg@octoml.ai>
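The quoted-spaces convention round-trips cleanly through POSIX shell-style tokenization. The sketch below uses Python's `shlex` purely as an illustration; `target_to_str` and `str_to_target` are hypothetical helpers, TVM's actual parser is in C++, and values containing a single quote are not handled here.

```python
import shlex


def target_to_str(kind, attrs):
    """Serialize a target kind and attributes, quoting values that contain spaces."""
    parts = [kind]
    for key, value in attrs.items():
        text = str(value)
        if " " in text:
            parts.append(f"-{key}='{text}'")
        else:
            parts.append(f"-{key}={text}")
    return " ".join(parts)


def str_to_target(text):
    """Parse the string form back into (kind, attrs) with shell-style quoting."""
    kind, *options = shlex.split(text)
    attrs = {}
    for option in options:
        key, _, value = option.lstrip("-").partition("=")
        attrs[key] = value
    return kind, attrs


attrs = {"device_name": "NVIDIA GeForce RTX 2080", "max_num_threads": "256"}
text = target_to_str("vulkan", attrs)
```

Quoting (`-attr='value with spaces'`) keeps the whole value in one token, which is what makes the string form a faithful round-trip of the Target object.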
* [AMP] Do not allow fp16 cast on arange inputs

* add test

* Add comment explaining the issue with fp16 "end"
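The commit does not spell out the fp16 "end" issue here; one likely reason is that float16 has an 11-bit significand, so not every integer above 2048 is representable, and casting an arange bound to fp16 can silently change the output length. A minimal illustration, using the standard library's half-precision `struct` format `"e"`:

```python
import struct


def to_float16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


end = 2049
end_fp16 = to_float16(end)  # 2049 is not representable in float16
length_exact = len(range(0, end))
length_fp16 = len(range(0, int(end_fp16)))
```

Here `end_fp16` rounds to 2048.0, so an arange over the cast bound would produce one element fewer than intended.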
Platform boards passed to base-box-tool.py need to be a subset of the
platform boards supported by 'tests/micro/zephyr --microtvm-platforms='.

Currently base-box-tool.py only accepts the 'stm32f746xx' ST board,
which is not supported by 'tests/micro/zephyr --microtvm-platforms='. As
a consequence, if one passes '--microtvm-platform=stm32f746xx' to
base-box-tool.py, the 'tests/micro/zephyr' test will fail.

This commit fixes it by adding two new platforms to base-box-tool
('stm32f746xx_nucleo' and 'stm32f746xx_disco') which are supported by
tests/micro/zephyr and by removing the nonexistent 'stm32f746xx'
platform. The new platform boards are quite similar and share the same
USB VID and PID.

Signed-off-by: Gustavo Romero <gustavo.romero@linaro.org>
- Pass parameters through TVMRetValue as std::string instead of
  runtime::String

- Remove escaping of spaces inside quotes for target attributes.
  Updated unit test to verify round-trip behavior.

- Added missing "device_type" query for Vulkan.

Co-authored-by: Eric Lunderberg <elunderberg@octoml.ai>
The RPC server is killed in its __del__ function. When a server
coexists with remote resources in the same function scope, the
destruction order is not determined.

This can cause the server to be destroyed before the actual remote
array. As a side effect, a test can sometimes time out while waiting
on the socket.
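One way to avoid depending on `__del__` ordering is to release remote resources first and then shut the server down explicitly. The classes below are hypothetical stand-ins (not TVM's RPC API), and this is not necessarily the approach the commit takes; it only shows why a deterministic ordering removes the hang.

```python
class ToyServer:
    """Hypothetical stand-in for an RPC server that must outlive remote objects."""

    def __init__(self, log):
        self.log = log
        self.alive = True

    def terminate(self):
        if self.alive:
            self.alive = False
            self.log.append("server terminated")


class ToyRemoteArray:
    """Hypothetical remote resource whose cleanup talks to the server."""

    def __init__(self, server, log):
        self.server = server
        self.log = log

    def free(self):
        # A real remote free would block on the socket if the server were gone.
        if not self.server.alive:
            raise RuntimeError("server already terminated; free would hang")
        self.log.append("remote array freed")


def run_test():
    log = []
    server = ToyServer(log)
    try:
        array = ToyRemoteArray(server, log)
        array.free()  # release remote resources first
    finally:
        server.terminate()  # explicit, deterministic shutdown instead of __del__
    return log
```

The `try`/`finally` guarantees the server is terminated last even if the test body raises.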
* Fix support for linking to only libtvm_runtime

also ensures that the ResNet example uses the new support.

* Fix build.rs to rebuild if the Python script changes

Co-authored-by: Jared Roesch <roeschinc@gmail.com>
#8660)

Co-authored-by: Eric Lunderberg <elunderberg@octoml.ai>
* Add transpose support for tensorrt batch_matmul

* Address PR comment

* Refactor to add ONNX_DEFAULT_CONFIGS
* [TENSORIR] Add `from_legacy_te_schedule` attr to TE PrimFuncs

The `from_legacy_te_schedule` attr marks PrimFuncs created from TE
scheduling. Passes that only operate on TE scheduling check for this
attr and are no-ops if it is not found. If `from_legacy_te_schedule` is
false or not set, the PrimFunc is assumed to come from TensorIR. Passes
specific to TensorIR now check for the absence of this attr.

* formatting

* enable passes regardless of te or not
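The gating rule described for `from_legacy_te_schedule` can be sketched with a plain dict standing in for a PrimFunc's attrs. The pass functions are hypothetical and only return a tagged string; note that the last commit in this series ("enable passes regardless of te or not") may have relaxed this gating for some passes.

```python
def is_from_legacy_te_schedule(func_attrs):
    """True only if the attribute is present and truthy."""
    return bool(func_attrs.get("from_legacy_te_schedule", False))


def te_only_pass(func_attrs, body):
    # TE-only passes are no-ops unless the PrimFunc came from TE scheduling.
    if not is_from_legacy_te_schedule(func_attrs):
        return body
    return f"te_lowered({body})"


def tensorir_only_pass(func_attrs, body):
    # TensorIR-only passes run when the attribute is absent or false.
    if is_from_legacy_te_schedule(func_attrs):
        return body
    return f"tir_lowered({body})"
```

Treating "absent" and "false" the same way means PrimFuncs written directly in TensorIR need no extra annotation.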
* Move flake8 to ci_lint

This fixes the scenario where you lint with ci_lint but it can still
fail in PR due to flake8 being injected only into the Mac build.

* Disable flake8 until the docker changes have landed
* Add linear congruential engine.

* Fix typo.

* Minor fix.

* Fix comments and intros.

* Change to unsigned.

* Minor comment fix.

* Fix unsigned rand state to signed.
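A linear congruential engine advances its state as state = (a * state + c) mod m. The sketch below uses the parameters of C++ `std::minstd_rand` (a = 48271, c = 0, m = 2^31 - 1); whether TVM's engine uses exactly these constants is an assumption here. The C++ standard requires the 10000th output of a default-seeded `minstd_rand` to be 399268537, which makes a convenient self-check.

```python
class LinearCongruentialEngine:
    """MINSTD-style LCG: state <- (a * state + c) mod m, with c = 0."""

    MULTIPLIER = 48271
    INCREMENT = 0
    MODULUS = 2**31 - 1

    def __init__(self, seed=1):
        self.seed(seed)

    def seed(self, value):
        # As in std::linear_congruential_engine with c = 0, a zero state is
        # invalid (it would stay zero forever), so it is replaced by 1.
        state = value % self.MODULUS
        self.state = state if state != 0 else 1

    def __call__(self):
        self.state = (self.MULTIPLIER * self.state + self.INCREMENT) % self.MODULUS
        return self.state


rng = LinearCongruentialEngine()
values = [rng() for _ in range(10000)]
```

Because the whole generator is one integer, its state is trivial to copy or serialize, which is a common reason to prefer an LCG over heavier engines for reproducible search.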
* fuse dense sum

* remove excess copying

* dev LSTM in ONNX

* alternative implementation of LSTM in onnx frontend. It is faster than the current one without tuning

* LSTM_dev2 was implemented in onnx frontend

* LSTM dev in pytorch frontend

* LSTM cell implementation was transferred to a common place. Unnecessary code was removed

* lint fixes

* Weights permutation for LSTM layer in onnx frontend

* LSTM cell description was added

* arguments and values were renamed. descriptions of some methods were added

* LSTM output shape and activations input format were fixed in onnx frontend

* empty. tvm-ci test

* unbind method was transferred from onnx frontend to common.py

* unbind method was transferred from pytorch frontend to common.py

* lstm cell was transferred from op/layers.py to frontend/common.py

* clean up weight dictionary initialization

* fix pytorch frontend wrapper over unbind method

* minor fix of comments

* empty. tvm-ci test restart

* empty. tvm-ci test restart

Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
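The "weights permutation" step exists because frameworks stack the four LSTM gate weights in different orders: the ONNX LSTM operator stacks W, R and B as [iofc] (input, output, forget, cell). The target order below, i, f, c, o, matches PyTorch's documented i, f, g, o layout, but which order TVM's shared cell expects is an assumption here. A sketch on a list of rows:

```python
ONNX_GATE_ORDER = ("i", "o", "f", "c")  # ONNX LSTM stacks W/R/B as [iofc]
TARGET_GATE_ORDER = ("i", "f", "c", "o")  # assumed order of the shared cell


def permute_gates(rows, src_order=ONNX_GATE_ORDER, dst_order=TARGET_GATE_ORDER):
    """Reorder a gate-stacked weight (4 * hidden rows) from src_order to dst_order."""
    if len(rows) % 4 != 0:
        raise ValueError("expected 4 * hidden_size rows")
    hidden = len(rows) // 4
    # Slice the stacked weight into one block of `hidden` rows per gate.
    blocks = {gate: rows[k * hidden:(k + 1) * hidden] for k, gate in enumerate(src_order)}
    return [row for gate in dst_order for row in blocks[gate]]
```

Doing the permutation once on the constant weights lets both frontends share one cell implementation instead of branching on gate order at every time step.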
…d target (#8542)

* [Onnx][UnitTests] Excluded additional onnx tests

- The onnx tests `test_basic_convinteger`, `test_convinteger_with_padding`, `test_range_float_type_positive_delta_expanded`, and `test_range_int32_type_positive_delta_expanded` don't run correctly on CUDA targets, so they are added to the exclusion.

- Parametrized over the relative directory name, rather than the full directory name.  This improves readability of the pytest output, and keeps the same parametrized test name across different python versions.

- Changed the target-specific skips to check the target kind, rather than the full target string.

* [UnitTests] Apply correct requires_gpu() pytest marks for parametrized target

Previously, the addition of tvm.testing._target_to_requirement pytest marks
was handled by the parametrize_targets function.  The
_auto_parametrize_target function assumed that a unit test that was already
parametrized had all markings needed.  If a unit test was explicitly
parametrized using @pytest.mark.parametrize, these marks would be missing.

In most cases, this explicit use of @pytest.mark.parametrize('target', ...)
should be avoided, but has value in the case of marking with multiple
parameters with @pytest.mark.parametrize('target,other', ...).  This use
case isn't yet supported by the tvm.testing.parameters function.  Therefore,
if this occurs, detect it and add the appropriate marks.

* [UnitTest] Bugfix, applying requires_* markers to parametrized targets.

Initial implementation did not work correctly with
@tvm.testing.parametrize_targets.

Also, went through all cases where "target" is used to parametrize on
something other than a target string, and renamed.

* [Onnx] Switched from using pytest.skip to tvm.testing.known_failing_targets

After merging of the `tvm.testing.parametrize_targets` and
`tvm.testing._auto_parametrize_target` code paths,
`known_failing_targets` can be used in both cases.

* [Testing] Enable `Target` object as argument to _target_to_requirement

Previously, tvm.testing._target_to_requirement required the argument
to be a string.  This commit allows it to be either a string or a
`tvm.target.Target`.

* [Testing] Auto-target parametrization, handle pytest ParameterSet

If the unit test has already been parametrized with pytest.params to
add parameter-specific marks, respect those existing marks.

This can happen in some cases in the CI; it is not yet certain what
causes them. It may be pytest-xdist related, but it has been difficult
to reproduce locally.

Co-authored-by: Eric Lunderberg <elunderberg@octoml.ai>
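Checking the target kind instead of the full target string, and accepting either a string or a Target object, can be sketched as below. `REQUIREMENTS_BY_KIND` is a hypothetical subset written for illustration; the mark names echo `tvm.testing.requires_*`, but the real mapping lives in `tvm.testing` and is not reproduced here. A Target-like object is assumed to expose `kind.name`.

```python
REQUIREMENTS_BY_KIND = {
    # Hypothetical subset for illustration; not tvm.testing's actual table.
    "llvm": [],
    "cuda": ["requires_gpu", "requires_cuda"],
    "vulkan": ["requires_gpu", "requires_vulkan"],
}


def target_kind(target):
    """Return the target kind from a target string or a Target-like object."""
    if isinstance(target, str):
        # The kind is the first token, e.g. "cuda" in "cuda -arch=sm_80".
        return target.split()[0]
    return target.kind.name


def requirements_for(target):
    return REQUIREMENTS_BY_KIND.get(target_kind(target), [])
```

Keying on the kind means "cuda" and "cuda -arch=sm_80" receive the same marks, which a full-string comparison would miss.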
* add hex indicator to message

* add pytest skip

* trigger

* trigger
* conv2d working, fixing conv2d_depthwise

* Depthwise conv2d working.

* Make convinteger work on cuda.

* Simplify code and add tests.

* Formatting.

* Fixed fallback broadcasting.

* Fix fallback broadcasting.

* Formatting.

* Fix lint

* Merge with new test parameterization.
…#8529)

* [Topi][Testing] Minor cleanup for python reference implementations

- Use input dtype for dilate/conv2d accumulate in python
  impl. Previously, the python implementations of dilation and conv2d
  would use numpy default dtype in some cases, rather than the input
  data's dtype.

- Added fallback for datatypes not supported by scipy.signal.convolve2d (e.g. float16).

- Refactored to avoid duplication, use common get_pad_tuple functionality.

* [Topi][UnitTests] Added float16 tests to test_topi_dense.py

* [Topi][UnitTests] Added float16 to test_topi_conv2d_nchw.py

* [Topi][Float16] Added float16 tests for depthwise conv2d.

* [UnitTests] Explicitly set seed for float16 tests

Intended to avoid flaky test failures later due to rounding errors.

* [UnitTests] Fixed a few failing unit tests.

- ref_data must be a test fixture, not acquired through
  request.getfixturevalue, in order to have the random_seed be known.

- dilate_python's return value didn't follow `out_dtype`.

- The test_topi_conv3d tests had the reference results computed in
  float64, due to dilate_python() not respecting the input data type.
  With the correct dtype, the tolerances needed to be slightly widened.

Co-authored-by: Eric Lunderberg <elunderberg@octoml.ai>
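The commit does not show its fallback for dtypes that scipy.signal.convolve2d rejects; one generic option is a direct-loop reference with the same "valid"-mode semantics (kernel flipped). The pure-Python sketch below is illustrative only; a real version would operate on numpy arrays and keep the input data's dtype, as the first bullet above requires.

```python
def convolve2d_valid(data, kernel):
    """Direct 'valid'-mode 2-D convolution (kernel flipped), in pure Python.

    Intended to match scipy.signal.convolve2d(data, kernel, mode="valid").
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(data) - kh + 1
    out_w = len(data[0]) - kw + 1
    # True convolution flips the kernel in both spatial dimensions.
    flipped = [row[::-1] for row in kernel[::-1]]
    return [
        [
            sum(data[i + m][j + n] * flipped[m][n] for m in range(kh) for n in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]
```

A loop reference is slow, but for small test shapes that matters less than having a reference that works for every dtype under test.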
guberti and others added 27 commits August 9, 2021 13:57
* Add Arduino CLI support to ci-qemu

* Install latest version of Arduino SDK

* Remove unnecessary --fix-missing

* Tweak to clarify what URLs go with what

* Retrigger CI

* Temporarily replace buggy Spresense core
…ut (#8677)

* add timeout

* rename timeout and change timeout to a reasonable value

* fix tests after project api merge

* retrigger because of flaky test
Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
* [Docs] Added documentation on pytest target parametrization.

Follow-up from #8542, to document existing features.

* [Docs] Updated pytest parametrization documentation following review

Co-authored-by: Eric Lunderberg <elunderberg@octoml.ai>
* Fix obvious memory leak in function.rs

* Update object pointer
GPU memory is only released once the PackedFunc for evaluating the model is
garbage-collected by Python. In CI we are noticing intermittent 'CUDA: Out of
memory' failures while processing the tutorials, and tracing showed that no
garbage collection was happening between items. Not confident this will solve
the problem, but it is worth a try.
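Why an explicit `gc.collect()` between items can matter: an object caught in a reference cycle is not freed when its last outside reference disappears, only when the cyclic collector runs. `FakeGpuBuffer` is a hypothetical stand-in for whatever keeps device memory alive; the real PackedFunc release path is not modeled. Automatic collection is disabled during the run so the effect is deterministic.

```python
import gc
import weakref


class FakeGpuBuffer:
    """Hypothetical stand-in for a GPU allocation held alive by a closure."""

    def __init__(self):
        self.closure = None


def run_item():
    buf = FakeGpuBuffer()
    buf.closure = lambda: buf  # reference cycle: buf -> lambda -> cell -> buf
    return weakref.ref(buf)


def run_all(n, collect_between=True):
    """Return, for each item, whether its buffer was still alive afterwards."""
    alive_after_each = []
    gc_was_enabled = gc.isenabled()
    gc.disable()
    try:
        for _ in range(n):
            ref = run_item()
            if collect_between:
                gc.collect()
            alive_after_each.append(ref() is not None)
    finally:
        if gc_was_enabled:
            gc.enable()
    return alive_after_each
```

Without the collect, every buffer survives its item, so memory grows until the collector happens to run.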
* refactor host to qemu

* remove unused variables

* remove skip-build arg

* fix microtvm test script
* [Docker] Refactor/clean-up of docker/bash.sh

- Added detailed help message, displayed using `-h` or `--help`.

- Optional flags handled using `getopt`, can now occur in any order.

- `--mount` flag may occur more than once.

- Switched from short arguments to docker-run to long arguments
  (e.g. `--volume` instead of `-v`).  Short arguments are good
  shortcuts for interactive work, but can be more difficult to read in
  longer scripts.

- Mount the `.tvm_test_data` folder, to avoid re-downloading test data
  already available in the host environment.

* [Docker] docker/bash.sh CI fix

Dash-prefixed arguments as part of the command now require prefixing with
-- to separate them from arguments intended for docker/bash.sh

* [Docker] docker/bash.sh, consistent quoting

* [Docker] Added --repo-mount-point for docker/bash.sh

* [Docker] Updated command-line parsing of docker/bash.sh

- Maintained previous behavior, any unrecognized flags after the
  docker/bash.sh are part of the command, no -- is
  needed. (e.g. docker/bash.sh ci_gpu make -j2)

- Reverted changes to Jenkinsfile to add a --, no longer needed.

* [Docker] Fixed multi-argument commands

* [Docker] docker/bash.sh check permissions before mounting ~/.tvm_test_data

* [Docker] Consistent workspace directory in docker/bash.sh for Jenkins

Some locations in the CI perform build commands outside of the build
steps (e.g. tests/scripts/task_ci_setup.sh#L38), and cmake doesn't
like it if the build directory changes.  These should probably be
moved into the build steps of the CI, and be packed in tvm_multilib in
the Jenkinsfile, but for the meantime maintaining a consistent
/workspace directory on all CI nodes allows cmake to run.

* [Docker] Updated bash.sh for MacOS compatibility

MacOS has an older version of bash that handles arrays slightly
differently.  All instances of array expansion `"${ARRAY[@]}"` should
instead be written as `${ARRAY[@]+"${ARRAY[@]}"}`.  Otherwise, `set -u`
will erroneously complain about an undefined variable. See
https://stackoverflow.com/a/61551944 for details.

Even though this is an older version of bash (observed in version
3.2.57), this is the last major version available under GPLv2 and is
therefore the default version on MacOSX.  At some point, the
`docker/bash.sh` could be migrated to python for ease of
maintenance/testing.
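Since migrating `docker/bash.sh` to Python is floated above, here is a sketch of the parsing rules these commits describe: options in any order, a repeatable `--mount`, then the image name, with everything after it (dash-prefixed or not) treated as the command. `parse_bash_sh_args` is hypothetical, covers only the flags named in these commits, and does not reproduce getopt's exact behavior.

```python
def parse_bash_sh_args(argv):
    """Sketch of docker/bash.sh-style parsing: options, then image, then command."""
    options = {"mounts": [], "repo_mount_point": None, "help": False}
    args = list(argv)
    while args and args[0].startswith("-"):
        flag = args.pop(0)
        if flag in ("-h", "--help"):
            options["help"] = True
        elif flag == "--mount":
            options["mounts"].append(args.pop(0))  # may be given more than once
        elif flag == "--repo-mount-point":
            options["repo_mount_point"] = args.pop(0)
        else:
            raise ValueError(f"unknown option: {flag}")
    image = args.pop(0) if args else None
    # Everything after the image name, dash-prefixed or not, is the command.
    return options, image, args
```

Stopping option parsing at the first non-flag argument is what lets `docker/bash.sh ci_gpu make -j2` work without a `--` separator.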
* [Docs][UnitTest] Updated target parametrization documentation

The intended audience is developers writing unit tests or debugging
unit tests that have failed. Therefore, this moves the recommended style
to the top of the section and the implementation details to the
bottom.

* Documentation updates as recommended by tkonolige
* Refactor AOT Test Utils parameters into object

`compile_and_run` was getting quite complicated to understand, as well as being mostly duplicated by `compile_and_run_multiple_models`.

This patch pulls out some common parameters into a data class `AOTTestNetwork` which makes it clearer what each parameter is doing and provides documentation.

* Rename Network -> Model and sizebytes -> size_bytes
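Grouping per-model parameters into one data class lets single-model and multi-model paths share a signature. The field names below (module, inputs, outputs, name, params) are an assumption for illustration and may not match the final AOT test utilities; `describe` is a hypothetical consumer.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class AOTTestModel:
    """Hypothetical sketch of a container for one model's test parameters."""

    module: Any  # e.g. a Relay IRModule in the real utilities
    inputs: Dict[str, Any]
    outputs: List[Any]
    name: str = "default"
    params: Optional[Dict[str, Any]] = None


def describe(models):
    # A multi-model runner can iterate over the same object type the
    # single-model path uses, instead of taking parallel argument lists.
    return [(m.name, sorted(m.inputs)) for m in models]
```

Each field is documented in one place, and adding a per-model option no longer means threading a new positional argument through every helper.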
* Convert AOT to TECompiler

This removes the dependency on "compile_engine.h" from aot_executor_codegen.cc. This required a few changes to how AOT was operating:
* AOT run_model is now based on the post lowering main_module
* AOTOnDemandAllocator is run twice to ensure SIDs are updated post-lowering
* Moved to using tec::UpdateFunctionMetadata

Tests are passing, but would appreciate other validation 😸

* Clarify reasoning behind replanning memory later

* Use main_func_info rather than bespoke logic in AOT

This moves from using the bespoke AOT UpdateMainWorkspaceSize to the
LoweredModule main_func_info property to unify with Graph executor
codegen.
* clean up typerel

* add layout transform when input is 3D

* add test

* update doc to clarify that only 2D input data is supported

* add weight_layout attribute in dense

* remove explicit layout transform from dense_alter_op.py

* Add DensePackInferCorrectLayout to insert layout transform

* relax type rel

* revert type rel relax and add check on dim

* introduce DensePackAttrs to avoid breaking dense op

* try fixing arm compute lib test

* Update tests/python/contrib/test_arm_compute_lib/test_dense.py

Co-authored-by: lhutton1 <35535092+lhutton1@users.noreply.github.com>

* formatting

Co-authored-by: lhutton1 <35535092+lhutton1@users.noreply.github.com>
Co-authored-by: Wuwei Lin <wuwei@apache.org>
* [UnitTest] Updated tolerances to avoid flaky unit test.

The result was correct, but the atol was just small enough to trigger
a CI error for a value close to zero in an unrelated PR,
#8670.

https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/PR-8670/16/pipeline/#step-236-log-1703

* Also updated 32-bit version of test_conv2d_nchw
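Why values near zero need `atol`: `numpy.testing.assert_allclose` checks `abs(actual - desired) <= atol + rtol * abs(desired)` elementwise (defaults rtol=1e-7, atol=0). When `desired` is close to zero, the `rtol` term vanishes and only `atol` absorbs rounding noise. A scalar restatement of that criterion:

```python
def is_close(actual, desired, rtol=1e-7, atol=0.0):
    """Scalar form of the criterion used by numpy.testing.assert_allclose."""
    return abs(actual - desired) <= atol + rtol * abs(desired)
```

For example, a result of 1e-6 against an expected 0.0 fails with `rtol=1e-5, atol=1e-7` but passes with `atol=1e-5`, while the same absolute error around 100.0 passes on `rtol` alone.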
* An alternative chunk op was implemented in the pytorch frontend. aten::unsafe_chunk was added to the op map in the pytorch frontend

* chunk was replaced by the new implementation in the pytorch frontend. It is 2.5 times faster

Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
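For reference, `torch.chunk` splits a dimension into pieces of `ceil(n / chunks)` elements, the last piece possibly shorter, and may return fewer than `chunks` pieces. The sketch below only computes those sizes; how the frontend then emits slices at these offsets is not shown, and `chunk_sizes` is a hypothetical helper.

```python
def chunk_sizes(dim_size, chunks):
    """Sizes of the pieces torch.chunk returns along one dimension."""
    if chunks <= 0:
        raise ValueError("chunks must be positive")
    step = max(1, -(-dim_size // chunks))  # integer ceil(dim_size / chunks)
    return [min(step, dim_size - start) for start in range(0, dim_size, step)]
```

For instance, 10 elements in 3 chunks gives sizes 4, 4, 2, while 6 elements in 4 chunks gives only three pieces of 2.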
This PR is part of the TensorIR upstreaming effort (#7527), which adds the
schedule primitive storage_align.

Co-authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn>
Co-authored-by: Bohan Hou <32121147+spectrometerHBH@users.noreply.github.com>
Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com>
Co-authored-by: Hongyi Jin <3231950289@qq.com>
Co-authored-by: Junru Shao <junrushao1994@gmail.com>
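storage_align pads a buffer so that the stride of a chosen axis equals `k * factor + offset` for some integer k, a common trick for avoiding shared-memory bank conflicts. The sketch computes row-major strides with that constraint applied to one axis; the helper names are hypothetical, and it assumes the smallest qualifying stride is chosen.

```python
def aligned_stride(extent, factor, offset):
    """Smallest stride >= extent with stride % factor == offset."""
    if not 0 <= offset < factor:
        raise ValueError("expected 0 <= offset < factor")
    return extent + (offset - extent % factor) % factor


def buffer_strides(shape, axis, factor, offset):
    """Row-major strides for shape, padding the stride of `axis`."""
    strides = [1] * len(shape)
    for dim in range(len(shape) - 2, -1, -1):
        stride = strides[dim + 1] * shape[dim + 1]
        if dim == axis:
            stride = aligned_stride(stride, factor, offset)
        strides[dim] = stride
    return strides
```

For a [64, 128] buffer with axis=0, factor=32, offset=1, each row is padded from 128 to 129 elements, so consecutive rows start in different banks.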
jiangjiajun merged commit 74cc942 into jiangjiajun:main Aug 13, 2021
jiangjiajun pushed a commit that referenced this pull request Sep 22, 2021
* WIP support per-channel quantization

* more WIP

* More WIP

* fix issue with per-channel bias_add

* Fix fake quantize tests (#4)

* Fixed fake quantize issues.

* Formatting.

* Cleanup unused imports

* Fix real int8 tests.

* Add Relu

* One more little one (#5)

* Fixed fake quantize issues.

* Formatting.

* Cleanup unused imports

* Fix real int8 tests.

* Fix requantize shape bug.

* Non-working Per-channel Dense

* Fix legalization for non spatial operators. (#6)

* Fix legalization for non spatial operators.

* Fix axis checks for end2end functionality.

* fix axis normalization

fix lint

fix lint again

* Per channel fq2i (#8)

* WIP support per-channel quantization

* more WIP

* More WIP

* fix issue with per-channel bias_add

* Fix fake quantize tests (#4)

* Fixed fake quantize issues.

* Formatting.

* Cleanup unused imports

* Fix real int8 tests.

* Add Relu

* One more little one (#5)

* Fixed fake quantize issues.

* Formatting.

* Cleanup unused imports

* Fix real int8 tests.

* Fix requantize shape bug.

* Non-working Per-channel Dense

* Fix legalization for non spatial operators. (#6)

* Fix legalization for non spatial operators.

* Fix axis checks for end2end functionality.

* fix axis normalization

fix lint

fix lint again

* Fix bug in requantize dimension expansion.

* Format.

Co-authored-by: Josh Fromm <jwfromm@octoml.ai>

* respond to review comments

respond to review comments

Co-authored-by: Josh Fromm <jwfromm@octoml.ai>
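Per-channel quantization gives each channel its own scale and zero point instead of one pair for the whole tensor. The sketch below is a generic illustration of affine int8 quantization along axis 0, not TVM's QNN ops or the fake-quantization-to-integer pass; note that Python's `round` rounds halves to even.

```python
def quantize_per_channel(rows, scales, zero_points, qmin=-128, qmax=127):
    """Affine int8 quantization with one (scale, zero_point) per row (axis 0)."""
    if not (len(rows) == len(scales) == len(zero_points)):
        raise ValueError("need one scale and zero point per channel")
    out = []
    for row, scale, zp in zip(rows, scales, zero_points):
        # q = clip(round(x / scale) + zero_point, qmin, qmax)
        out.append([max(qmin, min(qmax, round(x / scale) + zp)) for x in row])
    return out


def dequantize_per_channel(rows, scales, zero_points):
    return [[(q - zp) * scale for q in row] for row, scale, zp in zip(rows, scales, zero_points)]


weights = [[0.5, -1.0], [10.0, -20.0]]
```

With per-channel scales (0.01, 0.2) both rows use the int8 range well, giving [[50, -100], [50, -100]]; a single per-tensor scale of 0.2 collapses the small first row to [2, -5], losing most of its precision.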
jiangjiajun pushed a commit that referenced this pull request Sep 22, 2021
* WIP support per-channel quantization

* more WIP

* More WIP

* fix issue with per-channel bias_add

* Fix fake quantize tests (#4)

* Fixed fake quantize issues.

* Formatting.

* Cleanup unused imports

* Fix real int8 tests.

* Add Relu

* One more little one (#5)

* Fixed fake quantize issues.

* Formatting.

* Cleanup unused imports

* Fix real int8 tests.

* Fix requantize shape bug.

* Non-working Per-channel Dense

* Fix legalization for non spatial operators. (#6)

* Fix legalization for non spatial operators.

* Fix axis checks for end2end functionality.

* fix axis normalization

fix lint

fix lint again

* Per channel fq2i (#8)

* WIP support per-channel quantization

* more WIP

* More WIP

* fix issue with per-channel bias_add

* Fix fake quantize tests (#4)

* Fixed fake quantize issues.

* Formatting.

* Cleanup unused imports

* Fix real int8 tests.

* Add Relu

* One more little one (#5)

* Fixed fake quantize issues.

* Formatting.

* Cleanup unused imports

* Fix real int8 tests.

* Fix requantize shape bug.

* Non-working Per-channel Dense

* Fix legalization for non spatial operators. (#6)

* Fix legalization for non spatial operators.

* Fix axis checks for end2end functionality.

* fix axis normalization

fix lint

fix lint again

* Fix bug in requantize dimension expansion.

* Format.

Co-authored-by: Josh Fromm <jwfromm@octoml.ai>

* respond to review comments

* start dtos

* wip depth_to_space

* dtos ident

Co-authored-by: Matthew <mbrookhart@octoml.ai>
Co-authored-by: Josh Fromm <jwfromm@octoml.ai>