
[TVM EP] Rename Standalone TVM (STVM) Execution Provider to TVM EP #10260

Merged
merged 35 commits into microsoft:master from vvchernov:vc/rename on Feb 15, 2022

Conversation

vvchernov
Contributor

@vvchernov vvchernov commented Jan 12, 2022

Rename all mentions of STVM to TVM in the ONNX Runtime codebase.

Corrections were made after PR #10241 and PR #10211 were merged.

Notes:

  1. The Nuphar and TVM EP builds were separated in the build and CMake files to make further work more convenient.
  2. A new build key is used for building the TVM EP with CUDA, to avoid confusion with the CUDA EP. Adding the new key is motivated by the following issues with reusing --use_cuda (see the sketch after this list):
    Passing the --use_cuda flag means that building the project builds the CUDA EP. This is not the right approach if we only want to work with the TVM EP: building the CUDA EP is a separate flow that requires its own nuances and dependencies, which are redundant if we only need the TVM EP.
    At the build.py script level, there are checks for the --cuda_home and --cudnn_home flags, which are not needed for TVM. Also, setting --use_cuda turns on a lot of additional logic in build.py that the TVM EP does not need. Disabling that logic would require extending conditions like `if args.use_cuda:` to `if args.use_cuda and not args.use_tvm:` in many places. This is not conceptually correct, since these TVM EP changes would land in methods that are specific to the CUDA EP.
    Additionally, there are problems using the TVM EP via PYTHONPATH and the wheel package, because setup.py only supports one EP at a time. If both --use_cuda and --use_tvm are set, only one wheel package, for the CUDA EP, will be built, because of how the provider logic in setup.py is arranged. Also, the condition for the _ld_preload.py extension for the TVM EP, with the help of which the TVM libraries are loaded, will not be fulfilled.
  3. If input shapes are unset and dynamic, an ONNX Runtime error is raised instead of a warning with the dynamic dimension automatically inferred as 1.
  4. There are two independent structures with the same name (TVMFuncState): one in the TVM EP internal code and one in the tests for TVM prepared by Nuphar.
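
A minimal sketch of the build-flag interaction described in note 2, assuming a hypothetical dedicated key name (`--tvm_cuda_runtime`) and hypothetical helpers (`validate_cuda_home`, `enable_tvm_cuda_support`); the real build.py flag name and internals may differ:

```python
import argparse

# Hypothetical sketch; flag and helper names other than --use_cuda/--use_tvm
# are illustrative assumptions, not the verbatim build.py code.
parser = argparse.ArgumentParser()
parser.add_argument("--use_cuda", action="store_true")           # builds the CUDA EP
parser.add_argument("--use_tvm", action="store_true")            # builds the TVM EP
parser.add_argument("--tvm_cuda_runtime", action="store_true")   # TVM EP with CUDA, no CUDA EP
args = parser.parse_args()

def validate_cuda_home(args):
    """Stands in for the --cuda_home/--cudnn_home checks in build.py."""

def enable_tvm_cuda_support():
    """Stands in for TVM-EP-specific CUDA wiring."""

# Without a dedicated key, every CUDA-EP-specific branch needs a TVM exception:
if args.use_cuda and not args.use_tvm:   # conceptually wrong: TVM leaks into CUDA EP logic
    validate_cuda_home(args)

# With a dedicated key, the CUDA EP logic stays untouched:
if args.tvm_cuda_runtime:
    enable_tvm_cuda_support()
```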

Hello @xadupre, can you recheck our updates after your PR?

@vvchernov vvchernov changed the title [TVM EP] Rename Standalone TVM (STV) Execution Provider to TVM EP WIP: [TVM EP] Rename Standalone TVM (STV) Execution Provider to TVM EP Jan 12, 2022
@vvchernov vvchernov changed the title WIP: [TVM EP] Rename Standalone TVM (STV) Execution Provider to TVM EP WIP: [TVM EP] Rename Standalone TVM (STVM) Execution Provider to TVM EP Jan 12, 2022
@tmoreau89
Contributor

@xadupre let us know if you have feedback on the changes we applied to the Nuphar TVM strings that conflicted.

"convert layout to NHWC: 0\n",
"input tensor names: \n",
"input tensor shapes: \n",
"Build TVM graph executor\n",
"****************** Success! ******************\n"
"/home/agladyshev/workspace/tvm/python/tvm/relay/build_module.py:414: UserWarning: target_host parameter is going to be deprecated. Please pass in tvm.target.Target(target, host=target_host) instead.\n",
Member

Same warning as the removed one but is there any way to remove it?

Contributor

That should be cleaned up; CC @KJlaccHoeUM9l and @vvchernov
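
For reference, the change the warning asks for is TVM's standard target construction; a minimal sketch (the `llvm` target string is just an example):

```python
import tvm

# Deprecated form that triggers the UserWarning quoted above:
#   relay.build(mod, target="llvm", target_host="llvm", params=params)

# Recommended form: fold target_host into the Target object itself.
target = tvm.target.Target("llvm", host="llvm")
# lib = tvm.relay.build(mod, target=target, params=params)  # no target_host argument
```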

```diff
@@ -59,7 +59,7 @@ option(onnxruntime_BUILD_OBJC "Build Objective-C library" OFF)
 option(onnxruntime_USE_PREINSTALLED_EIGEN "Use pre-installed EIGEN. Need to provide eigen_SOURCE_PATH if turn this on." OFF)
 option(onnxruntime_BUILD_BENCHMARKS "Build ONNXRuntime micro-benchmarks" OFF)
 
-option(onnxruntime_USE_TVM "Build TVM for code generation." OFF)
+option(onnxruntime_NUPHAR_USE_TVM "Build TVM for code generation." OFF)
```
Member

Why onnxruntime_NUPHAR_USE_TVM and not onnxruntime_USE_NUPHAR_TVM?

Contributor Author
@vvchernov vvchernov Jan 14, 2022

@xadupre it does not matter to me; it separates the 'USE_TVM' used after the 'USE_STVM' renaming from the 'USE_TVM' used by Nuphar. If you wish, I can redo it.

Member

Every other similar constant starts with onnxruntime_USE.... It seems better to follow the same pattern.

Contributor Author

OK, done.

xadupre
xadupre previously approved these changes Jan 14, 2022
@vvchernov vvchernov changed the title WIP: [TVM EP] Rename Standalone TVM (STVM) Execution Provider to TVM EP [TVM EP] Rename Standalone TVM (STVM) Execution Provider to TVM EP Jan 16, 2022
@tmoreau89
Contributor

@xadupre it appears the CI is blocked; would you mind unblocking? Thank you.

@edgchen1
Contributor

/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux Nuphar CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline

@edgchen1
Contributor

/azp run Windows CPU CI Pipeline, Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline, Windows WebAssembly CI Pipeline, orttraining-amd-gpu-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed, onnxruntime-python-checks-ci-pipeline

@azure-pipelines

Azure Pipelines successfully started running 9 pipeline(s).

@azure-pipelines

Azure Pipelines successfully started running 8 pipeline(s).

@vvchernov
Contributor Author

Hello @xadupre and @edgchen1, I've fixed the issues related to the CI test failures; can you review and restart the tests?

@tmoreau89
Contributor

Recommendation is to land this PR after #10211 is merged

@tmoreau89
Contributor

@xadupre and @edgchen1 - we have finished rebasing and fixing some of the build issues we uncovered after #10211 merged. Please see the updated PR description.

Would it be possible to re-trigger the CI and get an additional set of reviews? Thanks

@xadupre
Member

xadupre commented Feb 4, 2022

/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux Nuphar CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline

@azure-pipelines

Azure Pipelines successfully started running 9 pipeline(s).

cmake/CMakeLists.txt (outdated; thread resolved)
docs/TVM_EP.md (outdated)
```shell
apt-get install -y python3 python3-dev python3-pip python3-setuptools gcc libtinfo-dev zlib1g-dev build-essential cmake libedit-dev libxml2-dev llvm-12
pip3 install numpy decorator attrs
```

Clone this repo.
If you want to use the GPU, make sure you have installed the NVIDIA driver and CUDA Toolkit.
Member

If TVM has a specific version requirement, is there a page the documentation can link to?

Contributor

Hello @xadupre!
There are no special version requirements.
There are also no additional links to documentation. CUDA is an external dependency in this context, so it is up to the user to decide how they will deliver this dependency.

Contributor

Internally we're using CUDA 10, but moving over to CUDA 11.1 and ensuring this leads to no performance regressions.
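
As a usage illustration of the renamed provider (not part of the doc diff above), a session would be created roughly like this, assuming a wheel built with `--use_tvm`; the provider option names shown are assumptions:

```python
import onnxruntime as ort

# "model.onnx" is a placeholder; the provider name follows the
# StvmExecutionProvider -> TvmExecutionProvider rename in this PR.
sess = ort.InferenceSession(
    "model.onnx",
    providers=["TvmExecutionProvider"],
    provider_options=[{"target": "llvm"}],  # option names are assumptions
)
```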

```
pip3 uninstall onnxruntime onnxruntime-stvm tvm -y
whl_path=$(find ./build/Linux/Release/dist -name "*.whl")
python3 -m pip uninstall tvm -y
whl_path=$(find ./build/<OS_NAME>/Release/_deps/tvm-src/python/dist -name "*.whl")
```
Member

Good catch. linux or os_name is not mandatory. It can be removed.

Contributor Author

We assume that ONNX Runtime is built by build.sh (see TVM_EP.md). Inside the latter, the path with OS_DIR is constructed; because of this, `Linux` or another OS name is mandatory.
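
A small illustration of the point above, assuming build.sh derives the build directory from the host OS (the exact name mapping is an assumption):

```python
import glob
import platform

# build.sh places artifacts under build/<OS_NAME>/..., so the wheel search
# must include the OS directory; this mapping is an assumption for the sketch.
os_name = {"Linux": "Linux", "Darwin": "MacOS", "Windows": "Windows"}[platform.system()]
wheels = glob.glob(f"./build/{os_name}/Release/_deps/tvm-src/python/dist/*.whl")
print(wheels)
```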

docs/TVM_EP.md (outdated; thread resolved)
tools/ci_build/build.py (outdated; thread resolved)
@tmoreau89
Contributor

@xadupre please let us know if you have any outstanding questions.

@vvchernov
Contributor Author

Hello @xadupre, could you recheck our changes after the last review and restart the CI tests?

@vvchernov
Contributor Author

Good evening @xadupre. PR #10524 was merged and I rebased our branch onto master; please start the CI tests.

@xadupre
Member

xadupre commented Feb 11, 2022

/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux Nuphar CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline

@xadupre
Member

xadupre commented Feb 11, 2022

/azp run Windows CPU CI Pipeline, Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline, Windows WebAssembly CI Pipeline, orttraining-amd-gpu-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed, onnxruntime-python-checks-ci-pipeline

@azure-pipelines

Azure Pipelines successfully started running 9 pipeline(s).

@azure-pipelines

Azure Pipelines successfully started running 8 pipeline(s).

@vvchernov
Contributor Author

PRs #10534 and #10533 have the same CI failure. Waiting for a fix.

@tmoreau89
Contributor

@xadupre - is the issue found in orttraining-amd-gpu-ci-pipeline a known issue? We are currently blocked on getting the CI to go green; it seems unrelated to this PR.

@edgchen1
Contributor

/azp run orttraining-amd-gpu-ci-pipeline

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@vvchernov
Contributor Author

Hello @xadupre! Please take one last look and approve the PR.

@xadupre xadupre merged commit 1cdc23a into microsoft:master Feb 15, 2022
@vvchernov vvchernov deleted the vc/rename branch February 25, 2022 08:49
@skottmckay skottmckay mentioned this pull request Apr 11, 2022
petersalas pushed a commit to octoml/onnxruntime that referenced this pull request Nov 7, 2022
[TVM EP] Rename Standalone TVM (STVM) Execution Provider to TVM EP (microsoft#10260)

* update java API for STVM EP. Issue is from PR#10019

* use_stvm -> use_tvm

* rename stvm worktree

* STVMAllocator -> TVMAllocator

* StvmExecutionProviderInfo -> TvmExecutionProviderInfo

* stvm -> tvm for cpu_targets. resolve onnxruntime::tvm and origin tvm namespaces conflict

* STVMRunner -> TVMRunner

* StvmExecutionProvider -> TvmExecutionProvider

* tvm::env_vars

* StvmProviderFactory -> TvmProviderFactory

* rename factory funcs

* StvmCPUDataTransfer -> TvmCPUDataTransfer

* small clean

* STVMFuncState -> TVMFuncState

* USE_TVM -> NUPHAR_USE_TVM

* USE_STVM -> USE_TVM

* python API: providers.stvm -> providers.tvm. clean TVM_EP.md

* clean build scripts #1

* clean build scripts, java frontend and others #2

* once more clean #3

* fix build of nuphar tvm test

* final transfer stvm namespace to onnxruntime::tvm

* rename stvm->tvm

* NUPHAR_USE_TVM -> USE_NUPHAR_TVM

* small fixes for correct CI tests

* clean after rebase. Last renaming stvm to tvm, separate TVM and Nuphar in cmake and build files

* update CUDA support for TVM EP

* roll back CudaNN home check

* ERROR for not positive input shape dimension instead of WARNING

* update documentation for CUDA

* small corrections after review

* update GPU description

* update GPU description

* misprints were fixed

* cleaned up error msgs

Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
Co-authored-by: KJlaccHoeUM9l <wotpricol@mail.ru>
Co-authored-by: Thierry Moreau <tmoreau@octoml.ai>
(cherry picked from commit 1cdc23a)