cherry-picks for 1.16.1 release #17741

Merged: 23 commits, Oct 2, 2023

Conversation


@snnn snnn commented Sep 29, 2023

No description provided.

snnn and others added 16 commits September 28, 2023 08:56
### Description
Move appending source name behind the ModifyNameIfDisabledTest

### Motivation and Context
In WinML, the disabled test name doesn't include the model source name,
so the WinML job will be broken in the new image.

https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1151451&view=logs&s=4eef7ad1-5202-529d-b414-e2b14d056c05

### Verified

https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1151691&view=logs&s=4eef7ad1-5202-529d-b414-e2b14d056c05
Install dotnet 6.0 in the docker image.
Move the C# build and tests into docker.

Migration of the unit tests and symbolic shape inference will come in
another PR.
### Description

### Motivation and Context

### Verification

https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=351542&view=results
### Description
This PR proposes a change that should speed up inference for the
TreeEnsemble* kernels. Previously, when traversing a decision tree, the
`TreeNodeElement` pointer would be incremented or decremented to the
appropriate child node; I assume this was because the
`truenode_inc_or_first_weight` and `falsenode_inc_or_n_weights` members
were overloaded for two purposes.

In this PR, we now assign the true branch pointer. We also initialise
`nodes_` in a pre-order traversal which means that the false branch's
position can be resolved statically and does not need to be stored.
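The layout trick described above can be illustrated with a minimal Python sketch (hypothetical, not the actual C++ kernel; `Node`, `FlatNode`, `flatten_preorder`, and `predict` are invented names). Flattening the tree so that a node's false child always sits at the next index means only the true-branch index needs to be stored:

```python
from dataclasses import dataclass


@dataclass
class Node:
    feature: int = -1
    threshold: float = 0.0
    true_child: "Node | None" = None
    false_child: "Node | None" = None
    leaf_value: float = 0.0


@dataclass
class FlatNode:
    feature: int      # -1 marks a leaf
    threshold: float
    true_index: int   # only the true branch needs an explicit index
    leaf_value: float


def flatten_preorder(root):
    """Flatten the tree visiting the false child first, so a node's
    false child always lands at index i + 1 and needs no stored pointer."""
    flat = []

    def visit(node):
        i = len(flat)
        flat.append(None)  # reserve slot i for this node
        if node.true_child is None:  # leaf
            flat[i] = FlatNode(-1, 0.0, -1, node.leaf_value)
            return
        visit(node.false_child)      # false subtree starts at i + 1
        t = len(flat)                # true subtree starts here
        visit(node.true_child)
        flat[i] = FlatNode(node.feature, node.threshold, t, 0.0)

    visit(root)
    return flat


def predict(flat, x):
    i = 0
    while flat[i].feature >= 0:
        # take the stored true index, or fall through to i + 1 (false)
        i = flat[i].true_index if x[flat[i].feature] <= flat[i].threshold else i + 1
    return flat[i].leaf_value
```

The traversal then touches one stored index per node instead of two overloaded increment/decrement fields.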

I observe the following speed ups. The benchmarks used are derived from
those in https://github.com/siboehm/lleaves/tree/master/benchmarks and
the baseline is the main branch.

NYC Dataset
--------------
| Number of threads | Baseline | Pointer assignment | Pre-ordered initialisation | Pointer assignment % improvement | Pre-ordered initialisation % improvement |
|------------------:|---------:|-------------------:|---------------------------:|---------------------------------:|-----------------------------------------:|
| 1 | 176.539 | 155.709 | 145.119 | 11.7989 | 17.7976 |
| 4 | 59.9015 | 51.9652 | 50.0884 | 13.2488 | 16.382 |
| 8 | 34.5561 | 31.3024 | 28.2535 | 9.41581 | 18.2387 |

Airline Dataset
---------------

| Number of threads | Baseline | Pointer assignment | Pre-ordered initialisation | Pointer assignment % improvement | Pre-ordered initialisation % improvement |
|------------------:|---------:|-------------------:|---------------------------:|---------------------------------:|-----------------------------------------:|
| 1 | 2127.34 | 1389.7 | 920.373 | 34.6745 | 56.736 |
| 4 | 723.307 | 481.634 | 310.618 | 33.4122 | 57.0558 |
| 8 | 420.722 | 278.397 | 185.265 | 33.8286 | 55.9651 |

mtpl2 Dataset
--------------

| Number of threads | Baseline | Pointer assignment | Pre-ordered initialisation | Pointer assignment % improvement | Pre-ordered initialisation % improvement |
|------------------:|---------:|-------------------:|---------------------------:|---------------------------------:|-----------------------------------------:|
| 1 | 1143.62 | 1020.04 | 998.171 | 10.8055 | 13.0988 |
| 4 | 386.153 | 339.905 | 328.061 | 11.9764 | 14.3729 |
| 8 | 225.995 | 200.665 | 199.057 | 11.2084 | 13.4408 |

These were run using an M2 Pro with 16GB of RAM. All times are in
milliseconds and averages over 10 runs with a batch size of 100,000.
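The methodology can be sketched with a simple timing harness (a hypothetical Python illustration, not the benchmark code actually used; `bench` and `predict` are invented names):

```python
import time


def bench(predict, batch, runs=10):
    """Average wall-clock milliseconds over `runs` calls on one batch,
    mirroring the averaging described above."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(batch)
        times.append((time.perf_counter() - start) * 1000.0)
    return sum(times) / len(times)
```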

### Motivation and Context
Performance improvements.
The extensions submodule was removed in [this
PR](#17097) but not deleted
from the list of git modules. This breaks code that ingests ORT and
relies on the git modules file for an accurate list of submodules.

This change removes the extensions from the list of git modules to
resolve this issue.
### Description
Include onnxruntime_float16.h in the package.

### Motivation and Context
This was missed in the recently released 1.16 packages (except NuGet).
### Description
One quantization case was not covered by the current list of unit tests.
This PR adds a unit test to cover that case with the fix. It fixes the
issue #17619.



### Motivation and Context
The current TRT EP's PerThreadContext allows more than one
IExecutionContext instance to be created from one engine instance.
However, it's possible to hit an error caused by the TRT API
`context.setBindingDimensions()` in our TRT EP code
[here](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc#L2775)
when the input shape changes (meaning the engine is rebuilt) under
multithreading.
From the
[doc](https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/classnvinfer1_1_1_i_execution_context.html#ada050e88320bcc40987b0acadc2ef962)
and the
[discussion](https://github.com/NVIDIA/TensorRT/issues/846),
it seems each IExecutionContext should have its own
OptimizationProfile, which our current TRT EP doesn't support regardless
of the PerThreadContext implementation.
Back out PerThreadContext until we completely solve this issue.
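The reverted pattern can be illustrated with a minimal Python sketch (hypothetical, not the actual TRT EP code): each thread lazily creates its own context from a shared factory. The problem described above is that contexts created this way still share one optimization profile, so shape changes can race:

```python
import threading


class PerThreadContext:
    """One lazily-created context per thread from a shared factory.
    Sketch only; the real EP builds IExecutionContext from an engine."""

    def __init__(self, make_context):
        self._local = threading.local()
        self._make = make_context

    def get(self):
        # First call on each thread creates that thread's context.
        if not hasattr(self._local, "ctx"):
            self._local.ctx = self._make()
        return self._local.ctx
```

Within one thread the same context is reused; a second thread gets its own, distinct context.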
1. Upgrade nodejs from 16.x to 18.x for Windows pipelines.
2. Avoid using the Azure DevOps "NodeTool" task on Linux. The task
installs nodejs from the internet or a local disk cache, but we have
already moved all Linux tests into docker, so the installer is no longer
needed.
3. Remove some other unused code.
### Description
`/usr/local/bin` can only be modified by root. This command seems unnecessary.
@tianleiwu

Please help triage cherry-pick of 20f96fd to 1.16.1

snnn and others added 5 commits September 29, 2023 09:53
### Description
The condition check is not correct
```
if (is_unidirectional_ && enable_fused_causal_attention_) {  // GPT
}
else { // BERT
}
```

Change it to 
```
if (is_unidirectional_) {  // GPT
}
else { // BERT
}
```

Another workaround is to enable fused causal attention by setting the
environment variable `ORT_ENABLE_FUSED_CAUSAL_ATTENTION=1` before
running stable diffusion.
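The effect of the condition change can be sketched as a tiny dispatch function (a hypothetical Python illustration, not the kernel code; `uses_causal_kernel` is an invented name): before the fix, a unidirectional (GPT-style) model without the fused-causal flag fell through to the BERT path:

```python
def uses_causal_kernel(is_unidirectional: bool,
                       enable_fused_causal: bool,
                       fixed: bool = True) -> bool:
    if fixed:
        # corrected condition: GPT path depends only on directionality
        return is_unidirectional
    # buggy condition: required both flags, so GPT models without the
    # env flag took the BERT path and hit the causal-mask check
    return is_unidirectional and enable_fused_causal
```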

### Motivation and Context

Without the fix, the optimized CLIP model of stable diffusion will
encounter an error when running the Attention node:

```
2023-09-24 16:15:31.206037898 [E:onnxruntime:, sequential_executor.cc:514
ExecuteKernel] Non-zero status code returned while running Attention node.
Name:'Attention_0' Status Message:
/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/tensorrt_fused_multihead_attention/mha_runner.cu:207
bool
onnxruntime::contrib::cuda::FusedMHARunnerFP16v2::mhaImpl::is_flash_attention(int)
const interface->mHasCausalMask == false was false.
```

Note that the bug has been there for a long time. It only surfaced
because we recently added a fusion for CLIP, which triggers the error.

We will add a comprehensive test for causal attention later to avoid
such corner cases.
### Description

Fix for the issue where a file-access error is raised on Windows when
the `TemporaryDirectory` context exits.
#17627

### Motivation and Context
#17627
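A minimal Python sketch of the underlying pitfall (assuming the usual pattern behind such errors; this is not the actual PR diff, and the file name is invented): close file handles before the `TemporaryDirectory` context exits, since on Windows an open handle inside the directory makes the cleanup fail:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.onnx")
    # The handle is closed when this inner block ends; on Windows a
    # handle still open at cleanup time raises a file-access error.
    with open(path, "wb") as f:
        f.write(b"\x08\x07")
    assert os.path.exists(path)
# the directory and its contents are removed on context exit
```

On Python 3.10+, `tempfile.TemporaryDirectory(ignore_cleanup_errors=True)` is another way to tolerate such failures.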
@snnn snnn marked this pull request as ready for review September 29, 2023 22:20
@snnn snnn requested a review from a team as a code owner September 29, 2023 22:20
### Description

Use `.buffer` of Uint8Array to get ArrayBuffer.

TODO: Add E2E React Native test case to cover JS level testing to avoid
future breakage.

### Motivation and Context

#17732

Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
tianleiwu
tianleiwu previously approved these changes Oct 1, 2023

@pranavsharma pranavsharma left a comment


Some shell scripts are missing copyright and license headers.

@mayeut

mayeut commented Oct 2, 2023

@snnn, could you cherry-pick #17716, please?

@snnn snnn requested a review from tianleiwu October 2, 2023 21:19
@snnn snnn merged commit 264a740 into rel-1.16.1 Oct 2, 2023
178 of 194 checks passed
@snnn snnn deleted the snnn/rel-1.16.1 branch October 2, 2023 22:04
@snnn snnn mentioned this pull request Oct 30, 2023