cherry-picks for 1.16.1 release #17741

Merged: 23 commits, Oct 2, 2023

Conversation


@snnn snnn commented Sep 29, 2023

No description provided.

snnn and others added 16 commits September 28, 2023 08:56
### Description
Move appending source name behind the ModifyNameIfDisabledTest

### Motivation and Context
In WinML, the disabled test name doesn't include the model source name,
so the WinML job will be broken in the new image.

https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1151451&view=logs&s=4eef7ad1-5202-529d-b414-e2b14d056c05

### Verified

https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1151691&view=logs&s=4eef7ad1-5202-529d-b414-e2b14d056c05
Install dotnet 6.0 in the docker image.
Move the C# build and tests into docker.

Migration of the unit tests and symbolic shape inference will come in
another PR.
### Description

### Motivation and Context

### Verification

https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=351542&view=results
### Description
This PR proposes a change that should speed up inference for the
TreeEnsemble* kernels. Previously, when traversing a decision tree, the
`TreeNodeElement` pointer would be incremented or decremented to the
appropriate child node; I assume this was because the
`truenode_inc_or_first_weight` and `falsenode_inc_or_n_weights` members
were overloaded for two purposes.

In this PR, we now assign the true branch pointer. We also initialise
`nodes_` in a pre-order traversal which means that the false branch's
position can be resolved statically and does not need to be stored.
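The layout trick described above can be illustrated with a minimal Python sketch (hypothetical, not the actual C++ kernel; `Node`, `FlatNode`, `flatten_preorder`, and `predict` are invented names). Flattening the tree so that a node's false child always sits at the next index means only the true-branch index needs to be stored:

```python
from dataclasses import dataclass


@dataclass
class Node:
    feature: int = -1
    threshold: float = 0.0
    true_child: "Node | None" = None
    false_child: "Node | None" = None
    leaf_value: float = 0.0


@dataclass
class FlatNode:
    feature: int      # -1 marks a leaf
    threshold: float
    true_index: int   # only the true branch needs an explicit index
    leaf_value: float


def flatten_preorder(root):
    """Flatten the tree visiting the false child first, so a node's
    false child always lands at index i + 1 and needs no stored pointer."""
    flat = []

    def visit(node):
        i = len(flat)
        flat.append(None)  # reserve slot i for this node
        if node.true_child is None:  # leaf
            flat[i] = FlatNode(-1, 0.0, -1, node.leaf_value)
            return
        visit(node.false_child)      # false subtree starts at i + 1
        t = len(flat)                # true subtree starts here
        visit(node.true_child)
        flat[i] = FlatNode(node.feature, node.threshold, t, 0.0)

    visit(root)
    return flat


def predict(flat, x):
    i = 0
    while flat[i].feature >= 0:
        # take the stored true index, or fall through to i + 1 (false)
        i = flat[i].true_index if x[flat[i].feature] <= flat[i].threshold else i + 1
    return flat[i].leaf_value
```

The traversal then touches one stored index per node instead of two overloaded increment/decrement fields.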

I observe the following speed ups. The benchmarks used are derived from
those in https://github.com/siboehm/lleaves/tree/master/benchmarks and
the baseline is the main branch.

NYC Dataset
--------------
| Number of threads | Baseline | Pointer assignment | Pre-ordered initialisation | Pointer assignment % improvement | Pre-ordered initialisation % improvement |
|------------------:|---------:|-------------------:|---------------------------:|---------------------------------:|-----------------------------------------:|
| 1 | 176.539 | 155.709 | 145.119 | 11.7989 | 17.7976 |
| 4 | 59.9015 | 51.9652 | 50.0884 | 13.2488 | 16.382 |
| 8 | 34.5561 | 31.3024 | 28.2535 | 9.41581 | 18.2387 |

Airline Dataset
---------------

| Number of threads | Baseline | Pointer assignment | Pre-ordered initialisation | Pointer assignment % improvement | Pre-ordered initialisation % improvement |
|------------------:|---------:|-------------------:|---------------------------:|---------------------------------:|-----------------------------------------:|
| 1 | 2127.34 | 1389.7 | 920.373 | 34.6745 | 56.736 |
| 4 | 723.307 | 481.634 | 310.618 | 33.4122 | 57.0558 |
| 8 | 420.722 | 278.397 | 185.265 | 33.8286 | 55.9651 |

mtpl2 Dataset
--------------

| Number of threads | Baseline | Pointer assignment | Pre-ordered initialisation | Pointer assignment % improvement | Pre-ordered initialisation % improvement |
|------------------:|---------:|-------------------:|---------------------------:|---------------------------------:|-----------------------------------------:|
| 1 | 1143.62 | 1020.04 | 998.171 | 10.8055 | 13.0988 |
| 4 | 386.153 | 339.905 | 328.061 | 11.9764 | 14.3729 |
| 8 | 225.995 | 200.665 | 199.057 | 11.2084 | 13.4408 |

These were run using an M2 Pro with 16GB of RAM. All times are in
milliseconds and averages over 10 runs with a batch size of 100,000.
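The methodology can be sketched with a simple timing harness (a hypothetical Python illustration, not the benchmark code actually used; `bench` and `predict` are invented names):

```python
import time


def bench(predict, batch, runs=10):
    """Average wall-clock milliseconds over `runs` calls on one batch,
    mirroring the averaging described above."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(batch)
        times.append((time.perf_counter() - start) * 1000.0)
    return sum(times) / len(times)
```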

### Motivation and Context
Performance improvements.
The extensions submodule was removed in [this
PR](#17097) but not deleted
from the list of git modules. This breaks code that ingests ORT and
relies on the git modules file for an accurate list of submodules.

This change removes the extensions from the list of git modules to
resolve this issue.
### Description
Include onnxruntime_float16.h in the package.

### Motivation and Context
This was missed in the recently released 1.16 packages (except NuGet).
### Description
One quantization case was not covered by the current list of unit tests.
This PR adds a unit test to cover that case with the fix. It fixes the
issue #17619.



### Motivation and Context
The current TRT EP's PerThreadContext allows more than one
IExecutionContext instance to be created from one engine instance.
However, it's possible to hit an error caused by the TRT API
`context.setBindingDimensions()` in our TRT EP code
[here](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc#L2775)
when the input shape changes (meaning the engine is rebuilt) under
multithreading.
From the
[doc](https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/classnvinfer1_1_1_i_execution_context.html#ada050e88320bcc40987b0acadc2ef962)
and the
[discussion](https://github.com/NVIDIA/TensorRT/issues/846),
it seems each IExecutionContext should have its own
OptimizationProfile, which our current TRT EP doesn't support regardless
of the PerThreadContext implementation.
Back out PerThreadContext until we completely solve this issue.
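The reverted pattern can be illustrated with a minimal Python sketch (hypothetical, not the actual TRT EP code): each thread lazily creates its own context from a shared factory. The problem described above is that contexts created this way still share one optimization profile, so shape changes can race:

```python
import threading


class PerThreadContext:
    """One lazily-created context per thread from a shared factory.
    Sketch only; the real EP builds IExecutionContext from an engine."""

    def __init__(self, make_context):
        self._local = threading.local()
        self._make = make_context

    def get(self):
        # First call on each thread creates that thread's context.
        if not hasattr(self._local, "ctx"):
            self._local.ctx = self._make()
        return self._local.ctx
```

Within one thread the same context is reused; a second thread gets its own, distinct context.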
1. Upgrade nodejs from 16.x to 18.x for Windows pipelines.
2. Avoid using the Azure DevOps "NodeTool" task on Linux. The task
installs nodejs from the internet or a local disk cache, but we have
already moved all Linux tests into docker, so the installer is no longer
needed.
3. Remove some other unused code.
### Description
`/usr/local/bin` can only be modified by root. This command seems unnecessary.
@tianleiwu

Please help triage cherry-pick of 20f96fd to 1.16.1

snnn and others added 5 commits September 29, 2023 09:53
### Description
The condition check is not correct
```
if (is_unidirectional_ && enable_fused_causal_attention_) {  // GPT
}
else { // BERT
}
```

Change it to 
```
if (is_unidirectional_) {  // GPT
}
else { // BERT
}
```

Another workaround is to enable fused causal attention by setting the
environment variable `ORT_ENABLE_FUSED_CAUSAL_ATTENTION=1` before
running stable diffusion.
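The effect of the condition change can be sketched as a tiny dispatch function (a hypothetical Python illustration, not the kernel code; `uses_causal_kernel` is an invented name): before the fix, a unidirectional (GPT-style) model without the fused-causal flag fell through to the BERT path:

```python
def uses_causal_kernel(is_unidirectional: bool,
                       enable_fused_causal: bool,
                       fixed: bool = True) -> bool:
    if fixed:
        # corrected condition: GPT path depends only on directionality
        return is_unidirectional
    # buggy condition: required both flags, so GPT models without the
    # env flag took the BERT path and hit the causal-mask check
    return is_unidirectional and enable_fused_causal
```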

### Motivation and Context

Without the fix, the optimized CLIP model of stable diffusion will
encounter an error when running the Attention node:

```
2023-09-24 16:15:31.206037898 [E:onnxruntime:, sequential_executor.cc:514
ExecuteKernel] Non-zero status code returned while running Attention node.
Name:'Attention_0' Status Message:
/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/tensorrt_fused_multihead_attention/mha_runner.cu:207
bool
onnxruntime::contrib::cuda::FusedMHARunnerFP16v2::mhaImpl::is_flash_attention(int)
const interface->mHasCausalMask == false was false.
```

Note that the bug has been there for a long time. It only surfaced
because we recently added a fusion for CLIP, which triggers the error.

We will add a comprehensive test for causal attention later to avoid
such corner cases.
### Description

Fix for the issue where a file-access error is raised on Windows when
the `TemporaryDirectory` context exits.
#17627

### Motivation and Context
#17627
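A minimal Python sketch of the underlying pitfall (assuming the usual pattern behind such errors; this is not the actual PR diff, and the file name is invented): close file handles before the `TemporaryDirectory` context exits, since on Windows an open handle inside the directory makes the cleanup fail:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.onnx")
    # The handle is closed when this inner block ends; on Windows a
    # handle still open at cleanup time raises a file-access error.
    with open(path, "wb") as f:
        f.write(b"\x08\x07")
    assert os.path.exists(path)
# the directory and its contents are removed on context exit
```

On Python 3.10+, `tempfile.TemporaryDirectory(ignore_cleanup_errors=True)` is another way to tolerate such failures.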
@snnn snnn marked this pull request as ready for review September 29, 2023 22:20
@snnn snnn requested a review from a team as a code owner September 29, 2023 22:20
### Description

Use `.buffer` of Uint8Array to get ArrayBuffer.

TODO: Add E2E React Native test case to cover JS level testing to avoid
future breakage.

### Motivation and Context

#17732

Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
tianleiwu
tianleiwu previously approved these changes Oct 1, 2023

@pranavsharma pranavsharma left a comment


Some shell scripts are missing copyright and license headers.

@mayeut

mayeut commented Oct 2, 2023

@snnn, could you cherry-pick #17716, please?

@snnn snnn requested a review from tianleiwu October 2, 2023 21:19
@snnn snnn merged commit 264a740 into rel-1.16.1 Oct 2, 2023
178 of 194 checks passed
@snnn snnn deleted the snnn/rel-1.16.1 branch October 2, 2023 22:04
@snnn snnn mentioned this pull request Oct 30, 2023