
cherry-picks for 1.16.1 release #17741

Merged (23 commits) on Oct 2, 2023
Commits
2ea88d1
Move python tests to docker
snnn Sep 28, 2023
404d5dc
Fix: Fail to skip disabledmodel in winml (#17728)
mszhanyi Sep 28, 2023
5a03aab
Move dotnet build and test into docker in Linux CPU CI (#17417)
mszhanyi Sep 7, 2023
e8a4c52
Run Nuget_Test_Linux_GPU in container (#17452)
mszhanyi Sep 8, 2023
8398346
Run Final_Jar_Testing_Linux_GPU in docker (#17533)
mszhanyi Sep 15, 2023
8dbf7d8
TreeEnsemble speed up (#17449)
adityagoel4512 Sep 12, 2023
da3bb02
Remove onnxruntime extensions from list of gitmodules (#17615)
nums11 Sep 20, 2023
9ad905c
Include onnxruntime_float16.h in the package. (#17637)
pranavsharma Sep 21, 2023
7ef4a1a
Fix static quantization for QDQ and Percentile distribution (#17649)
xadupre Sep 25, 2023
7fd077d
[TensorRT EP] Back out the PerThreadContext (#17690)
chilo-ms Sep 26, 2023
148b42f
Update nodejs to 18.x (#17657)
snnn Sep 25, 2023
00d5657
Update linux-wasm-ci.yml: remove the ln command (#17735)
snnn Sep 29, 2023
51239e0
Update version
snnn Sep 29, 2023
efe40aa
change dnf to yum
snnn Sep 29, 2023
7c18c7e
Fix a merge conflict in linux-ci-pipeline.yml
snnn Sep 29, 2023
1a0fd1f
Add microsoft yum repo
snnn Sep 29, 2023
6b7f160
Fix API 16's marker (#17640)
snnn Sep 21, 2023
a21df33
Upgrade sympy (#17639)
snnn Sep 21, 2023
f3a96ce
Increase the version number in onnxruntime_c_api.cc
snnn Sep 29, 2023
dd1021a
Fix Attention Runtime Error for CLIP model (#17729)
tianleiwu Sep 28, 2023
0161115
🐛 Bugfix win del file err (#17697)
trajepl Sep 26, 2023
6e3c907
Fix react native load from Uint8Array buffer bug (#17739)
YUNQIUGUO Sep 30, 2023
11fe4d0
Add copyright header
snnn Oct 2, 2023
3 changes: 0 additions & 3 deletions .gitmodules
@@ -8,6 +8,3 @@
path = cmake/external/emsdk
url = https://github.com/emscripten-core/emsdk.git
branch = 3.1.44
[submodule "cmake/external/onnxruntime-extensions"]
path = cmake/external/onnxruntime-extensions
url = https://github.com/microsoft/onnxruntime-extensions.git
2 changes: 1 addition & 1 deletion VERSION_NUMBER
@@ -1 +1 @@
1.16.0
1.16.1
5 changes: 5 additions & 0 deletions docs/python/README.rst
@@ -8,6 +8,11 @@ For more information on ONNX Runtime, please see `aka.ms/onnxruntime <https://ak
Changes
-------

1.16.1
^^^^^^

Release Notes : https://github.com/Microsoft/onnxruntime/releases/tag/v1.16.1

1.16.0
^^^^^^

2 changes: 1 addition & 1 deletion js/common/lib/version.ts
@@ -4,4 +4,4 @@
// This file is generated by /js/scripts/update-version.ts
// Do not modify file content manually.

export const version = '1.16.0';
export const version = '1.16.1';
4 changes: 2 additions & 2 deletions js/common/package-lock.json


2 changes: 1 addition & 1 deletion js/common/package.json
@@ -2,7 +2,7 @@
"license": "MIT",
"type": "module",
"name": "onnxruntime-common",
"version": "1.16.0",
"version": "1.16.1",
"repository": {
"url": "https://github.com/Microsoft/onnxruntime.git",
"type": "git"
2 changes: 1 addition & 1 deletion js/node/lib/version.ts
@@ -4,4 +4,4 @@
// This file is generated by /js/scripts/update-version.ts
// Do not modify file content manually.

export const version = '1.16.0';
export const version = '1.16.1';
6 changes: 3 additions & 3 deletions js/node/package-lock.json


2 changes: 1 addition & 1 deletion js/node/package.json
@@ -13,7 +13,7 @@
3
]
},
"version": "1.16.0",
"version": "1.16.1",
"dependencies": {
"onnxruntime-common": "file:../common"
},
4 changes: 3 additions & 1 deletion js/react_native/lib/backend.ts
@@ -66,12 +66,14 @@ class OnnxruntimeSessionHandler implements SessionHandler {
let results: Binding.ModelLoadInfoType;
// load a model
if (typeof this.#pathOrBuffer === 'string') {
// load model from model path
results = await this.#inferenceSession.loadModel(normalizePath(this.#pathOrBuffer), options);
} else {
// load model from buffer
if (!this.#inferenceSession.loadModelFromBlob) {
throw new Error('Native module method "loadModelFromBlob" is not defined');
}
const modelBlob = jsiHelper.storeArrayBuffer(this.#pathOrBuffer);
const modelBlob = jsiHelper.storeArrayBuffer(this.#pathOrBuffer.buffer);
results = await this.#inferenceSession.loadModelFromBlob(modelBlob, options);
}
// resolve promise if onnxruntime session is successfully created
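The `backend.ts` change above (from #17739) passes `this.#pathOrBuffer.buffer` instead of the `Uint8Array` itself to the native blob loader. The distinction it relies on can be illustrated with plain typed-array behavior; the following is an illustrative sketch, not ONNX Runtime code:

```typescript
// A Uint8Array is only a *view*; the raw storage the native side needs
// lives in its `.buffer` property.
const bytes = new Uint8Array([1, 2, 3, 4]);
const raw: ArrayBuffer = bytes.buffer as ArrayBuffer;
console.log(raw.byteLength); // 4

// Caveat with this pattern: a subarray view shares a larger underlying
// buffer at an offset, so `.buffer` alone can expose more bytes than the
// view covers (byteOffset/byteLength would then also need to be honored).
const tail = bytes.subarray(2);
console.log(tail.length, tail.buffer.byteLength); // 2 4
```

This is why an API expecting an `ArrayBuffer` fails (or misbehaves) when handed the view object directly.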
2 changes: 1 addition & 1 deletion js/react_native/lib/version.ts
@@ -4,4 +4,4 @@
// This file is generated by /js/scripts/update-version.ts
// Do not modify file content manually.

export const version = '1.16.0';
export const version = '1.16.1';
2 changes: 1 addition & 1 deletion js/react_native/package.json
@@ -36,7 +36,7 @@
"registry": "https://registry.npmjs.org/"
},
"source": "lib/index",
"version": "1.16.0",
"version": "1.16.1",
"main": "dist/commonjs/index",
"homepage": "https://github.com/microsoft/onnxruntime/blob/main/js/react_native/README.md",
"files": [
2 changes: 1 addition & 1 deletion js/react_native/yarn.lock
@@ -5188,7 +5188,7 @@ onetime@^5.1.0, onetime@^5.1.2:
mimic-fn "^2.1.0"

"onnxruntime-common@file:../common":
version "1.16.0"
version "1.16.1"

open@^6.2.0:
version "6.4.0"
2 changes: 1 addition & 1 deletion js/web/lib/version.ts
@@ -4,4 +4,4 @@
// This file is generated by /js/scripts/update-version.ts
// Do not modify file content manually.

export const version = '1.16.0';
export const version = '1.16.1';
6 changes: 3 additions & 3 deletions js/web/package-lock.json


2 changes: 1 addition & 1 deletion js/web/package.json
@@ -8,7 +8,7 @@
"type": "git"
},
"author": "fs-eire",
"version": "1.16.0",
"version": "1.16.1",
"jsdelivr": "dist/ort.min.js",
"dependencies": {
"flatbuffers": "^1.12.0",
2 changes: 1 addition & 1 deletion onnxruntime/__init__.py
@@ -7,7 +7,7 @@
For more information on ONNX Runtime, please see `aka.ms/onnxruntime <https://aka.ms/onnxruntime/>`_
or the `Github project <https://github.com/microsoft/onnxruntime/>`_.
"""
__version__ = "1.16.0"
__version__ = "1.16.1"
__author__ = "Microsoft"

# we need to do device version validation (for example to check Cuda version for an onnxruntime-training package).
42 changes: 22 additions & 20 deletions onnxruntime/contrib_ops/cuda/bert/attention.cc
@@ -140,27 +140,29 @@ Status Attention<T>::ComputeInternal(OpKernelContext* context) const {
#endif

if (!use_flash_attention) {
if (is_unidirectional_ && enable_fused_causal_attention_) { // GPT
// GPT fused kernels requires left side padding. mask can be:
// none (no padding), 1D sequence lengths or 2d mask.
// Fused kernels don't support different sequence lengths of q and kv, so only apply to the first token
// where past state is empty.
bool is_mask_2d_key_padding = parameters.mask_type == AttentionMaskType::MASK_2D_KEY_PADDING;
bool use_causal_fused_runner = (nullptr == mask_index || is_mask_1d_seq_len || is_mask_2d_key_padding) &&
nullptr == relative_position_bias &&
parameters.past_sequence_length == 0 &&
parameters.hidden_size == parameters.v_hidden_size &&
FusedMHARunnerFP16v2::is_supported(sm, parameters.head_size, sequence_length,
enable_trt_flash_attention_, true);
if (use_causal_fused_runner) {
// Here we assume that num_heads, head_size and is_unidirectional does not change for an Attention node.
if (nullptr == fused_fp16_runner_.get()) {
fused_fp16_runner_ = FusedMHARunnerFP16v2::Create(num_heads_, parameters.head_size, sm, is_unidirectional_,
enable_trt_flash_attention_, parameters.scale);
if (is_unidirectional_) { // GPT
if (enable_fused_causal_attention_) {
// GPT fused kernels requires left side padding. mask can be:
// none (no padding), 1D sequence lengths or 2d mask.
// Fused kernels don't support different sequence lengths of q and kv, so only apply to the first token
// where past state is empty.
bool is_mask_2d_key_padding = parameters.mask_type == AttentionMaskType::MASK_2D_KEY_PADDING;
bool use_causal_fused_runner = (nullptr == mask_index || is_mask_1d_seq_len || is_mask_2d_key_padding) &&
nullptr == relative_position_bias &&
parameters.past_sequence_length == 0 &&
parameters.hidden_size == parameters.v_hidden_size &&
FusedMHARunnerFP16v2::is_supported(sm, parameters.head_size, sequence_length,
enable_trt_flash_attention_, true);
if (use_causal_fused_runner) {
// Here we assume that num_heads, head_size and is_unidirectional does not change for an Attention node.
if (nullptr == fused_fp16_runner_.get()) {
fused_fp16_runner_ = FusedMHARunnerFP16v2::Create(num_heads_, parameters.head_size, sm, is_unidirectional_,
enable_trt_flash_attention_, parameters.scale);
}

// Here we assume all causal kernels can be loaded into shared memory. TODO: add a function to check.
fused_runner = fused_fp16_runner_.get();
}

// Here we assume all causal kernels can be loaded into shared memory. TODO: add a function to check.
fused_runner = fused_fp16_runner_.get();
}
} else { // BERT
bool use_fused_runner = !disable_fused_self_attention_ &&
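The `attention.cc` refactor above preserves a lazy-initialization pattern: the fused runner is created once per Attention node (under the stated assumption that `num_heads`, `head_size` and `is_unidirectional` do not change) and cached for reuse. A minimal sketch of that caching pattern, with illustrative names rather than the real CUDA classes:

```typescript
// Count constructions so the "create once, reuse" behavior is observable.
let createCalls = 0;

class FusedRunner {
  constructor(readonly numHeads: number, readonly headSize: number) {
    createCalls++;
  }
}

class AttentionKernel {
  #fusedRunner: FusedRunner | null = null;
  constructor(private numHeads: number) {}

  // Mirrors the `if (nullptr == fused_fp16_runner_.get())` guard: build the
  // runner on first use, then hand back the cached instance.
  getFusedRunner(headSize: number): FusedRunner {
    if (this.#fusedRunner === null) {
      this.#fusedRunner = new FusedRunner(this.numHeads, headSize);
    }
    return this.#fusedRunner;
  }
}

const kernel = new AttentionKernel(12);
kernel.getFusedRunner(64);
kernel.getFusedRunner(64);
console.log(createCalls); // 1
```

The caching is sound only while the parameters that shaped the runner stay fixed, which is exactly the assumption the original comment calls out.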
49 changes: 28 additions & 21 deletions onnxruntime/core/providers/cpu/ml/tree_ensemble_aggregator.h
@@ -64,31 +64,38 @@ enum MissingTrack : uint8_t {
kFalse = 0
};

template <typename T>
struct TreeNodeElement;

template <typename T>
union PtrOrWeight {
TreeNodeElement<T>* ptr;
struct WeightData {
int32_t weight;
int32_t n_weights;
} weight_data;
};

template <typename T>
struct TreeNodeElement {
int feature_id;

// Stores the node threshold or the weights if the tree has one target.
T value_or_unique_weight;

// onnx specification says hitrates is used to store information about the node,
// The onnx specification says hitrates is used to store information about the node,
// but this information is not used for inference.
// T hitrates;

// True node, false node are obtained by computing `this + truenode_inc_or_first_weight`,
// `this + falsenode_inc_or_n_weights` if the node is not a leaf.
// In case of a leaf, these attributes are used to indicate the position of the weight
// in array `TreeEnsembleCommon::weights_`. If the number of targets or classes is one,
// the weight is also stored in `value_or_unique_weight`.
// This implementation assumes a tree has less than 2^31 nodes,
// and the total number of leave in the set of trees is below 2^31.
// A node cannot point to itself.
int32_t truenode_inc_or_first_weight;
// In case of a leaf, the following attribute indicates the number of weights
// in array `TreeEnsembleCommon::weights_`. If not a leaf, it indicates
// `this + falsenode_inc_or_n_weights` is the false node.
// A node cannot point to itself.
int32_t falsenode_inc_or_n_weights;
// PtrOrWeight acts as a tagged union, with the "tag" being whether the node is a leaf or not (see `is_not_leaf`).

// If it is not a leaf, it is a pointer to the true child node when traversing the decision tree. The false branch is
// always 1 position away from the TreeNodeElement in practice in `TreeEnsembleCommon::nodes_` so it is not stored.

// If it is a leaf, it contains `weight` and `n_weights` attributes which are used to indicate the position of the
// weight in array `TreeEnsembleCommon::weights_`. If the number of targets or classes is one, the weight is also
// stored in `value_or_unique_weight`.
PtrOrWeight<T> truenode_or_weight;
uint8_t flags;

inline NODE_MODE mode() const { return NODE_MODE(flags & 0xF); }
@@ -189,8 +196,8 @@ class TreeAggregatorSum : public TreeAggregator<InputType, ThresholdType, Output
void ProcessTreeNodePrediction(InlinedVector<ScoreValue<ThresholdType>>& predictions,
const TreeNodeElement<ThresholdType>& root,
gsl::span<const SparseValue<ThresholdType>> weights) const {
auto it = weights.begin() + root.truenode_inc_or_first_weight;
for (int32_t i = 0; i < root.falsenode_inc_or_n_weights; ++i, ++it) {
auto it = weights.begin() + root.truenode_or_weight.weight_data.weight;
for (int32_t i = 0; i < root.truenode_or_weight.weight_data.n_weights; ++i, ++it) {
ORT_ENFORCE(it->i < (int64_t)predictions.size());
predictions[onnxruntime::narrow<size_t>(it->i)].score += it->value;
predictions[onnxruntime::narrow<size_t>(it->i)].has_score = 1;
@@ -292,8 +299,8 @@ class TreeAggregatorMin : public TreeAggregator<InputType, ThresholdType, Output
void ProcessTreeNodePrediction(InlinedVector<ScoreValue<ThresholdType>>& predictions,
const TreeNodeElement<ThresholdType>& root,
gsl::span<const SparseValue<ThresholdType>> weights) const {
auto it = weights.begin() + root.truenode_inc_or_first_weight;
for (int32_t i = 0; i < root.falsenode_inc_or_n_weights; ++i, ++it) {
auto it = weights.begin() + root.truenode_or_weight.weight_data.weight;
for (int32_t i = 0; i < root.truenode_or_weight.weight_data.n_weights; ++i, ++it) {
predictions[onnxruntime::narrow<size_t>(it->i)].score =
(!predictions[onnxruntime::narrow<size_t>(it->i)].has_score || it->value < predictions[onnxruntime::narrow<size_t>(it->i)].score)
? it->value
@@ -349,8 +356,8 @@ class TreeAggregatorMax : public TreeAggregator<InputType, ThresholdType, Output
void ProcessTreeNodePrediction(InlinedVector<ScoreValue<ThresholdType>>& predictions,
const TreeNodeElement<ThresholdType>& root,
gsl::span<const SparseValue<ThresholdType>> weights) const {
auto it = weights.begin() + root.truenode_inc_or_first_weight;
for (int32_t i = 0; i < root.falsenode_inc_or_n_weights; ++i, ++it) {
auto it = weights.begin() + root.truenode_or_weight.weight_data.weight;
for (int32_t i = 0; i < root.truenode_or_weight.weight_data.n_weights; ++i, ++it) {
predictions[onnxruntime::narrow<size_t>(it->i)].score =
(!predictions[onnxruntime::narrow<size_t>(it->i)].has_score || it->value > predictions[onnxruntime::narrow<size_t>(it->i)].score)
? it->value
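The `PtrOrWeight` change above replaces two overloaded integer fields with a tagged union: a leaf stores a `(weight, n_weights)` pair indexing a shared weights array, and the aggregators iterate that range. The indexing scheme the new comments describe can be sketched as follows (illustrative TypeScript, not the C++ source):

```typescript
// One (target index, value) pair in the shared weights array,
// mirroring SparseValue<ThresholdType>.
interface SparseValue { i: number; value: number; }

// The leaf half of the PtrOrWeight union: a start index plus a count.
interface LeafWeights {
  weight: number;    // first index into `weights`
  nWeights: number;  // number of consecutive entries owned by this leaf
}

// Mirrors TreeAggregatorSum::ProcessTreeNodePrediction: accumulate every
// (target, value) pair the leaf references into the prediction vector.
function processLeafSum(predictions: number[], leaf: LeafWeights,
                        weights: SparseValue[]): void {
  for (let k = 0; k < leaf.nWeights; ++k) {
    const w = weights[leaf.weight + k];
    predictions[w.i] += w.value;
  }
}

const preds = [0, 0];
const weights: SparseValue[] = [{ i: 0, value: 0.5 }, { i: 1, value: -0.25 }];
processLeafSum(preds, { weight: 0, nWeights: 2 }, weights);
// preds is now [0.5, -0.25]
```

Storing an index range instead of per-leaf pointers keeps leaves compact and, as the diff's comments note, lets the single-target case also cache its lone weight inline in `value_or_unique_weight`.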