v2.0.0-rc.9 #319
decahedron1
announced in
Announcements
## 🌴 Undo The Flattening (d4f82fc)

A previous `ort` release "flattened" all exports, such that everything was exported at the crate root - `ort::{TensorElementType, Session, Value}`. This was done at a time when `ort` didn't export much, but now it exports a lot, so this was leading to some big, ugly `use` blocks. rc.9 now has most exports behind their respective modules - `Session` is now imported as `ort::session::Session`, `Tensor` as `ort::value::Tensor`, etc. `rust-analyzer` and some quick searches on docs.rs can help you find the right paths to import.
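A typical import block under the new layout looks like this (only the `Session` and `Tensor` paths are given above - check docs.rs for the module paths of other items):

```rust
// Before (flattened exports):
// use ort::{Session, Tensor};

// After rc.9 (module-scoped exports):
use ort::session::Session;
use ort::value::Tensor;
```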
## 📦 Tensor `extract` optimization (1dbad54)

Previously, calling any of the `extract_tensor_*` methods would have to call back to ONNX Runtime to determine the value's `ValueType` to ensure it was OK to extract. This involved a lot of FFI calls and a few allocations which could have a notable performance impact in hot loops.

Since a value's type never changes after it is created, the `ValueType` is now created when the `Value` is constructed (i.e. via `Tensor::from_array` or returned from a session). This makes `extract_tensor_*` a lot cheaper!

Note that this does come with some breaking changes:

- Dimensions are now represented as `&[i64]` instead of `Vec<i64>`.
- `Value::dtype()` and `Tensor::memory_info()` now return `&ValueType` and `&MemoryInfo` respectively, instead of their non-borrowed counterparts.
- `ValueType::Tensor` now has an extra field for symbolic dimensions, `dimension_symbols`, so you might have to update `match`es on `ValueType`.
## 🚥 Threading management (87577ef)
`2.0.0-rc.9` introduces a new trait: `ThreadManager`. This allows you to define custom thread create & join functions for session & environment thread pools! See the `thread_manager.rs` test for an example of how to create your own `ThreadManager` and apply it to a session, or an environment's `GlobalThreadPoolOptions` (previously `EnvironmentGlobalThreadPoolOptions`).

Additionally, sessions may now opt out of the environment's global thread pool if one is configured.
## 🧠 Shape inference for custom operators (87577ef)

`ort` now provides `ShapeInferenceContext`, an interface for custom operators to provide a hint to ONNX Runtime about the shape of the operator's output tensors based on its inputs, which may open the doors to memory optimizations.

See the updated `custom_operators.rs` example to see how it works.

## 📃 Session output refactor (8a16adb)
`SessionOutputs` has been slightly refactored to reduce memory usage and slightly increase performance. Most notably, it no longer derefs to a `&BTreeMap`.

The new `SessionOutputs` interface closely mirrors `BTreeMap`'s API, so most applications require no changes unless you were explicitly dereferencing to a `&BTreeMap`.
## 🛠️ LoRA Adapters (d877fb3)
ONNX Runtime v1.20.0 introduces a new `Adapter` format for supporting LoRA-like weight adapters, and now `ort` has it too!

An `Adapter` essentially functions as a map of tensors, loaded from disk or memory and copied to a device (typically whichever device the session resides on). When you add an `Adapter` to `RunOptions`, those tensors are automatically added as inputs (except faster, because they don't need to be copied anywhere!)

With some modification to your ONNX graph, you can add LoRA layers using optional inputs which `Adapter` can then override. (Hopefully ONNX Runtime will provide some documentation on how this can be done soon, but until then, it's ready to use in `ort`!)
## 🗂️ Prepacked weights (87577ef)
`PrepackedWeights` allows multiple sessions to share the same weights. If you create multiple `Session`s from one model file, they can all share the same memory!

Currently, ONNX Runtime only supports prepacked weights for the CPU execution provider.
You can now override dynamic dimensions in a graph using `SessionBuilder::with_dimension_override`, allowing ONNX Runtime to do more optimizations.

## 🪶 Customizable workload type (87577ef)
Not all workloads need full performance all the time! If you're using `ort` to perform background tasks, you can now set a session's workload type to prioritize either efficiency (by lowering scheduling priority or utilizing more efficient CPU cores on some architectures), or performance (the default).

## Other features
- `ortsys!` macro.
- `ort::api()` now returns `&ort_sys::OrtApi` instead of `NonNull<ort_sys::OrtApi>`.
- Added the `AsPointer` trait; types that previously had a `ptr()` method now have an `AsPointer` implementation instead.
- `RunOptions`.
- Added the `ORT_CXX_STDLIB` environment variable (mirroring `CXXSTDLIB`) to allow changing the C++ standard library `ort` links to.

## Fixes
- `ValueRef` & `ValueRefMut` leaking value memory.
- `MemoryInfo`'s `DeviceType` is now used instead of its allocation device to determine whether `Tensor`s can be extracted.
- `ORT_PREFER_DYNAMIC_LINK` now works even when `cuda` or `tensorrt` are enabled.
- `Sequence<T>`.

If you have any questions about this release, we're here to help:
#💬|ort-discussions
Thank you to Thomas, Johannes Laier, Yunho Cho, Phu Tran, Bartek, Noah, Matouš Kučera, Kevin Lacker, and Okabintaro, whose support made this release possible. If you'd like to support `ort` as well, consider contributing on Open Collective 💖🩷💜🩷💜
This discussion was created from the release v2.0.0-rc.9.