v2.0.0-rc.9 #319
decahedron1
announced in
Announcements
## 🌴 Undo The Flattening (d4f82fc)

A previous `ort` release "flattened" all exports, such that everything was exported at the crate root - `ort::{TensorElementType, Session, Value}`. This was done at a time when `ort` didn't export much, but now it exports a lot, so this was leading to some big, ugly `use` blocks. rc.9 now has most exports behind their respective modules - `Session` is now imported as `ort::session::Session`, `Tensor` as `ort::value::Tensor`, etc. `rust-analyzer` and some quick searches on docs.rs can help you find the right paths to import.
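A typical import block under the new layout looks like this (only the `Session` and `Tensor` paths are given above - check docs.rs for the module paths of other items):

```rust
// Before (flattened exports):
// use ort::{Session, Tensor};

// After rc.9 (module-scoped exports):
use ort::session::Session;
use ort::value::Tensor;
```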
## 📦 Tensor `extract` optimization (1dbad54)

Previously, calling any of the `extract_tensor_*` methods would have to call back to ONNX Runtime to determine the value's `ValueType` to ensure it was OK to extract. This involved a lot of FFI calls and a few allocations which could have a notable performance impact in hot loops.

Since a value's type never changes after it is created, the `ValueType` is now created when the `Value` is constructed (i.e. via `Tensor::from_array` or returned from a session). This makes `extract_tensor_*` a lot cheaper!

Note that this does come with some breaking changes:

- Dimensions are now represented as `&[i64]` instead of `Vec<i64>`.
- `Value::dtype()` and `Tensor::memory_info()` now return `&ValueType` and `&MemoryInfo` respectively, instead of their non-borrowed counterparts.
- `ValueType::Tensor` now has an extra field for symbolic dimensions, `dimension_symbols`, so you might have to update `match`es on `ValueType`.
## 🚥 Threading management (87577ef)
`2.0.0-rc.9` introduces a new trait: `ThreadManager`. This allows you to define custom thread create & join functions for session & environment thread pools! See the `thread_manager.rs` test for an example of how to create your own `ThreadManager` and apply it to a session, or an environment's `GlobalThreadPoolOptions` (previously `EnvironmentGlobalThreadPoolOptions`).

Additionally, sessions may now opt out of the environment's global thread pool if one is configured.
## 🧠 Shape inference for custom operators (87577ef)

`ort` now provides `ShapeInferenceContext`, an interface for custom operators to provide a hint to ONNX Runtime about the shape of the operator's output tensors based on its inputs, which may open the doors to memory optimizations.

See the updated `custom_operators.rs` example to see how it works.

## 📃 Session output refactor (8a16adb)
`SessionOutputs` has been slightly refactored to reduce memory usage and slightly increase performance. Most notably, it no longer derefs to a `&BTreeMap`.

The new `SessionOutputs` interface closely mirrors `BTreeMap`'s API, so most applications require no changes unless you were explicitly dereferencing to a `&BTreeMap`.
## 🛠️ LoRA Adapters (d877fb3)
ONNX Runtime v1.20.0 introduces a new `Adapter` format for supporting LoRA-like weight adapters, and now `ort` has it too!

An `Adapter` essentially functions as a map of tensors, loaded from disk or memory and copied to a device (typically whichever device the session resides on). When you add an `Adapter` to `RunOptions`, those tensors are automatically added as inputs (except faster, because they don't need to be copied anywhere!)

With some modification to your ONNX graph, you can add LoRA layers using optional inputs which `Adapter` can then override. (Hopefully ONNX Runtime will provide some documentation on how this can be done soon, but until then, it's ready to use in `ort`!)
## 🗂️ Prepacked weights (87577ef)
`PrepackedWeights` allows multiple sessions to share the same weights. If you create multiple `Session`s from one model file, they can all share the same memory!

Currently, ONNX Runtime only supports prepacked weights for the CPU execution provider.
You can now override dynamic dimensions in a graph using `SessionBuilder::with_dimension_override`, allowing ONNX Runtime to do more optimizations.

## 🪶 Customizable workload type (87577ef)
Not all workloads need full performance all the time! If you're using `ort` to perform background tasks, you can now set a session's workload type to prioritize either efficiency (by lowering scheduling priority or utilizing more efficient CPU cores on some architectures), or performance (the default).

## Other features
- `ortsys!` macro.
- `ort::api()` now returns `&ort_sys::OrtApi` instead of `NonNull<ort_sys::OrtApi>`.
- Added the `AsPointer` trait; types that previously had a `ptr()` method now have an `AsPointer` implementation instead.
- `RunOptions`.
- Added the `ORT_CXX_STDLIB` environment variable (mirroring `CXXSTDLIB`) to allow changing the C++ standard library `ort` links to.

## Fixes
- `ValueRef` & `ValueRefMut` leaking value memory.
- `MemoryInfo`'s `DeviceType` is now used instead of its allocation device to determine whether `Tensor`s can be extracted.
- `ORT_PREFER_DYNAMIC_LINK` now works even when `cuda` or `tensorrt` are enabled.
- `Sequence<T>`.

If you have any questions about this release, we're here to help:
#💬|ort-discussions
Thank you to Thomas, Johannes Laier, Yunho Cho, Phu Tran, Bartek, Noah, Matouš Kučera, Kevin Lacker, and Okabintaro, whose support made this release possible. If you'd like to support `ort` as well, consider contributing on Open Collective 💖🩷💜🩷💜
This discussion was created from the release v2.0.0-rc.9.