Releases: microsoft/onnxruntime-genai
v0.4.0
Release Notes
- Support for new models such as Qwen 2, LLaMA 3.1, Gemma 2, Phi-3 small on CPU
- Support for building models that were already quantized with AWQ or GPTQ
- Performance improvements for Intel and Arm CPUs
- Packaging and language bindings
  - Added Java bindings (build from source)
  - Separated OnnxRuntime.dll and directml.dll from the GenAI package to improve usability
  - Published packages for Windows Arm
  - Added support for Android (build from source)
v0.3.0
Release Notes
- Phi-3 Vision model support for DML EP.
- Addressed DML memory leak issue and crashes on long prompts.
- Addressed crashes and slowness on CPU EP GQA on long prompts due to integer overflow issues.
- Added the import library for the Windows C API package.
- Addressed a bug with `get_output('logits')` so that it returns the logits for the entire prompt and not for the last generated token.
- Addressed a bug with querying the device type of the model so that it won't crash.
- Added NetStandard 2.0 compatibility.
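
To illustrate the `get_output('logits')` fix in this release, here is a minimal Python sketch of the difference between logits for the entire prompt and logits for only the last token. It does not call the library: `prompt_logits` is a hypothetical stand-in for the returned array, and the `[batch][position][vocab_id]` layout, the helper names, and the toy values are all assumptions made for illustration.

```python
from typing import List

# Hypothetical stand-in for the array returned by get_output('logits').
# Assumed layout: nested lists indexed as [batch][position][vocab_id].
Logits = List[List[List[float]]]


def last_position_logits(logits: Logits) -> List[List[float]]:
    """Return, for each batch entry, the logits at the final position."""
    return [sequence[-1] for sequence in logits]


def argmax(values: List[float]) -> int:
    """Index of the largest value (the first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


# Toy data: a batch of 1, a 3-token prompt, and a vocabulary of 4 entries.
prompt_logits: Logits = [[
    [0.1, 0.7, 0.1, 0.1],
    [0.2, 0.2, 0.5, 0.1],
    [0.9, 0.0, 0.05, 0.05],
]]

# With logits for the entire prompt, every position can be inspected.
per_position_choice = [argmax(row) for row in prompt_logits[0]]

# Only the last position is needed to choose the next token.
next_token_choice = argmax(last_position_logits(prompt_logits)[0])
```

Code that only needs the next-token distribution can slice the final position as shown, while code that scores a whole prompt (for example, to compute per-token likelihoods) needs every position.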
ONNX Runtime GenAI v0.3.0-rc2
Release Notes
- Added support for the Phi-3-Vision model.
- Added support for the Phi-3-Small model.
- Removed usage of `std::filesystem` to avoid runtime issues when loading incompatible symbols from stdc++ and stdc++fs.