Merged ONNX decoder next steps #784
fxmarty added the onnxruntime (Related to ONNX Runtime) and onnx (Related to the ONNX export) labels on Feb 15, 2023
Hi @un-certainty, yes: if you are using CUDAExecutionProvider, IO Binding is probably helpful. I don't have a proper benchmark at hand, though.
I would say it could, yes.
Feature request
PR #647 has been merged. It adds support for exporting the without-past and with-past decoders as a single merged ONNX file, along with inference in ORTModelForCausalLM.
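To illustrate the idea of a merged decoder, here is a minimal toy sketch in plain Python. It is not the code from PR #647: the class `ToyMergedDecoder`, the flag name `use_cache_branch`, and the single float "weight" are all hypothetical stand-ins chosen for illustration. The sketch only shows the principle that one set of weights can serve both the without-past branch (first step, builds the cache) and the with-past branch (later steps, reuses the cache), rather than keeping two copies of the weights in two files.

```python
from typing import List, Optional, Tuple


class ToyMergedDecoder:
    """Toy stand-in for a merged decoder: one set of weights, two branches."""

    def __init__(self, weight: float) -> None:
        # A single shared parameter (hypothetical). A real model would hold
        # all decoder weights here, loaded once instead of twice.
        self.weight = weight

    def forward(
        self,
        tokens: List[float],
        past: Optional[List[float]],
        use_cache_branch: bool,
    ) -> Tuple[float, List[float]]:
        if use_cache_branch:
            # With-past branch: only the newest token is processed and the
            # previously computed cache is reused.
            if past is None:
                raise ValueError("the with-past branch needs a cache")
            cache = past + [self.weight * tokens[-1]]
        else:
            # Without-past branch: the whole sequence is processed and the
            # cache is built from scratch.
            cache = [self.weight * t for t in tokens]
        # Toy "logit": the sum of all cached values.
        return sum(cache), cache


decoder = ToyMergedDecoder(weight=0.5)

# First generation step: no cache exists yet, so take the without-past branch.
first_output, cache = decoder.forward([1.0, 2.0], past=None, use_cache_branch=False)

# Later step: reuse the cache and only process the newly appended token.
cached_output, cache = decoder.forward([1.0, 2.0, 4.0], past=cache, use_cache_branch=True)

# Recomputing from scratch gives the same result as the cached path.
full_output, _ = decoder.forward([1.0, 2.0, 4.0], past=None, use_cache_branch=False)
```

In this toy, both branches read the same `weight`, which is the memory argument for merging: the parameters exist once, and a flag selects the code path at each step.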
Some key steps remain:
- 2023-02-10 16:29:24.868007832 [W:onnxruntime:, graph.cc:3487 CleanUnusedInitializersAndNodeArgs] Removing initializer '/transformer/h.4/attn/Constant_18_output_0'. It is not used by any node and should be removed from the model.
  Tracked in "CleanUnusedInitializersAndNodeArgs warnings are printed only with subgraphs" (microsoft/onnxruntime#14694)
- bloom that is currently ugly
- codegen does not support -with-past in tasks.py
Motivation
Reduce memory usage
Your contribution
/