Prevent data copy in CreateSessionFromArray #8328
We are actively looking into this issue. @slevental, to better understand it, could you please give us a bit more context? For example, what is the source of the ~60 MB memory limit? Also, if possible, could you share the model so that we can measure memory consumption based on your real-world scenario.
@gwang-msft hey, thanks for getting back. This is an Apple limitation for extensions: when an app uses more than ~60 MB of memory, it gets killed by the OS. This budget includes everything: graphics/UI as well as internal data structures. One possible solution is to mmap the model (TensorFlow Lite supports this); I think this would be useful for onnxruntime to have in order to reduce memory consumption. We tried to mmap the model and use CreateSessionFromArray, since it takes an address and loads the model from there, but we found in the code that onnxruntime copies this memory into the resident (RES) memory of the process, which defeats the purpose. Unfortunately, we cannot share the model, but the problem is just the semantics of the API.
Here is the code that does that:
I think it would also be a possible solution if onnxruntime provided mmap support for models (since the ORT format is based on flatbuffers, this should be straightforward).
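To make the requested flow concrete, here is a minimal sketch of what the caller side looks like: mmap the model file read-only and hand the mapped pointer to CreateSessionFromArray. The ORT calls are shown only as comments; `MapModelFile` is a hypothetical helper name, not an onnxruntime API.

```cpp
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Map `path` read-only; on success sets *out_len and returns the mapping,
// otherwise returns nullptr. The pages are backed by the OS file cache,
// so they do not count as dirty/resident memory until written.
static const void* MapModelFile(const char* path, size_t* out_len) {
  int fd = open(path, O_RDONLY);
  if (fd < 0) return nullptr;
  struct stat st;
  if (fstat(fd, &st) != 0) { close(fd); return nullptr; }
  void* p = mmap(nullptr, static_cast<size_t>(st.st_size),
                 PROT_READ, MAP_PRIVATE, fd, 0);
  close(fd);  // the mapping stays valid after the fd is closed
  if (p == MAP_FAILED) return nullptr;
  *out_len = static_cast<size_t>(st.st_size);
  return p;
}

// Usage with the ONNX Runtime C API would look roughly like:
//   size_t len = 0;
//   const void* model = MapModelFile("model.ort", &len);
//   g_ort->CreateSessionFromArray(env, model, len, session_options, &session);
// As described above, the runtime currently copies this buffer internally,
// which is exactly what this issue asks to avoid.
```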
With the latest master branch, you may specify the session config option by calling this API: onnxruntime/include/onnxruntime/core/session/onnxruntime_c_api.h, lines 1021–1022 at 4c939e1,
with this session config key: onnxruntime/include/onnxruntime/core/session/onnxruntime_session_options_config_keys.h, line 70 at 4c939e1.
You will have to ensure the validity of the input buffer throughout the lifetime of the inference session. Also, if the model itself cannot be shared, could you share some stats about it, such as the rough size of the model and its initializer tensors, and whether it contains nodes such as ConstantOfShape?
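Putting the two permalinks above together, the suggested approach is roughly the following sketch. The config key string is copied from the referenced header; verify it against your onnxruntime version, and note that `g_ort`, `env`, and the mmap'd `model`/`len` are assumed to come from the caller.

```cpp
// Session config key from onnxruntime_session_options_config_keys.h
// (line 70 in the referenced commit); check against your ORT version.
static const char* kOrtSessionOptionsConfigUseORTModelBytesDirectly =
    "session.use_ort_model_bytes_directly";

// Rough sketch against the ONNX Runtime C API (not compiled here):
//   OrtSessionOptions* opts = nullptr;
//   g_ort->CreateSessionOptions(&opts);
//   g_ort->AddSessionConfigEntry(
//       opts, kOrtSessionOptionsConfigUseORTModelBytesDirectly, "1");
//   OrtSession* session = nullptr;
//   g_ort->CreateSessionFromArray(env, model, len, opts, &session);
//   // Per the comment above, the mapped buffer must stay valid for the
//   // whole lifetime of `session`.
```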
Feature Request
Is your feature request related to a problem? Please describe.
We are using OnnxRuntime in a mobile environment. For some mobile apps (extensions on iOS) there is a memory limit of ~60 MB; a process that uses more gets killed by the OS.
In this environment using less RAM is critical, so we had to investigate onnxruntime's memory usage patterns. We found that it is impossible to use mmapped models in onnxruntime: even if the model is mmapped on the client side and passed to CreateSessionFromArray, onnxruntime copies the memory into an internal data structure.
System information
Describe the solution you'd like
To tackle this, it should be possible to use the OS file cache and mmap model files into memory (TensorFlow Lite supports this, for instance).
Describe alternatives you've considered
There is no alternative, unfortunately; our only option would be to migrate to another framework.