
Prevent data copy in CreateSessionFromArray #8328

Closed
slevental opened this issue Jul 8, 2021 · 5 comments
Labels: core runtime, feature request, platform:mobile

Comments


slevental commented Jul 8, 2021

Feature Request

Is your feature request related to a problem? Please describe.
We are using onnxruntime in a mobile environment. For some mobile apps (extensions on iOS) there is a memory limit of ~60 MB; a process that uses more gets killed by the OS.

In this environment using less RAM is critical, so we had to investigate onnxruntime's memory usage patterns. We found out that it's impossible to use mmapped models with onnxruntime: even if the model is mmapped on the client side and passed to CreateSessionFromArray, onnxruntime copies the memory into an internal data structure.

System information

  • ONNX Runtime version (you are using): 1.8.0 (C api)

Describe the solution you'd like
To tackle this, it should be possible to use the OS file cache and mmap the model file into memory (TensorFlow Lite supports that, for instance); see the sketch at the end of this comment.

Describe alternatives you've considered
Unfortunately, there is no alternative; we can only migrate to another framework.
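
For illustration, here is a minimal sketch of the pattern we have in mind, assuming POSIX mmap and the standard ORT C API; load_mmapped_model is an illustrative name and error handling is abbreviated. As of 1.8.0 this does not actually save memory, because CreateSessionFromArray copies the buffer internally, which is the subject of this issue:

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <onnxruntime_c_api.h>

static OrtSession* load_mmapped_model(const OrtApi* ort, OrtEnv* env,
                                      OrtSessionOptions* opts,
                                      const char* path) {
  int fd = open(path, O_RDONLY);
  if (fd < 0) return NULL;

  struct stat st;
  if (fstat(fd, &st) != 0) { close(fd); return NULL; }

  /* Map the file read-only: the pages stay backed by the OS file cache
   * instead of becoming anonymous (dirty) process memory. */
  void* data = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
  close(fd);
  if (data == MAP_FAILED) return NULL;

  OrtSession* session = NULL;
  OrtStatus* status = ort->CreateSessionFromArray(
      env, data, (size_t)st.st_size, opts, &session);
  if (status != NULL) {
    ort->ReleaseStatus(status);
    munmap(data, (size_t)st.st_size);
    return NULL;
  }
  /* The mapping must remain valid for as long as the session may read
   * from the buffer. */
  return session;
}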

kit1980 added the core runtime and feature request labels on Jul 8, 2021
guoyu-wang added the platform:mobile label on Jul 15, 2021

guoyu-wang commented Jul 16, 2021

We are actively looking into this issue.

@slevental, to better understand your issue, would you please give us a bit more context? For example, the ~60 MB memory limit for your app: is this an Apple policy? Does the app get killed when its peak working set exceeds the limit?

Also, if possible, could you share the model so that we can measure memory consumption based on your real-world scenario?

slevental commented

@gwang-msft hey, thanks for getting back.

This is an Apple limitation for extensions: when an app uses more than 60 MB of memory, it gets killed by the OS, and this limit includes everything, graphics/UI as well as internal data structures. One possible solution is to mmap the model (TensorFlow Lite supports this); I think it would be useful for onnxruntime to support this to reduce memory consumption. We tried to mmap the model and use CreateSessionFromArray, since it takes an address and loads the model from there, but in the code we found that onnxruntime copies this memory into the resident (RES) memory of the process, which defeats the purpose.

Unfortunately, we cannot share the model, but the problem is just in the semantics of the API.

slevental commented

Here is the code that does the copy:

// InferenceSession copies the caller-provided model buffer into the
// session-owned ort_format_model_bytes_ vector:
std::copy_n(reinterpret_cast<const uint8_t*>(model_data), model_data_len, ort_format_model_bytes_.data());

slevental commented

I think it would also be a possible solution if onnxruntime provided mmap-based loading of models (since the ORT format is based on flatbuffers, this should be straightforward).


guoyu-wang commented Jul 27, 2021

With the latest master branch, you may specify the session config option by calling this API:

ORT_API2_STATUS(AddSessionConfigEntry, _Inout_ OrtSessionOptions* options,
_In_z_ const char* config_key, _In_z_ const char* config_value);

with this session config key:

static const char* const kOrtSessionOptionsConfigUseORTModelBytesDirectly = "session.use_ort_model_bytes_directly";

and the value "1" to use the input buffer directly and avoid the copy.

You will have to ensure the validity of the input buffer throughout the lifetime of the inference session.
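
A minimal sketch of how those calls fit together; here env, model_data, and model_len stand for a caller-owned environment and buffer (e.g. an mmap'd ORT-format model file) that must outlive the session, and status checks are elided for brevity:

const OrtApi* ort = OrtGetApiBase()->GetApi(ORT_API_VERSION);

OrtSessionOptions* opts = NULL;
ort->CreateSessionOptions(&opts);

/* Ask ORT to use the caller's bytes directly instead of copying them. */
ort->AddSessionConfigEntry(opts, "session.use_ort_model_bytes_directly", "1");

OrtSession* session = NULL;
ort->CreateSessionFromArray(env, model_data, model_len, opts, &session);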

Also, regarding the model: if the model itself cannot be shared, could you share some stats, such as the rough size of the model and of the initializer tensors, and whether the model contains nodes such as ConstantOfShape?
