
Introduce custom external data loader #21634

Merged: 10 commits merged into main on Aug 27, 2024
Conversation

@fs-eire (Contributor) commented Aug 6, 2024

Description

This PR introduces support for a custom external data loader. An EP can register a custom external data loader to override the default behavior, making it possible to upload initializers directly to the GPU.

Motivation and Context

  • In ONNX Runtime Web, WebAssembly uses 32-bit pointers (sizeof(size_t) == 4), which imposes a hard 4 GB limit on memory. As ONNX models get larger, this becomes a blocker for supporting medium-sized language models.

  • ORT runs out of memory because the current code always loads data into CPU memory, including the .onnx file (protobuf) and any external data files. However, when using a GPU EP, this large data does not need to stay on the CPU: the only thing ORT does with it is load it into memory, upload it to the GPU, and then release it.

  • Some platforms offer developers a way to upload data directly to the GPU. For example, WebGPU allows uploading from any ArrayBuffer (which can be a side buffer that does not count toward the 4 GB limit) directly to the GPU. This significantly reduces CPU memory usage.

Design

The classes ExternalDataLoader and ExternalDataLoaderManager are introduced. They are similar to DataTransfer and DataTransferManager: InferenceSession owns the manager object, and SessionState keeps a reference to it.

A new method, GetExternalDataLoader, is added to IExecutionProvider. An EP can override this method to register an instance of a custom external data loader.
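As a simplified sketch of this override pattern (the types below are illustrative stand-ins, not the actual ORT declarations), an EP registering its own loader might look like this:

```cpp
#include <memory>
#include <string>

// Illustrative stand-in for ORT's memory info (not the real declaration).
struct OrtMemoryInfo { std::string device; };

class IExternalDataLoader {
 public:
  virtual ~IExternalDataLoader() = default;
  // Returns true if this loader can place data for the given memory info.
  virtual bool CanLoad(const OrtMemoryInfo& info) const = 0;
};

class IExecutionProvider {
 public:
  virtual ~IExecutionProvider() = default;
  // Default: no custom loader; initializers take the regular CPU path.
  virtual std::unique_ptr<IExternalDataLoader> GetExternalDataLoader() const {
    return nullptr;
  }
};

// A hypothetical GPU loader that handles tensors allocated on "gpu".
class GpuExternalDataLoader : public IExternalDataLoader {
 public:
  bool CanLoad(const OrtMemoryInfo& info) const override {
    return info.device == "gpu";
  }
};

// A hypothetical GPU EP overriding GetExternalDataLoader to register it.
class MyGpuExecutionProvider : public IExecutionProvider {
 public:
  std::unique_ptr<IExternalDataLoader> GetExternalDataLoader() const override {
    return std::make_unique<GpuExternalDataLoader>();
  }
};
```

The base-class default of returning nullptr means existing EPs are unaffected unless they opt in.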

The key function in an ExternalDataLoader class is the LoadTensor method:

  // the tensor is pre-created using the TensorProto info of the initializer and the MemoryInfo (from allocation plan).
  virtual common::Status LoadTensor(const Env& env,
                                    const std::filesystem::path& data_file_path,
                                    FileOffsetType data_offset,
                                    SafeInt<size_t> data_length,
                                    Tensor& tensor) const;
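A minimal implementation of this method, mirroring the default CPU behavior (seek to the external-data offset and read the bytes into the pre-created tensor buffer), could look like the sketch below. The signature is simplified: Env and SafeInt are omitted, and the tensor is stood in by a plain byte buffer whose size was determined from the initializer's TensorProto info.

```cpp
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <stdexcept>
#include <vector>

// Illustrative stand-in for a pre-allocated Tensor's raw storage.
using TensorBytes = std::vector<char>;

// Simplified LoadTensor-style routine: read `data_length` bytes starting at
// `data_offset` from the external data file into the tensor buffer.
void LoadTensorFromFile(const std::filesystem::path& data_file_path,
                        std::streamoff data_offset,
                        std::size_t data_length,
                        TensorBytes& tensor) {
  if (tensor.size() < data_length) {
    throw std::runtime_error("tensor buffer smaller than external data");
  }
  std::ifstream file(data_file_path, std::ios::binary);
  if (!file) {
    throw std::runtime_error("cannot open external data file");
  }
  file.seekg(data_offset, std::ios::beg);
  file.read(tensor.data(), static_cast<std::streamsize>(data_length));
  if (file.gcount() != static_cast<std::streamsize>(data_length)) {
    throw std::runtime_error("short read of external data");
  }
}
```

A custom loader for a GPU EP would replace the final read-into-CPU-buffer step with a direct upload to device memory.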

A loader registered by an EP is passed through a few layers and is eventually used in DeserializeTensorProto() during the finalizing stage of session initialization, when initializer tensors are created. The behavior changes so that ORT first looks for a registered external data loader that can handle the current memory info; if one is available, it is used, and otherwise the old code path is followed.
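The lookup-then-fallback step could be sketched as follows, again with simplified stand-in types and hypothetical names rather than the real ORT declarations:

```cpp
#include <memory>
#include <string>
#include <vector>

struct OrtMemoryInfo { std::string device; };  // simplified stand-in

class IExternalDataLoader {
 public:
  virtual ~IExternalDataLoader() = default;
  virtual bool CanLoad(const OrtMemoryInfo& info) const = 0;
  virtual void LoadTensor() = 0;  // parameters elided for brevity
};

// Hypothetical loader handling tensors allocated on "gpu".
class GpuLoader : public IExternalDataLoader {
 public:
  bool CanLoad(const OrtMemoryInfo& info) const override {
    return info.device == "gpu";
  }
  void LoadTensor() override {}  // would upload bytes directly to the GPU
};

// Simplified analogue of ExternalDataLoaderManager: owns the loaders
// registered by EPs and finds one that matches a given memory info.
class ExternalDataLoaderManager {
 public:
  void RegisterLoader(std::unique_ptr<IExternalDataLoader> loader) {
    loaders_.push_back(std::move(loader));
  }
  // Returns the first loader that can handle `info`, or nullptr.
  IExternalDataLoader* GetLoader(const OrtMemoryInfo& info) const {
    for (const auto& l : loaders_) {
      if (l->CanLoad(info)) return l.get();
    }
    return nullptr;
  }
 private:
  std::vector<std::unique_ptr<IExternalDataLoader>> loaders_;
};

// Dispatch as done conceptually in DeserializeTensorProto(): use a custom
// loader when one matches, otherwise fall back to the old code path.
// Returns which path was taken, for illustration.
std::string DeserializeInitializer(const ExternalDataLoaderManager& mgr,
                                   const OrtMemoryInfo& info) {
  if (IExternalDataLoader* loader = mgr.GetLoader(info)) {
    loader->LoadTensor();  // custom path (e.g. direct GPU upload)
    return "custom";
  }
  return "default";        // old path: load into CPU memory
}
```

Because the lookup is keyed on memory info, CPU-resident initializers continue down the old path untouched while GPU-destined ones can bypass CPU memory entirely.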

@fs-eire force-pushed the custom-ext-data-loader branch from 62c569e to ca5c3d6 on August 6, 2024 10:58
@guschmue (Contributor) left a comment:

have been using/testing this a lot with large models.
Works great, no issues.

@pranavsharma merged commit d2a1b7a into main Aug 27, 2024
90 of 96 checks passed
@pranavsharma deleted the custom-ext-data-loader branch August 27, 2024 19:18
3 participants