
Introduce custom external data loader #21634

Merged: 10 commits merged into main on Aug 27, 2024
Conversation

@fs-eire (Contributor) commented Aug 6, 2024

Description

This PR introduces support for a custom external data loader. An EP can register a custom external data loader to override the default behavior, making it possible to upload initializers directly to the GPU.

Motivation and Context

  • In ONNX Runtime Web, WebAssembly uses 32-bit pointers (sizeof(size_t) == 4), which imposes a hard 4 GB limit on memory. As ONNX models get larger, this becomes a blocker for supporting medium-sized language models.

  • ORT runs out of memory because the current code always loads data into CPU memory, including the .onnx file (protobuf) and any external data files. However, when using a GPU EP, this large data does not need to stay on the CPU: the only thing ORT does with it is load it into memory, upload it to the GPU, and then release it.

  • Some platforms offer developers a way to upload data directly to the GPU. For example, WebGPU allows uploading from any ArrayBuffer (which can be a side buffer that does not count toward the 4 GB limit) directly to the GPU. This significantly reduces CPU memory usage.

Design

The classes ExternalDataLoader and ExternalDataLoaderManager are introduced. They are similar to DataTransfer and DataTransferManager: InferenceSession owns the manager object, and SessionState keeps a reference to it.

A new method, GetExternalDataLoader, is added to IExecutionProvider. An EP can override this method to register an instance of a custom external data loader.
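As a simplified sketch of this override pattern (the types below are illustrative stand-ins, not the actual ORT declarations), an EP registering its own loader might look like this:

```cpp
#include <memory>
#include <string>

// Illustrative stand-in for ORT's memory info (not the real declaration).
struct OrtMemoryInfo { std::string device; };

class IExternalDataLoader {
 public:
  virtual ~IExternalDataLoader() = default;
  // Returns true if this loader can place data for the given memory info.
  virtual bool CanLoad(const OrtMemoryInfo& info) const = 0;
};

class IExecutionProvider {
 public:
  virtual ~IExecutionProvider() = default;
  // Default: no custom loader; initializers take the regular CPU path.
  virtual std::unique_ptr<IExternalDataLoader> GetExternalDataLoader() const {
    return nullptr;
  }
};

// A hypothetical GPU loader that handles tensors allocated on "gpu".
class GpuExternalDataLoader : public IExternalDataLoader {
 public:
  bool CanLoad(const OrtMemoryInfo& info) const override {
    return info.device == "gpu";
  }
};

// A hypothetical GPU EP overriding GetExternalDataLoader to register it.
class MyGpuExecutionProvider : public IExecutionProvider {
 public:
  std::unique_ptr<IExternalDataLoader> GetExternalDataLoader() const override {
    return std::make_unique<GpuExternalDataLoader>();
  }
};
```

The base-class default of returning nullptr means existing EPs are unaffected unless they opt in.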

The key function in an ExternalDataLoader class is the LoadTensor method:

  // the tensor is pre-created using the TensorProto info of the initializer and the MemoryInfo (from allocation plan).
  virtual common::Status LoadTensor(const Env& env,
                                    const std::filesystem::path& data_file_path,
                                    FileOffsetType data_offset,
                                    SafeInt<size_t> data_length,
                                    Tensor& tensor) const;
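A minimal implementation of this method, mirroring the default CPU behavior (seek to the external-data offset and read the bytes into the pre-created tensor buffer), could look like the sketch below. The signature is simplified: Env and SafeInt are omitted, and the tensor is stood in by a plain byte buffer whose size was determined from the initializer's TensorProto info.

```cpp
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <stdexcept>
#include <vector>

// Illustrative stand-in for a pre-allocated Tensor's raw storage.
using TensorBytes = std::vector<char>;

// Simplified LoadTensor-style routine: read `data_length` bytes starting at
// `data_offset` from the external data file into the tensor buffer.
void LoadTensorFromFile(const std::filesystem::path& data_file_path,
                        std::streamoff data_offset,
                        std::size_t data_length,
                        TensorBytes& tensor) {
  if (tensor.size() < data_length) {
    throw std::runtime_error("tensor buffer smaller than external data");
  }
  std::ifstream file(data_file_path, std::ios::binary);
  if (!file) {
    throw std::runtime_error("cannot open external data file");
  }
  file.seekg(data_offset, std::ios::beg);
  file.read(tensor.data(), static_cast<std::streamsize>(data_length));
  if (file.gcount() != static_cast<std::streamsize>(data_length)) {
    throw std::runtime_error("short read of external data");
  }
}
```

A custom loader for a GPU EP would replace the final read-into-CPU-buffer step with a direct upload to device memory.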

A loader registered by an EP is passed through a few layers and is eventually used in DeserializeTensorProto() during the finalizing stage of session initialization, when initializer tensors are created. The behavior changes so that ORT first looks for a registered external data loader that can handle the current memory info; if one is available, it is used, and otherwise the old code path is followed.
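The lookup-then-fallback step could be sketched as follows, again with simplified stand-in types and hypothetical names rather than the real ORT declarations:

```cpp
#include <memory>
#include <string>
#include <vector>

struct OrtMemoryInfo { std::string device; };  // simplified stand-in

class IExternalDataLoader {
 public:
  virtual ~IExternalDataLoader() = default;
  virtual bool CanLoad(const OrtMemoryInfo& info) const = 0;
  virtual void LoadTensor() = 0;  // parameters elided for brevity
};

// Hypothetical loader handling tensors allocated on "gpu".
class GpuLoader : public IExternalDataLoader {
 public:
  bool CanLoad(const OrtMemoryInfo& info) const override {
    return info.device == "gpu";
  }
  void LoadTensor() override {}  // would upload bytes directly to the GPU
};

// Simplified analogue of ExternalDataLoaderManager: owns the loaders
// registered by EPs and finds one that matches a given memory info.
class ExternalDataLoaderManager {
 public:
  void RegisterLoader(std::unique_ptr<IExternalDataLoader> loader) {
    loaders_.push_back(std::move(loader));
  }
  // Returns the first loader that can handle `info`, or nullptr.
  IExternalDataLoader* GetLoader(const OrtMemoryInfo& info) const {
    for (const auto& l : loaders_) {
      if (l->CanLoad(info)) return l.get();
    }
    return nullptr;
  }
 private:
  std::vector<std::unique_ptr<IExternalDataLoader>> loaders_;
};

// Dispatch as done conceptually in DeserializeTensorProto(): use a custom
// loader when one matches, otherwise fall back to the old code path.
// Returns which path was taken, for illustration.
std::string DeserializeInitializer(const ExternalDataLoaderManager& mgr,
                                   const OrtMemoryInfo& info) {
  if (IExternalDataLoader* loader = mgr.GetLoader(info)) {
    loader->LoadTensor();  // custom path (e.g. direct GPU upload)
    return "custom";
  }
  return "default";        // old path: load into CPU memory
}
```

Because the lookup is keyed on memory info, CPU-resident initializers continue down the old path untouched while GPU-destined ones can bypass CPU memory entirely.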

@fs-eire force-pushed the custom-ext-data-loader branch from 62c569e to ca5c3d6 on August 6, 2024 10:58
@guschmue (Contributor) left a comment:

have been using/testing this a lot with large models.
Works great, no issues.

@pranavsharma merged commit d2a1b7a into main Aug 27, 2024
90 of 96 checks passed
@pranavsharma deleted the custom-ext-data-loader branch August 27, 2024 19:18
3 participants