Introduce custom external data loader #21634
Merged
pranavsharma reviewed Aug 6, 2024
guschmue approved these changes Aug 27, 2024
I have been using and testing this a lot with large models.
Works great, no issues.
pranavsharma approved these changes Aug 27, 2024
Description
This PR introduces support for a custom external data loader. An EP can register a custom external data loader to override the default behavior, making it possible to upload initializers directly to the GPU.
Motivation and Context
In ONNX Runtime Web, WebAssembly uses a 32-bit pointer type (sizeof(size_t)==4), which means there is a 4GB hard limit on the maximum memory. As ONNX models get larger, this becomes a blocker for supporting medium-sized language models.
ORT runs out of memory because the current code always loads data into CPU memory, including the .onnx file (protobuf) and external data file(s). However, when a GPU EP is used, the large data does not need to be kept on the CPU, because all ORT does with it is load the data into memory, upload it to the GPU and then release it.
Some platforms offer developers a way to upload data directly to the GPU. For example, WebGPU allows uploading directly to the GPU from any ArrayBuffer (it can be a side buffer that does not count toward the 4GB limit). This helps to reduce CPU memory usage significantly.
Design
Classes ExternalDataLoader and ExternalDataLoaderManager are introduced. They are similar to DataTransfer and DataTransferManager. InferenceSession owns the manager object, and SessionState keeps a reference to it.
A new method GetExternalDataLoader is added to IExecutionProvider. An EP can override the method to register an instance of a custom external data loader.
The key function in an ExternalDataLoader class is the LoadTensor method.
This function can be registered by an EP. The call goes through a few layers and eventually reaches DeserializeTensorProto() in the finalizing stage of session initialization. In this step, initializer tensors are created. The behavior is changed to first look up a registered external data loader that can handle the current memory info. If such an instance is available, the loader is used; otherwise the old code path is followed.
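The lookup-then-fallback behavior described in the design can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: only the names ExternalDataLoader, ExternalDataLoaderManager and LoadTensor come from the description, while MemoryInfo, Tensor, DeviceType, CanLoad, RegisterLoader, GetLoader, GpuDirectLoader, LoadViaDefaultPath and DeserializeInitializer are hypothetical stand-ins, and the parameter lists are assumptions rather than the real ORT signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the real ORT types (assumptions for this sketch).
enum class DeviceType { CPU, GPU };

struct MemoryInfo {
  DeviceType device;
};

struct Tensor {
  MemoryInfo location;
  std::string loaded_by;  // records which code path filled the tensor
};

// Loader interface: CanLoad (hypothetical name) decides whether the loader
// handles a target memory location; LoadTensor performs the load.
class ExternalDataLoader {
 public:
  virtual ~ExternalDataLoader() = default;
  virtual bool CanLoad(const MemoryInfo& target) const = 0;
  virtual bool LoadTensor(const std::string& file_path, std::uint64_t offset,
                          std::size_t length, Tensor& tensor) const = 0;
};

// Example loader an EP might register: claims GPU targets only.
class GpuDirectLoader : public ExternalDataLoader {
 public:
  bool CanLoad(const MemoryInfo& target) const override {
    return target.device == DeviceType::GPU;
  }
  bool LoadTensor(const std::string& /*file_path*/, std::uint64_t /*offset*/,
                  std::size_t /*length*/, Tensor& tensor) const override {
    // A real loader would upload the bytes to the device without staging
    // them in CPU memory; here we only record that this path was taken.
    tensor.loaded_by = "custom-gpu-loader";
    return true;
  }
};

// Manager that owns the registered loaders and finds one by memory info.
class ExternalDataLoaderManager {
 public:
  void RegisterLoader(std::unique_ptr<ExternalDataLoader> loader) {
    assert(loader != nullptr);
    loaders_.push_back(std::move(loader));
  }
  const ExternalDataLoader* GetLoader(const MemoryInfo& target) const {
    for (const auto& loader : loaders_) {
      if (loader->CanLoad(target)) return loader.get();
    }
    return nullptr;  // no custom loader handles this memory location
  }

 private:
  std::vector<std::unique_ptr<ExternalDataLoader>> loaders_;
};

// Stand-in for the old code path that always loads into CPU memory first.
inline bool LoadViaDefaultPath(const std::string& /*file_path*/,
                               std::uint64_t /*offset*/, std::size_t /*length*/,
                               Tensor& tensor) {
  tensor.loaded_by = "default-cpu-path";
  return true;
}

// Try a registered loader for the tensor's memory info first; otherwise
// fall back to the default path.
inline bool DeserializeInitializer(const ExternalDataLoaderManager& manager,
                                   const std::string& file_path,
                                   std::uint64_t offset, std::size_t length,
                                   Tensor& tensor) {
  if (const ExternalDataLoader* loader = manager.GetLoader(tensor.location)) {
    return loader->LoadTensor(file_path, offset, length, tensor);
  }
  return LoadViaDefaultPath(file_path, offset, length, tensor);
}
```

In this sketch, a tensor placed on the GPU is filled by the registered loader, while a CPU tensor (or any tensor when no loader is registered) still goes through the default path, which mirrors the "otherwise the old code path" rule in the description.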