[BYOC][TRT] Allocate GPU data buffers and transfer data when needed #6872
Conversation
Only a few comments.
@@ -106,9 +104,11 @@ class TensorRTRuntime : public JSONRuntimeBase {
 #ifdef TVM_GRAPH_RUNTIME_TENSORRT
   /*! \brief Run inference using built engine. */
   void Run() override {
     BuildEngine();
Is the reason for moving BuildEngine from Init to Run that you need subgraph-specific information (e.g., I/O data entry IDs) to allocate device buffers?
Thanks @comaniac for the review! Yes, to allocate the device buffers we need the DLTensor context and shape. data_entry_ in the JSON runtime isn't initialized until Run(), so I had to move BuildEngine.
In the future we plan to build engines dynamically for different input shapes in order to handle subgraphs with dynamic input sizes, so moving it would be needed for that anyway.
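For illustration, a minimal sketch of the lazy-build pattern being discussed, assuming an engine_built_ flag as the one-time guard (names other than Run, BuildEngine, and data_entry_ are hypothetical, not the PR's exact code):

```cpp
// Sketch only: BuildEngine() moves from Init() to Run() because data_entry_
// is populated by the JSON runtime just before the first Run(), not at Init().
// A guard keeps subsequent Run() calls reusing the cached engine.
void Run() override {
  BuildEngine();  // builds on the first call, no-op afterwards
  // ... bind input/output buffers and execute the TensorRT engine ...
}

void BuildEngine() {
  if (engine_built_) return;  // hypothetical flag guarding the one-time build
  // data_entry_ is valid here, so DLTensor context and shape information is
  // available for allocating device buffers.
  // ... translate the JSON subgraph into a TensorRT network and build it ...
  engine_built_ = true;
}

bool engine_built_ = false;  // hypothetical member of TensorRTRuntime
```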
LGTM
…pache#6872)
* Allocate data buffers for gpu fix
* Rename AllocateDeviceBuffer, update docstrings
* Remove unneeded cast
This PR enables the TRT BYOC integration to be used with target="llvm" (previously only "cuda" could be used).
If an input or output DLTensor is not located on the GPU, we now allocate a GPU buffer to pass to TensorRT and transfer the data to/from the DLTensor accordingly. Since data_entry_ is now needed during BuildEngine, BuildEngine had to move from JsonRuntime::Init to the first run.
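As a rough sketch of what "allocate a GPU buffer and transfer the data" entails: the PR's helper is named AllocateDeviceBuffer, but the free functions and the GetDataSizeBytes helper below are hypothetical, written against the dlpack DLTensor layout and the CUDA runtime API (newer dlpack spells the fields device/kDLCUDA; PR-era code used ctx/kDLGPU):

```cpp
#include <cuda_runtime.h>
#include <dlpack/dlpack.h>
#include <cstddef>

// Number of bytes held by a DLTensor (dense layout assumed).
size_t GetDataSizeBytes(const DLTensor& t) {
  size_t size = (t.dtype.bits * t.dtype.lanes + 7) / 8;
  for (int i = 0; i < t.ndim; ++i) size *= static_cast<size_t>(t.shape[i]);
  return size;
}

// If t is not already on the GPU, allocate a device staging buffer of the
// same size; otherwise TensorRT can consume t.data directly.
void* AllocateDeviceBufferIfNeeded(const DLTensor& t) {
  if (t.device.device_type == kDLCUDA) return t.data;
  void* device_buf = nullptr;
  cudaMalloc(&device_buf, GetDataSizeBytes(t));
  return device_buf;
}

// Copy a host-resident input into its staging buffer before inference.
void CopyInputToDevice(const DLTensor& t, void* device_buf) {
  if (t.device.device_type != kDLCUDA) {
    cudaMemcpy(device_buf, t.data, GetDataSizeBytes(t), cudaMemcpyHostToDevice);
  }
}

// Copy results back into a host-resident output tensor after inference.
void CopyOutputFromDevice(DLTensor* t, const void* device_buf) {
  if (t->device.device_type != kDLCUDA) {
    cudaMemcpy(t->data, device_buf, GetDataSizeBytes(*t), cudaMemcpyDeviceToHost);
  }
}
```

When the tensor is already on the GPU, no extra buffer or copy is needed, so the pure-"cuda" path keeps its previous behavior.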
This is a prerequisite for using TRT BYOC in combination with the Relay VM, which in general requires the llvm target.
Thanks @ylc for the original implementation: neo-ai#147