
Fix error handling for GPU tensors #249

Merged 8 commits into main from imant-fix-gpu-bug on Jun 6, 2023

Conversation

@Tabrizian Tabrizian (Member) commented May 25, 2023

Bug

The problem was that the inference_response->Send method did not properly handle the error case. Calling GUARDED_RESPOND_IF_ERROR after Send would cause a double free, because the response was sent twice.

Another issue existed in the GPU tensor error handling, where the error was not properly caught and returned to the stub process.

Fix

For the first issue, the fix was to simplify the error handling and send errors only through the inference response.

For the second issue, new data structures were added to properly catch the error.

triton-inference-server/server#5871

@Tabrizian Tabrizian force-pushed the imant-fix-gpu-bug branch 2 times, most recently from 00df006 to b4946bd Compare May 29, 2023 22:38
@Tabrizian Tabrizian marked this pull request as ready for review May 29, 2023 22:40
@rmccorm4 (Contributor)

Can you summarize the bug and fix in the description?

@Tabrizian (Member, Author)

@rmccorm4 added.

@Tabrizian Tabrizian force-pushed the imant-fix-gpu-bug branch from b4946bd to 4e25872 Compare May 30, 2023 13:57
src/pb_stub.cc (resolved)
src/gpu_buffers.cc (resolved)
src/infer_request.cc (resolved)
src/python_be.cc (resolved)
src/response_sender.cc (resolved)
@Tabrizian Tabrizian requested review from rmccorm4 and krishung5 June 1, 2023 14:28
@Tabrizian Tabrizian force-pushed the imant-fix-gpu-bug branch from f68d168 to fd3df73 Compare June 1, 2023 14:56
@Tabrizian Tabrizian force-pushed the imant-fix-gpu-bug branch from b3c3d9f to 73e902f Compare June 1, 2023 19:09
uint32_t buffer_count;
};

class GPUBufferTransporter {
Contributor:

Is the term Transporter common - not familiar - maybe a short one or two line class description would help

Member Author:

Added comments.

void
GPUBufferTransporter::Complete(std::unique_ptr<SharedMemoryManager>& shm_pool)
{
if (completed_) {
Contributor:

Should this be an error case as above with adding a buffer to a completed transaction?

src/pb_stub.cc (resolved)
src/pb_utils.h Outdated
@@ -212,23 +212,17 @@ struct ResponseSenderBase {
struct ResponseSendMessage : ResponseSenderBase {
bi::managed_external_buffer::handle_t response;

// GPU Buffers handle
// A pointer to GPUBuffersShm object.
Contributor:

Seems like we could keep this comment the same if the structure was renamed to GPUBuffers - is it still a handle, or is that separate from a pointer?

Member Author:

By pointer I meant a handle to a GPUBuffersShm. Will update the comment.

src/python_be.cc Outdated
@@ -661,46 +660,33 @@ ModelInstanceState::ExecuteBLSRequest(
lbackend_memory.reset(backend_memory);
input_tensor->SetMemory(std::move(PbMemory::Create(
Stub()->ShmPool(), std::move(lbackend_memory))));
gpu_buffer_transporter.AddBuffer(
Contributor:

Instead of transporter, would response work? GPUBuffers and GPUBuffersResponse instead of GPUBuffersShm and GPUBufferTransporter?

Member Author:

I like transporter more, as I think "response" may imply that it can only be used with the backend responses, which is not true. Let me know if you have any other suggestions.

src/python_be.cc (resolved)
@nnshah1 (Contributor) commented Jun 2, 2023

I don't quite follow the logic of when we use the GPUBufferTransporter and when we use the GPUBuffersShm structure directly - but I think that's from my lack of familiarity with the code.

@rmccorm4 (Contributor) left a comment:

LGTM other than minor comment

@Tabrizian Tabrizian force-pushed the imant-fix-gpu-bug branch from e661853 to 3779acb Compare June 5, 2023 17:41
nnshah1
nnshah1 previously approved these changes Jun 5, 2023
krishung5
krishung5 previously approved these changes Jun 5, 2023
@krishung5 (Contributor) left a comment:

LGTM!

@Tabrizian Tabrizian dismissed stale reviews from krishung5 and nnshah1 via e835269 June 5, 2023 21:35
@Tabrizian Tabrizian merged commit 0a54e59 into main Jun 6, 2023
@Tabrizian Tabrizian deleted the imant-fix-gpu-bug branch August 10, 2023 19:40
4 participants