
Improve the way we allocate and use memory for GPU batch decoding #189

Closed
@ahmadsharif1

Description


Currently, when we do batch decoding with get_frames_at_indexes, we allocate the batch memory on the host, allocate each frame independently, and then copy each frame into the batch memory.

Memory is allocated here for the batch:

BatchDecodedOutput output(frameIndexes.size(), options, streamMetadata);

The copy is done here from frame to batch memory:

output.frames[i++] = frame;

This is especially wasteful when we are doing GPU decoding, because we incur multiple device-to-host transfers -- one per frame.
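A minimal sketch of the current pattern, using NumPy arrays in place of decoder tensors (the frame size and the decode_frame helper are hypothetical stand-ins, not the real decoder API):

```python
import numpy as np

H, W = 4, 4  # toy frame size for illustration

def decode_frame(index):
    # Hypothetical stand-in for the decoder: each call returns a
    # freshly allocated frame. On GPU, this separate per-frame
    # buffer is what forces one device-to-host transfer per frame.
    return np.full((H, W), index, dtype=np.uint8)

frame_indexes = [3, 7, 9]

# Current flow: the batch is allocated up front (on the host), but
# each frame is decoded into its own buffer and then copied over.
batch = np.empty((len(frame_indexes), H, W), dtype=np.uint8)
i = 0
for idx in frame_indexes:
    frame = decode_frame(idx)  # extra per-frame allocation
    batch[i] = frame           # extra copy (a D2H transfer on GPU)
    i += 1
```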

Action items:

  1. We should respect the device when allocating the batch tensor memory.
  2. We should decode directly into the batch tensor memory instead of incurring an extra memcpy per frame.
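Taken together, the two action items amount to: allocate the batch on the frames' device, then hand the decoder a view into the batch so each frame is written in place. A hedged NumPy sketch of that shape (decode_frame_into is a hypothetical decoder entry point that fills a caller-provided buffer; in the real code the allocation would also pass the device, e.g. via the options):

```python
import numpy as np

H, W = 4, 4  # toy frame size for illustration

def decode_frame_into(index, out):
    # Hypothetical decoder that writes directly into `out`, a view
    # into the batch tensor: no intermediate per-frame buffer and
    # no extra copy afterwards.
    out[...] = index

frame_indexes = [3, 7, 9]

# Proposed flow: allocate the batch once, respecting the device,
# then decode each frame straight into its slice of the batch.
batch = np.empty((len(frame_indexes), H, W), dtype=np.uint8)
for i, idx in enumerate(frame_indexes):
    decode_frame_into(idx, batch[i])  # writes in place via the view
```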

Metadata

Labels

    bug: Something isn't working