
[Nodes] Add a ToDevice node, or combine with pin memory #1407

Open
andrewkho opened this issue Dec 13, 2024 · 2 comments

@andrewkho
Contributor

🚀 The feature

We should add a node that sends batches to the device (probably one at a time). We could either make this a separate node, add it onto the pre-fetcher (i.e. always call `.to(device)` on the head of the queue), or make it part of pin-memory.
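For concreteness, here's a minimal sketch of the pre-fetcher-style variant: a background thread pins each batch and issues the `.to(device, non_blocking=True)` copy, keeping a small queue of device-resident batches ready for the training loop. The `DevicePrefetcher` name is hypothetical, the source is a plain iterator, and the batch is assumed to be a single CPU tensor; a real node would implement the `BaseNode` interface and walk nested batch structures.

```python
import queue
import threading

import torch


class DevicePrefetcher:
    """Hypothetical sketch: pull batches from `source` on a background thread,
    pin them, and copy them to `device` with non_blocking=True."""

    _DONE = object()  # sentinel marking end of the source

    def __init__(self, source, device, prefetch=2):
        self.source = source
        self.device = torch.device(device)
        self.q = queue.Queue(maxsize=prefetch)
        self.thread = threading.Thread(target=self._worker, daemon=True)
        self.thread.start()

    def _worker(self):
        for batch in self.source:
            # Assumes `batch` is a single CPU tensor for simplicity.
            batch = batch.pin_memory()
            batch = batch.to(self.device, non_blocking=True)
            self.q.put(batch)
        self.q.put(self._DONE)

    def __iter__(self):
        while True:
            item = self.q.get()
            if item is self._DONE:
                return
            yield item
```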

Motivation, pitch

Sending data to the device can be slow, and users often want it done in a background thread. DataLoader should do this in the background, since it consolidates state management.

Alternatives

No response

Additional context

No response

@divyanshk
Contributor

I wonder how different this would be from doing the transfer within a Mapper, similar to a collate_fn calling `tensor.to(device)`.
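Roughly what that could look like as a map function (a sketch only: the recursive `to_device` helper is made up for illustration, and the `Mapper(source, map_fn)` usage in the trailing comment is an assumption about the node's signature):

```python
import torch


def to_device(batch, device=torch.device("cuda:0"), non_blocking=True):
    """Move a (possibly nested) batch to `device`; intended to be passed to a
    Mapper-style node or called from a collate_fn."""
    if isinstance(batch, torch.Tensor):
        return batch.to(device, non_blocking=non_blocking)
    if isinstance(batch, dict):
        return {k: to_device(v, device, non_blocking) for k, v in batch.items()}
    if isinstance(batch, (list, tuple)):
        # Plain lists/tuples only; namedtuples would need special handling.
        return type(batch)(to_device(v, device, non_blocking) for v in batch)
    return batch


# e.g. node = Mapper(source, map_fn=to_device)  # assuming Mapper(source, map_fn)
```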

@divyanshk
Contributor

For cases where we have multiple threads reading data, we might be able to create thread-local CUDA streams to transfer data onto the GPU. WDYT @andrewkho?
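Something like this, perhaps (a sketch under the assumption that each reader thread issues its own host-to-device copy; the helper names are made up, and only the standard `torch.cuda.Stream` / `torch.cuda.stream` APIs are used):

```python
import threading

import torch

_tls = threading.local()


def _thread_stream(device):
    # Lazily create one CUDA stream per reader thread.
    if getattr(_tls, "stream", None) is None:
        _tls.stream = torch.cuda.Stream(device=device)
    return _tls.stream


def transfer_on_thread_stream(batch, device=torch.device("cuda:0")):
    """Copy a pinned CPU tensor to `device` on this thread's private stream, so
    host-to-device copies from different reader threads can overlap. Returns the
    device tensor plus the stream the consumer must synchronize with."""
    stream = _thread_stream(device)
    with torch.cuda.stream(stream):
        out = batch.to(device, non_blocking=True)
    return out, stream


# Consumer side (e.g. the training loop on the default stream):
#   torch.cuda.current_stream().wait_stream(stream)
#   out.record_stream(torch.cuda.current_stream())  # keep allocation valid across streams
```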
