Parallelize device-to-host transfers #5824
Merged
For async checkpointing, it is important to unblock training as quickly as possible. Training can only be unblocked once all data has been moved off the device and onto the host, freeing up device memory.

One bottleneck I found is that in `TransferFromServer`, the tensors are transferred with `ToLiteralSync`, meaning each tensor is transferred sequentially. In benchmarking a 2B-parameter model, parallelizing these transfers reduced the time spent in `TransferFromServer` from 5.1s to 1.8s, a ~65% reduction. There is still significant overhead from copying the resulting `xla::Literal` into a `torch.Tensor`, but that's for another PR.
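To illustrate the change in structure, here is a minimal sketch of the fan-out/join pattern: instead of issuing each blocking transfer one after another, all transfers are launched up front and joined at the end. `TransferOne` is a hypothetical stand-in for the per-buffer `ToLiteralSync` call, and the sketch uses plain `std::async` rather than the runtime's own thread pool, so it only shows the idea, not this PR's actual implementation.

```cpp
// Sketch of parallelizing independent blocking device-to-host transfers.
// TransferOne is a hypothetical stand-in for the per-buffer ToLiteralSync call.
#include <chrono>
#include <future>
#include <thread>
#include <vector>

// Pretend each device-to-host transfer is an independent blocking call.
static std::vector<char> TransferOne(int /*buffer_index*/) {
  std::this_thread::sleep_for(std::chrono::milliseconds(50));  // simulated copy
  return std::vector<char>(1024);
}

int main() {
  constexpr int kNumBuffers = 16;

  // Sequential: each transfer waits for the previous one to finish,
  // as happens when calling ToLiteralSync in a loop.
  std::vector<std::vector<char>> sequential;
  for (int i = 0; i < kNumBuffers; ++i) {
    sequential.push_back(TransferOne(i));
  }

  // Parallel: launch every transfer up front, then join. Wall time approaches
  // the duration of the slowest single transfer rather than the sum of all.
  std::vector<std::future<std::vector<char>>> pending;
  pending.reserve(kNumBuffers);
  for (int i = 0; i < kNumBuffers; ++i) {
    pending.push_back(std::async(std::launch::async, TransferOne, i));
  }
  std::vector<std::vector<char>> parallel;
  for (auto& f : pending) {
    parallel.push_back(f.get());  // blocks until that transfer completes
  }
  return 0;
}
```

Since the per-tensor transfers do not depend on each other, overlapping them hides most of the per-call latency, which is where the 5.1s to 1.8s improvement comes from.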