
Conversation

spmex

@spmex spmex commented Jun 12, 2025

Summary:
Add a new unit test in test_model_parallel_nccl_ssd_single_gpu.py for SSD TBE with VBE input.

Context

  • This test is a prerequisite for validating the incoming FBGEMM & TorchRec changes that merge VBE output.
  • For SSD TBE, the tensor wrapped in a shard is a PartiallyMaterializedTensor (PMT), which requires special handling when copying the state dict from an unsharded tensor. Specifically:
    • It lacks certain methods, such as ndim.
    • Its copy_ method is a no-op; writes must go through the wrapped C++ object of the PMT.
    • Only the row-wise, table-wise, and table-row-wise sharding types are supported.
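The PMT caveats above can be illustrated with a small, self-contained mock (the class names, the `wrapped` attribute, and the `set_range` method here are hypothetical stand-ins for the real FBGEMM internals, used only to show the write-through pattern):

```python
import numpy as np

class _CppBackingStore:
    """Stand-in for the C++ object that a PMT wraps (hypothetical)."""
    def __init__(self, rows, dim):
        self._data = np.zeros((rows, dim), dtype=np.float32)

    def set_range(self, start, values):
        # Writes land in the backing store, rows [start, start + len(values)).
        self._data[start:start + len(values)] = values

    def get_all(self):
        return self._data.copy()

class PartiallyMaterializedTensorSketch:
    """Illustrative mock: copy_ is a no-op; writes go through .wrapped."""
    def __init__(self, rows, dim):
        self.wrapped = _CppBackingStore(rows, dim)
        # Note: no `ndim` attribute, mirroring the missing-methods caveat.

    def copy_(self, src):
        return self  # no-op, unlike a regular tensor's copy_

def load_into_pmt(pmt, unsharded_weights, shard_row_offset):
    # copy_ would silently do nothing, so write through the wrapped object.
    pmt.wrapped.set_range(shard_row_offset, unsharded_weights)

pmt = PartiallyMaterializedTensorSketch(rows=8, dim=4)
pmt.copy_(np.ones((8, 4)))              # no effect by design
load_into_pmt(pmt, np.ones((2, 4)), 3)  # rows 3..4 become ones
```

This is the shape of the special handling the test needs when copying an unsharded state dict into SSD TBE shards.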

NOTE: The new test only works in FP32 for now. SSD TBE only supports the RowWiseAdagrad optimizer with eps = 1e-8; this value rounds to zero in FP16, leading to numerical instability in the unsharded model, which uses the same optimizer.
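The FP16 rounding problem is easy to verify in isolation with NumPy half precision (a standalone check, independent of the TBE code):

```python
import numpy as np

eps = 1e-8  # the eps used by SSD TBE's RowWiseAdagrad

# FP16's smallest subnormal is 2**-24 ~ 5.96e-8, so 1e-8 rounds to zero.
eps_fp16 = np.float16(eps)
print(eps_fp16)  # 0.0

# With eps gone, a row whose gradient-sum is still zero leaves the
# Adagrad denominator at exactly zero, so the update divides by zero.
grad_sq_sum = np.float16(0.0)
denom = np.sqrt(grad_sq_sum) + eps_fp16
print(denom == 0.0)  # True
```

This is why the test stays in FP32 while eps is fixed at 1e-8.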

Differential Revision: D76455104

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jun 12, 2025
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D76455104


spmex pushed a commit to spmex/torchrec that referenced this pull request Jun 12, 2025
Summary:
Pull Request resolved: meta-pytorch#3086

Add a new unit test in [`test_model_parallel_nccl_ssd_single_gpu.py`](https://www.internalfb.com/code/fbsource/[5f477259031a]/fbcode/torchrec/distributed/tests/test_model_parallel_nccl_ssd_single_gpu.py) for SSD TBE with VBE input.

### Context
* This test is a prerequisite for validating the incoming FBGEMM & TorchRec changes that merge VBE output.
* For SSD TBE, the tensor wrapped in a shard is a [`PartiallyMaterializedTensor`](https://www.internalfb.com/code/fbsource/fbcode/deeplearning/fbgemm/fbgemm_gpu/fbgemm_gpu/tbe/ssd/utils/partially_materialized_tensor.py) (PMT), which requires special handling when copying the state dict from an unsharded tensor. Specifically:
  - It lacks certain methods, such as `ndim`.
  - The `copy_` method is a no-op; writes must go through the [wrapped C++ object](https://www.internalfb.com/code/fbsource/fbcode/deeplearning/fbgemm/fbgemm_gpu/src/ssd_split_embeddings_cache/ssd_split_table_batched_embeddings.cpp?lines=497) of the PMT.
  - Only the `ROW_WISE`, `TABLE_WISE`, and `TABLE_ROW_WISE` sharding types are supported.

NOTE: SSD TBE only supports the `RowWiseAdagrad` optimizer. For **FP16**, the learning rate and eps must be chosen carefully to avoid numerical instability in the unsharded model. Here we use `lr = 0.001` and `eps = 0.001` to pass the test.
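For reference, a minimal NumPy sketch of a row-wise Adagrad step shows where `lr` and `eps` enter the update (illustrative only; the real optimizer lives inside FBGEMM, and the averaging details there may differ):

```python
import numpy as np

def row_wise_adagrad_step(weights, grads, state, lr=0.001, eps=0.001):
    """One row-wise Adagrad update: a single accumulator scalar per row."""
    # Accumulate the mean squared gradient per row (row-wise, not per element).
    state += np.mean(grads * grads, axis=1)
    # eps keeps the denominator away from zero; an eps that underflows in
    # low precision makes this scale blow up and destabilizes training.
    scale = lr / (np.sqrt(state) + eps)
    weights -= scale[:, None] * grads
    return weights, state

w = np.ones((2, 4), dtype=np.float32)
g = np.full((2, 4), 0.5, dtype=np.float32)
s = np.zeros(2, dtype=np.float32)
w, s = row_wise_adagrad_step(w, g, s)
```

With `lr = eps = 0.001`, both the step size and the denominator floor stay representable in FP16, which is the point of the tuned values above.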

Differential Revision: D76455104
@spmex spmex force-pushed the export-D76455104 branch from 126d745 to 1369c0b Compare June 12, 2025 18:32

spmex pushed a commit to spmex/torchrec that referenced this pull request Jun 13, 2025
Summary:
Pull Request resolved: meta-pytorch#3086

Add a new unit test in [`test_model_parallel_nccl_ssd_single_gpu.py`](https://www.internalfb.com/code/fbsource/[5f477259031a]/fbcode/torchrec/distributed/tests/test_model_parallel_nccl_ssd_single_gpu.py) for SSD TBE with VBE input.

### Context
* This test is a prerequisite for validating the incoming FBGEMM & TorchRec changes that merge VBE output.
* For SSD TBE, the tensor wrapped in a shard is a [`PartiallyMaterializedTensor`](https://www.internalfb.com/code/fbsource/fbcode/deeplearning/fbgemm/fbgemm_gpu/fbgemm_gpu/tbe/ssd/utils/partially_materialized_tensor.py) (PMT), which requires special handling when copying the state dict from an unsharded tensor. Specifically:
  - It lacks certain methods, such as `ndim`.
  - The `copy_` method is a no-op; writes must go through the [wrapped C++ object](https://www.internalfb.com/code/fbsource/fbcode/deeplearning/fbgemm/fbgemm_gpu/src/ssd_split_embeddings_cache/ssd_split_table_batched_embeddings.cpp?lines=417) of the PMT.
  - Only the `ROW_WISE`, `TABLE_WISE`, and `TABLE_ROW_WISE` sharding types are supported.

NOTE: SSD TBE only supports the `RowWiseAdagrad` optimizer. For **FP16**, the learning rate and eps must be chosen carefully to avoid numerical instability in the unsharded model. Here we use `lr = 0.001` and `eps = 0.001` to pass the test.

Reviewed By: TroyGarden

Differential Revision: D76455104
@spmex spmex force-pushed the export-D76455104 branch from 1369c0b to 27e79f9 Compare June 13, 2025 00:58
