
Conversation

spmex

@spmex spmex commented Jun 12, 2025

Summary:
Add a new unit test in test_model_parallel_nccl_ssd_single_gpu.py for SSD TBE with VBE input.

Context

  • This test is a prerequisite for validating the incoming FBGEMM & TorchRec changes that merge VBE output.
  • For SSD TBE, the tensor wrapped in a shard is a PartiallyMaterializedTensor (PMT), which requires special handling when copying the state dict from an unsharded tensor. Specifically:
    • It lacks certain methods, such as ndim.
    • Its copy_ method is a no-op; writes must go through the wrapped C++ object of the PMT.
    • Only the row-wise, table-wise, and table-row-wise sharding types are supported.
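The PMT caveats above can be illustrated with a small, self-contained mock (the class names, the `wrapped` attribute, and the `set_range` method here are hypothetical stand-ins for the real FBGEMM internals, used only to show the write-through pattern):

```python
import numpy as np

class _CppBackingStore:
    """Stand-in for the C++ object that a PMT wraps (hypothetical)."""
    def __init__(self, rows, dim):
        self._data = np.zeros((rows, dim), dtype=np.float32)

    def set_range(self, start, values):
        # Writes land in the backing store, rows [start, start + len(values)).
        self._data[start:start + len(values)] = values

    def get_all(self):
        return self._data.copy()

class PartiallyMaterializedTensorSketch:
    """Illustrative mock: copy_ is a no-op; writes go through .wrapped."""
    def __init__(self, rows, dim):
        self.wrapped = _CppBackingStore(rows, dim)
        # Note: no `ndim` attribute, mirroring the missing-methods caveat.

    def copy_(self, src):
        return self  # no-op, unlike a regular tensor's copy_

def load_into_pmt(pmt, unsharded_weights, shard_row_offset):
    # copy_ would silently do nothing, so write through the wrapped object.
    pmt.wrapped.set_range(shard_row_offset, unsharded_weights)

pmt = PartiallyMaterializedTensorSketch(rows=8, dim=4)
pmt.copy_(np.ones((8, 4)))              # no effect by design
load_into_pmt(pmt, np.ones((2, 4)), 3)  # rows 3..4 become ones
```

This is the shape of the special handling the test needs when copying an unsharded state dict into SSD TBE shards.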

NOTE: The new test only works in FP32 for now. SSD TBE only supports the RowWiseAdagrad optimizer with eps = 1e-8; this value rounds to zero in FP16, leading to numerical instability in the unsharded model, which uses the same optimizer.
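The FP16 rounding problem is easy to verify in isolation with NumPy half precision (a standalone check, independent of the TBE code):

```python
import numpy as np

eps = 1e-8  # the eps used by SSD TBE's RowWiseAdagrad

# FP16's smallest subnormal is 2**-24 ~ 5.96e-8, so 1e-8 rounds to zero.
eps_fp16 = np.float16(eps)
print(eps_fp16)  # 0.0

# With eps gone, a row whose gradient-sum is still zero leaves the
# Adagrad denominator at exactly zero, so the update divides by zero.
grad_sq_sum = np.float16(0.0)
denom = np.sqrt(grad_sq_sum) + eps_fp16
print(denom == 0.0)  # True
```

This is why the test stays in FP32 while eps is fixed at 1e-8.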

Differential Revision: D76455104

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jun 12, 2025
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D76455104


spmex pushed a commit to spmex/torchrec that referenced this pull request Jun 12, 2025
Summary:
Pull Request resolved: meta-pytorch#3086

Add a new unit test in [`test_model_parallel_nccl_ssd_single_gpu.py`](https://www.internalfb.com/code/fbsource/[5f477259031a]/fbcode/torchrec/distributed/tests/test_model_parallel_nccl_ssd_single_gpu.py) for SSD TBE with VBE input.

### Context
* This test is a prerequisite for validating the incoming FBGEMM & TorchRec changes that merge VBE output.
* For SSD TBE, the tensor wrapped in a shard is a [`PartiallyMaterializedTensor`](https://www.internalfb.com/code/fbsource/fbcode/deeplearning/fbgemm/fbgemm_gpu/fbgemm_gpu/tbe/ssd/utils/partially_materialized_tensor.py) (PMT), which requires special handling when copying the state dict from an unsharded tensor. Specifically:
  - It lacks certain methods, such as `ndim`.
  - The `copy_` method is a no-op; writes must go through the [wrapped C++ object](https://www.internalfb.com/code/fbsource/fbcode/deeplearning/fbgemm/fbgemm_gpu/src/ssd_split_embeddings_cache/ssd_split_table_batched_embeddings.cpp?lines=497) of the PMT.
  - Only the `ROW_WISE`, `TABLE_WISE`, and `TABLE_ROW_WISE` sharding types are supported.

NOTE: SSD TBE only supports the `RowWiseAdagrad` optimizer. For **FP16**, the learning rate and eps must be chosen carefully to avoid numerical instability in the unsharded model. Here we use `lr = 0.001` and `eps = 0.001` to pass the test.
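For reference, a minimal NumPy sketch of a row-wise Adagrad step shows where `lr` and `eps` enter the update (illustrative only; the real optimizer lives inside FBGEMM, and the averaging details there may differ):

```python
import numpy as np

def row_wise_adagrad_step(weights, grads, state, lr=0.001, eps=0.001):
    """One row-wise Adagrad update: a single accumulator scalar per row."""
    # Accumulate the mean squared gradient per row (row-wise, not per element).
    state += np.mean(grads * grads, axis=1)
    # eps keeps the denominator away from zero; an eps that underflows in
    # low precision makes this scale blow up and destabilizes training.
    scale = lr / (np.sqrt(state) + eps)
    weights -= scale[:, None] * grads
    return weights, state

w = np.ones((2, 4), dtype=np.float32)
g = np.full((2, 4), 0.5, dtype=np.float32)
s = np.zeros(2, dtype=np.float32)
w, s = row_wise_adagrad_step(w, g, s)
```

With `lr = eps = 0.001`, both the step size and the denominator floor stay representable in FP16, which is the point of the tuned values above.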

Differential Revision: D76455104
@spmex spmex force-pushed the export-D76455104 branch from 126d745 to 1369c0b Compare June 12, 2025 18:32

spmex pushed a commit to spmex/torchrec that referenced this pull request Jun 13, 2025
Summary:
Pull Request resolved: meta-pytorch#3086

Add a new unit test in [`test_model_parallel_nccl_ssd_single_gpu.py`](https://www.internalfb.com/code/fbsource/[5f477259031a]/fbcode/torchrec/distributed/tests/test_model_parallel_nccl_ssd_single_gpu.py) for SSD TBE with VBE input.

### Context
* This test is a prerequisite for validating the incoming FBGEMM & TorchRec changes that merge VBE output.
* For SSD TBE, the tensor wrapped in a shard is a [`PartiallyMaterializedTensor`](https://www.internalfb.com/code/fbsource/fbcode/deeplearning/fbgemm/fbgemm_gpu/fbgemm_gpu/tbe/ssd/utils/partially_materialized_tensor.py) (PMT), which requires special handling when copying the state dict from an unsharded tensor. Specifically:
  - It lacks certain methods, such as `ndim`.
  - The `copy_` method is a no-op; writes must go through the [wrapped C++ object](https://www.internalfb.com/code/fbsource/fbcode/deeplearning/fbgemm/fbgemm_gpu/src/ssd_split_embeddings_cache/ssd_split_table_batched_embeddings.cpp?lines=417) of the PMT.
  - Only the `ROW_WISE`, `TABLE_WISE`, and `TABLE_ROW_WISE` sharding types are supported.

NOTE: SSD TBE only supports the `RowWiseAdagrad` optimizer. For **FP16**, the learning rate and eps must be chosen carefully to avoid numerical instability in the unsharded model. Here we use `lr = 0.001` and `eps = 0.001` to pass the test.

Reviewed By: TroyGarden

Differential Revision: D76455104
@spmex spmex force-pushed the export-D76455104 branch from 1369c0b to 27e79f9 Compare June 13, 2025 00:58
