Log weight hashes for DSv3 w/ pp vs w/o pp #240

xmfan · 2025-11-08T04:35:31Z

Stacked PRs:

->Log weight hashes for DSv3 w/ pp vs w/o pp #240

Intended usage:

```
> torchrun --nproc-per-node=8 examples/example_ds3_pp.py --rng-seed=42; torchrun --nproc-per-node=4 examples/example_ds3_local_map.py --rng-seed=42
```

```
> diff out/0/pp_weights.log out/1/weights.log
--- out/0/pp_weights.log        2025-11-07 20:31:34.447960867 -0800
+++ out/1/weights.log   2025-11-07 20:32:52.499859593 -0800
@@ -60,12 +60,9 @@
 name='freqs_cis' hash=DTensor(real=54976837666734080, imag=9351734845035773952))
 name='layers.0.moe.expert_bias' hash=DTensor(0)
 name='layers.0.moe.tokens_per_expert' hash=DTensor(0)
-name='freqs_cis' hash=DTensor(real=54976837666734080, imag=9351734845035773952))
 name='layers.1.moe.expert_bias' hash=DTensor(0)
 name='layers.1.moe.tokens_per_expert' hash=DTensor(0)
-name='freqs_cis' hash=DTensor(real=54976837666734080, imag=9351734845035773952))
 name='layers.2.moe.expert_bias' hash=DTensor(0)
 name='layers.2.moe.tokens_per_expert' hash=DTensor(0)
-name='freqs_cis' hash=DTensor(real=54976837666734080, imag=9351734845035773952))
 name='layers.3.moe.expert_bias' hash=DTensor(0)
 name='layers.3.moe.tokens_per_expert' hash=DTensor(0)
```
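For readers unfamiliar with the approach, here is a minimal pure-Python sketch of hash-based weight logging. It is not this PR's implementation: plain float lists stand in for (D)Tensors, and `hash_values` and `format_weight_log` are hypothetical helpers that hash the raw float64 bytes with SHA-256, whereas the logs in this PR print integer-valued `DTensor(...)` hashes. Because the digest depends on the exact bits, any numerical divergence between two runs shows up as a changed line.

```python
import hashlib
import struct


def hash_values(values):
    """Hypothetical helper: hash a flat sequence of floats bit-exactly.

    Stand-in for hashing a (D)Tensor's data; each value is packed as a
    little-endian float64, so the digest changes if any bit changes.
    """
    packed = struct.pack(f"<{len(values)}d", *values)
    return hashlib.sha256(packed).hexdigest()[:16]


def format_weight_log(named_values):
    """Return one `name=... hash=...` line per entry, sorted by name."""
    lines = []
    for name in sorted(named_values):
        lines.append(f"name={name!r} hash={hash_values(named_values[name])}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical weights; real runs would iterate over model parameters/buffers.
    weights = {
        "tok_embeddings.weight": [0.1, -0.2],
        "layers.0.moe.expert_bias": [0.0, 0.0],
    }
    print(format_weight_log(weights))
```

Sorting by name is a choice made in this sketch so that differences in module traversal order do not show up in the diff; the logs in this PR appear to follow module order instead.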

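The remaining difference comes from the model implementation: with PP, each stage holds its own copy of the `freqs_cis` buffer, whereas the non-PP model has only one `freqs_cis` buffer on the root model class. The PP log therefore repeats the (identical) `freqs_cis` entry once per stage.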
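Since the extra `freqs_cis` entries carry the same hash as the single entry in the non-PP log, one option is to compare the logs after collapsing repeated names. Below is a sketch that assumes the `name='…' hash=…` line format shown in the diff; `load_hashes` and `compare_hashes` are hypothetical helpers, not part of this PR or its example scripts.

```python
import re

# Matches lines like: name='layers.0.moe.expert_bias' hash=DTensor(0)
LINE_RE = re.compile(r"^name='(?P<name>[^']*)' hash=(?P<hash>.*)$")


def load_hashes(lines):
    """Hypothetical helper: map each name to its hash, collapsing duplicates.

    A name that appears more than once (e.g. a buffer replicated on every
    pipeline stage) is kept once if every copy has the same hash; copies
    that disagree raise ValueError, since that would be a real divergence.
    """
    hashes = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip lines that are not weight-hash entries
        name, value = match.group("name"), match.group("hash")
        if name in hashes and hashes[name] != value:
            raise ValueError(f"conflicting hashes for {name!r}")
        hashes[name] = value
    return hashes


def compare_hashes(a, b):
    """Return (names only in a, names only in b, names whose hashes differ)."""
    only_a = sorted(a.keys() - b.keys())
    only_b = sorted(b.keys() - a.keys())
    mismatched = sorted(n for n in a.keys() & b.keys() if a[n] != b[n])
    return only_a, only_b, mismatched


if __name__ == "__main__":
    pp_log = [
        "name='freqs_cis' hash=DTensor(real=54976837666734080, imag=9351734845035773952))",
        "name='layers.0.moe.expert_bias' hash=DTensor(0)",
        "name='freqs_cis' hash=DTensor(real=54976837666734080, imag=9351734845035773952))",
        "name='layers.1.moe.expert_bias' hash=DTensor(0)",
    ]
    non_pp_log = [pp_log[0], pp_log[1], pp_log[3]]
    # With duplicates collapsed, nothing is reported as different.
    print(compare_hashes(load_hashes(pp_log), load_hashes(non_pp_log)))
```

Collapsing by name is only safe for buffers that are genuinely replicated; the conflict check keeps a diverging copy from being silently dropped.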

This PR also removes the per_op logging, since its numerics are not diff-friendly yet.
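The PR does not spell out why per-op numerics are hard to diff. One common cause, offered here only as an illustration, is that floating-point addition is not associative. The same computation reduced in a different order (for example, split differently across ranks or pipeline stages) can therefore differ in the last bits:

```python
# Floating-point addition is not associative: the same three values summed
# in a different order give results that differ in the last bits.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left, right, left == right)  # 0.6000000000000001 0.6 False
```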

stack-info: PR: #240, branch: xmfan/stack/18

xmfan force-pushed the xmfan/stack/18 branch from 064fe2a to 655c293 on November 8, 2025 04:35

meta-cla bot added the CLA Signed label Nov 8, 2025
