[WIP] Support DiLoCo training #1018

jonb377 · 2024-11-07T23:41:15Z

DiLoCo training paper: https://arxiv.org/pdf/2311.08105

Adds preliminary support for DiLoCo training through DrJax.

zcharles8 · 2024-11-08T16:28:49Z

MaxText/diloco.py

+  """
+  import drjax
+
+  # TODO(jonbolin): Keep this as part of DiLoCoTrainState?


This is a good question. Some configurability here would likely be useful: people may want to experiment with other outer optimizers, though SGD with Nesterov momentum is a very competitive baseline.
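For reference, here is a minimal sketch of what an SGD + Nesterov momentum outer step could look like in plain JAX. The names `init_momentum` and `outer_step` are hypothetical and not part of this PR, the parameters are assumed to be a pytree with a leading replica axis on the per-replica copy, and the momentum update follows one common formulation (the one `optax.sgd(..., nesterov=True)` also uses), so treat it as an illustration rather than the PR's implementation.

```python
import jax
import jax.numpy as jnp


def init_momentum(params):
    """Hypothetical helper: zero-initialized momentum buffer shaped like params."""
    return jax.tree.map(jnp.zeros_like, params)


def outer_step(global_params, replica_params, momentum_buf, lr, momentum):
    """One outer update with SGD + Nesterov momentum (illustrative sketch).

    Assumes every leaf of `replica_params` has a leading replica axis and
    otherwise matches the corresponding leaf of `global_params`.
    """
    # Average the replicas' parameters over the leading replica axis.
    avg_params = jax.tree.map(lambda x: jnp.mean(x, axis=0), replica_params)
    # The "outer gradient" is the displacement from the averaged replicas
    # back to the current global parameters.
    outer_grad = jax.tree.map(lambda g, a: g - a, global_params, avg_params)
    # Momentum buffer update.
    new_buf = jax.tree.map(lambda m, d: momentum * m + d, momentum_buf, outer_grad)
    # Nesterov look-ahead: combine the gradient with the updated buffer.
    update = jax.tree.map(lambda d, m: d + momentum * m, outer_grad, new_buf)
    new_params = jax.tree.map(lambda p, u: p - lr * u, global_params, update)
    return new_params, new_buf


# Usage with toy values: two replicas, one parameter leaf.
global_params = {"w": jnp.array([1.0, 1.0])}
replica_params = {"w": jnp.array([[0.0, 0.0], [1.0, 1.0]])}
buf = init_momentum(global_params)
# With lr=1 and momentum=0 the outer step reduces to plain parameter averaging.
new_params, new_buf = outer_step(global_params, replica_params, buf, lr=1.0, momentum=0.0)
```

Making the outer optimizer a swappable function like this (or an optax `GradientTransformation`) would leave room for the experimentation mentioned above, with its state carried wherever the train state ends up living.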

zcharles8 · 2024-11-08T16:30:49Z

MaxText/diloco.py

+    # to vmap over.
+    per_replica_batch = config.global_batch_size_to_train_on // config.num_diloco_replicas
+    batch_shape = (config.num_diloco_replicas, per_replica_batch, -1)
+    batch = jax.tree.map(lambda x: x.reshape(batch_shape), batch)


In some versions I've written, I explicitly add this extra axis before passing the batch to the train step, so that I can ensure it is sharded over the DiLoCo axis. I'm not certain that's necessary, but I wanted to flag it for posterity.
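A rough sketch of that alternative, assuming a one-dimensional device mesh whose axis is named `"diloco"` (the helper `add_replica_axis` and the axis name are made up for illustration, and a real MaxText mesh has more axes). Unlike the reshape in the diff, this version keeps trailing dimensions rather than flattening them with `-1`.

```python
import jax
import jax.numpy as jnp
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P


def add_replica_axis(batch, num_replicas, mesh, axis_name="diloco"):
    """Hypothetical helper: split the batch dim into (replica, per-replica)
    and place the new leading axis on the given mesh axis."""

    def reshape_and_shard(x):
        per_replica = x.shape[0] // num_replicas
        x = x.reshape((num_replicas, per_replica) + x.shape[1:])
        # Shard only the leading (replica) axis; other axes stay replicated.
        return jax.device_put(x, NamedSharding(mesh, P(axis_name)))

    return jax.tree.map(reshape_and_shard, batch)


# Usage: build a 1-D mesh over whatever devices are available.
devices = np.array(jax.devices())
mesh = Mesh(devices, ("diloco",))
num_replicas = 2 * len(devices)  # must be divisible by the mesh axis size
global_batch = 4 * num_replicas
batch = {"inputs": jnp.arange(global_batch * 8).reshape(global_batch, 8)}
sharded = add_replica_axis(batch, num_replicas, mesh)
```

Doing this outside the train step makes the sharding of the replica axis explicit at the call site instead of leaving it to whatever the compiler infers from the reshape.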

zcharles8 · 2024-11-08T16:32:06Z

MaxText/diloco.py

+      train_step,
+      (state.inner_state, batch, broadcast_rng)
+    )
+    avg_metrics = typed_reduce_mean(metrics)


One thing to call out: distinguishing between metrics that need to be summed and metrics that need to be averaged is kind of a pain.

E.g., if you're tracking the total number of tokens seen, then a reduce_sum is probably what you want here. I don't have a great solution, though, without better understanding the structure of metrics.
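One possible shape for this, sketched in plain JAX rather than DrJax: keep an explicit set of metric names that should be summed and average everything else. It assumes metrics arrive as a flat dict of arrays with a leading replica axis, which may not match the real structure; `SUM_KEYS`, `reduce_metrics`, and the metric names are hypothetical.

```python
import jax.numpy as jnp

# Hypothetical: names of metrics that are counts and should be summed.
SUM_KEYS = frozenset({"total_tokens"})


def reduce_metrics(metrics, sum_keys=SUM_KEYS):
    """Reduce per-replica metrics over the leading replica axis.

    Metrics named in `sum_keys` are summed; all others are averaged.
    """
    return {
        name: jnp.sum(value, axis=0) if name in sum_keys else jnp.mean(value, axis=0)
        for name, value in metrics.items()
    }


# Usage with two replicas.
per_replica = {
    "loss": jnp.array([2.0, 4.0]),
    "total_tokens": jnp.array([100, 300]),
}
reduced = reduce_metrics(per_replica)
```

An allowlist like this is easy to get wrong as metrics are added, so it is more of a stopgap than a real answer to the structure question.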

Support DiLoCo training

96ee9f0

zcharles8 reviewed Nov 8, 2024
