Ported the repo to rel-2.3.0 #2
base: main
Conversation
README.md (outdated)

````diff
@@ -55,21 +55,21 @@ To train a small 111M parameter model, run the following command. This will work
 (although CPU will be too slow to get that far).

 ```bash
-python train.py configs/111m.yaml
+python train.py CSX --mode train --params configs/111m.yaml
````
Is the CSX argument needed? We should have a way to specify the device if needed outside of CSX, but CSX should be the default device.
Made CSX the default device and updated the README.md
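For reference, a minimal sketch of what a CSX-by-default CLI could look like; the argument names and choices here are illustrative assumptions, not the actual gigaGPT interface:

```python
import argparse

def parse_args():
    # Hypothetical CLI: the target device is optional and defaults to CSX,
    # but CPU/GPU can still be selected explicitly, per the discussion above.
    parser = argparse.ArgumentParser(description="gigaGPT training entry point")
    parser.add_argument(
        "device",
        nargs="?",
        default="CSX",
        choices=["CSX", "CPU", "GPU"],
        help="target device (defaults to CSX)",
    )
    parser.add_argument("--params", required=True, help="path to the YAML config")
    return parser.parse_args()
```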
```yaml
  - /path/to/data/location
python_paths:
  - /path/to/code/location
trainer:
```
Thanks for this. I understand we were able to run on CPU with these changes, but we will also need to verify that we can run on CSX. @abhis-cerebras, can you please help here? These are changes made by @srinjoym-cerebras to port our gigaGPT model to the new trainer flow. I think he was able to run it on CPU, but we will need to verify these changes on CSX.
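For context, a rough sketch of the rel-2.3.0 trainer YAML shape under discussion; the exact keys should be checked against the modelzoo docs for that release, and every value below is a placeholder:

```yaml
trainer:
  init:
    backend:
      backend_type: CSX   # default device, per the discussion above
    model:
      hidden_size: 768    # placeholder model settings
    optimizer:
      AdamW:
        lr: 6.0e-4
  fit:
    train_dataloader:
      data_dir: /path/to/data/location
      batch_size: 120
```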
```python
causal_attention_mask *= torch.finfo(causal_attention_mask.dtype).min
causal_attention_mask = create_broadcasted_autoregressive_mask(
    batch_size=batch_size,
    num_heads=1,
```
Suggested change:

```diff
-    num_heads=1,
+    num_heads=self.config.heads,
```
Creating the mask explicitly, as in #5, would be preferable.
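For illustration, a minimal sketch of building the additive causal mask explicitly in plain PyTorch, matching the (batch, heads, seq, seq) broadcast shape; the function name and signature are hypothetical:

```python
import torch

def explicit_causal_mask(batch_size, num_heads, seq_len, dtype=torch.float32):
    # Additive mask: 0 on and below the diagonal, dtype-min above it,
    # so masked positions vanish after softmax.
    mask = torch.full((seq_len, seq_len), torch.finfo(dtype).min, dtype=dtype)
    mask = torch.triu(mask, diagonal=1)
    # Broadcast to (batch, num_heads, seq, seq) without copying data.
    return mask.expand(batch_size, num_heads, seq_len, seq_len)
```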
```python
save_checkpoint(step)

logger.info("Training completed successfully!")
from cerebras.modelzoo.common.utils.run.cli_pytorch import get_params_from_args
```
I don't disagree with this change for getting it to work, but I don't think we can continue to claim ~600 lines when we make library calls that hide all the code.
Sorry, could you clarify what you mean by hiding the code?
Sorry for the late reply. The motivation for maintaining gigaGPT is to demonstrate easy model-size scaling on Cerebras hardware using only simple and readable PyTorch code.
If we delegate all the train.py code to a helper function, we can no longer claim that. One could argue that we scale only by hiding the complexity in a helper function, since it is not immediately visible what happens behind cerebras.modelzoo.common.run_utils.main.
ok, got it.
```diff
-def main():
-    params = get_params_from_args()
+from cerebras.modelzoo.common.run_utils import main
```
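With this change, the train.py entry point would reduce to roughly the following sketch; whether run_utils.main takes arguments in rel-2.3.0 is an assumption here and should be verified against the modelzoo source:

```python
# Sketch of the slimmed-down entry point after this change. The exact
# signature of run_utils.main in rel-2.3.0 is an assumption to verify.
from cerebras.modelzoo.common.run_utils import main

if __name__ == "__main__":
    main()
```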
Do we need to add the Cerebras modelzoo dependency to requirements.txt? Please note that this will be a new dependency. cc @gokulr-cerebras
In rel-2.3.0, a new trainer flow and YAML structure were introduced. The corresponding changes were made to the repo.