
Ported the repo to rel-2.3.0 #2

Open
wants to merge 4 commits into main

Conversation

srinjoym-cerebras

In rel-2.3.0, a new trainer flow and YAML structure were introduced. The corresponding changes were made to the repo.
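For reference, the new-style configs nest everything under a top-level `trainer:` key. A rough sketch of the shape (the key names below are illustrative assumptions, not verified against the rel-2.3.0 schema):

```yaml
# Rough sketch of a rel-2.3.0 trainer-style config; key names are
# illustrative and should be checked against the modelzoo docs.
trainer:
  init:
    backend:
      backend_type: CSX      # target device
    model:
      hidden_size: 768
    optimizer:
      AdamW:
        lr: 6.0e-4
    loop:
      max_steps: 10000
  fit:
    train_dataloader:
      data_dir: /path/to/data/location
```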

README.md Outdated
@@ -55,21 +55,21 @@ To train a small 111M parameter model, run the following command. This will work
(although CPU will be too slow to get that far).

```diff
-python train.py configs/111m.yaml
+python train.py CSX --mode train --params configs/111m.yaml
```


Is the CSX argument needed? We should have a way to specify the device outside of CSX if needed, but CSX should be the default device.

srinjoym-cerebras (Author):

Made CSX the default device and updated the README.md
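For illustration, one way the default could be wired up with argparse (a sketch only; the real flags come from the modelzoo CLI helpers, and the argument names here are assumptions):

```python
import argparse

def parse_args():
    # Sketch: make CSX the default target device while still allowing
    # CPU/GPU runs. Argument names are hypothetical, not the modelzoo CLI.
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "target_device",
        nargs="?",              # positional but optional
        default="CSX",
        choices=["CSX", "CPU", "GPU"],
        help="Device to run on (defaults to CSX)",
    )
    parser.add_argument("--mode", default="train", choices=["train", "eval"])
    parser.add_argument("--params", required=True, help="Path to the YAML config")
    return parser.parse_args()
```

With `nargs="?"`, `python train.py --mode train --params configs/111m.yaml` falls back to CSX, while `python train.py GPU ...` still overrides it.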

```yaml
- /path/to/data/location
python_paths:
- /path/to/code/location
trainer:
```


Thanks for this. I understand we were able to run on CPU with these changes, but we also need to verify that we can run on CSX. @abhis-cerebras, can you please help here? These are @srinjoym-cerebras's changes porting our gigaGPT model to the new trainer flow; he was able to run it on CPU, but the changes still need to be verified on CSX.

```diff
-causal_attention_mask *= torch.finfo(causal_attention_mask.dtype).min
+causal_attention_mask = create_broadcasted_autoregressive_mask(
+    batch_size=batch_size,
+    num_heads=1,
```

Suggested change:

```diff
-    num_heads=1,
+    num_heads=self.config.heads,
```


Creating the mask explicitly in #5 would be preferable.
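As a point of comparison, an explicit construction in plain PyTorch could look like this (a sketch; the helper name and signature are made up for illustration, but it keeps the same finfo-min masking convention as the original line):

```python
import torch

def make_causal_mask(batch_size, num_heads, seq_len, dtype):
    # Fill future (upper-triangular) positions with the most negative
    # representable value so they vanish after softmax; keep the rest 0.
    mask = torch.full((seq_len, seq_len), torch.finfo(dtype).min, dtype=dtype)
    mask = torch.triu(mask, diagonal=1)
    # Broadcast to [batch_size, num_heads, seq_len, seq_len].
    return mask[None, None, :, :].expand(batch_size, num_heads, seq_len, seq_len)
```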

```diff
-save_checkpoint(step)
-
-logger.info("Training completed successfully!")
+from cerebras.modelzoo.common.utils.run.cli_pytorch import get_params_from_args
```


I don't disagree with this change for getting it to work, but I don't think we can continue to claim ~600 lines when we make library calls that hide all the code.

srinjoym-cerebras (Author):

Sorry, could you clarify what you mean by hiding the code?


Sorry for the late reply. The motivation for maintaining gigaGPT is to demonstrate easy model-size scaling on Cerebras hardware using only simple, readable PyTorch code.

If we delegate all the train.py code to a helper function, we can no longer claim that. One could argue that we scale only by hiding the complexity in a helper function, since it is not immediately visible what happens behind cerebras.modelzoo.common.run_utils.main.
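Concretely, the ported train.py body collapses to something like this (a sketch; the exact glue around `main` is assumed):

```python
# Sketch of the ported entry point: the entire training loop is delegated
# to the modelzoo helper, so nothing about training is visible here.
from cerebras.modelzoo.common.run_utils import main

if __name__ == "__main__":
    main()  # assumed no-arg invocation; check the modelzoo docs
```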

srinjoym-cerebras (Author):

ok, got it.

```diff
-def main():
-    params = get_params_from_args()
+from cerebras.modelzoo.common.run_utils import main
```
@mohitk-cerebras commented on Sep 3, 2024:

Do we need to add the Cerebras modelzoo dependency to requirements.txt? Please note that this will be a new dependency. cc @gokulr-cerebras
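If it is added, it would be a single entry in requirements.txt, for example pinned to the matching release (the package name and tag below are guesses; the modelzoo source is at github.com/Cerebras/modelzoo, but how it is published needs to be confirmed):

```
# Hypothetical requirements.txt entry; confirm the real package name,
# install source, and tag before committing.
cerebras-modelzoo @ git+https://github.com/Cerebras/modelzoo.git@rel-2.3.0
```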
