Ported the repo to rel-2.3.0 #2
base: main
Conversation
README.md (outdated)

````diff
@@ -55,21 +55,21 @@ To train a small 111M parameter model, run the following command. This will work
 (although CPU will be too slow to get that far).

 ```bash
-python train.py configs/111m.yaml
+python train.py CSX --mode train --params configs/111m.yaml
````
Is the CSX argument needed? We should have a way to specify the device if needed outside of CSX, but CSX should be the default device.
Made CSX the default device and updated the README.md
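For reference, a minimal sketch of what a CSX-by-default CLI could look like; the argument names and choices here are illustrative assumptions, not the actual gigaGPT interface:

```python
import argparse

def parse_args():
    # Hypothetical CLI: the target device is optional and defaults to CSX,
    # but CPU/GPU can still be selected explicitly, per the discussion above.
    parser = argparse.ArgumentParser(description="gigaGPT training entry point")
    parser.add_argument(
        "device",
        nargs="?",
        default="CSX",
        choices=["CSX", "CPU", "GPU"],
        help="target device (defaults to CSX)",
    )
    parser.add_argument("--params", required=True, help="path to the YAML config")
    return parser.parse_args()
```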
```yaml
  - /path/to/data/location
python_paths:
  - /path/to/code/location
trainer:
```
Thanks for this. I understand we were able to run on CPU with these changes, but we will also need to verify that we can run on CSX. @abhis-cerebras, can you please help here? These are changes made by @srinjoym-cerebras to port our gigaGPT model to the new trainer flow. I think he was able to run it on CPU, but we will need to verify these changes on CSX.
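For context, a rough sketch of the rel-2.3.0 trainer YAML shape under discussion; the exact keys should be checked against the modelzoo docs for that release, and every value below is a placeholder:

```yaml
trainer:
  init:
    backend:
      backend_type: CSX   # default device, per the discussion above
    model:
      hidden_size: 768    # placeholder model settings
    optimizer:
      AdamW:
        lr: 6.0e-4
  fit:
    train_dataloader:
      data_dir: /path/to/data/location
      batch_size: 120
```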
```python
causal_attention_mask *= torch.finfo(causal_attention_mask.dtype).min
causal_attention_mask = create_broadcasted_autoregressive_mask(
    batch_size=batch_size,
    num_heads=1,
```
Suggested change:

```diff
-    num_heads=1,
+    num_heads=self.config.heads,
```
Creating the mask explicitly, as in #5, would be preferable.
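For illustration, a minimal sketch of building the additive causal mask explicitly in plain PyTorch, matching the (batch, heads, seq, seq) broadcast shape; the function name and signature are hypothetical:

```python
import torch

def explicit_causal_mask(batch_size, num_heads, seq_len, dtype=torch.float32):
    # Additive mask: 0 on and below the diagonal, dtype-min above it,
    # so masked positions vanish after softmax.
    mask = torch.full((seq_len, seq_len), torch.finfo(dtype).min, dtype=dtype)
    mask = torch.triu(mask, diagonal=1)
    # Broadcast to (batch, num_heads, seq, seq) without copying data.
    return mask.expand(batch_size, num_heads, seq_len, seq_len)
```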
```python
save_checkpoint(step)

logger.info("Training completed successfully!")
from cerebras.modelzoo.common.utils.run.cli_pytorch import get_params_from_args
```
I don't disagree with this change for getting it to work, but I don't think we can continue to claim ~600 lines when we make library calls that hide all the code.
Sorry, could you clarify what you mean by hiding the code?
Sorry for the late reply. The motivation for maintaining gigaGPT is to demonstrate easy model-size scaling on Cerebras hardware using only simple and readable PyTorch code.
If we delegate all the train.py code to a helper function, we can no longer claim that. One could argue that we scale only by hiding the complexity in a helper function, since it is not immediately visible what happens behind cerebras.modelzoo.common.run_utils.main.
ok, got it.
```diff
-def main():
-    params = get_params_from_args()
+from cerebras.modelzoo.common.run_utils import main
```
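With this change, the train.py entry point would reduce to roughly the following sketch; whether run_utils.main takes arguments in rel-2.3.0 is an assumption here and should be verified against the modelzoo source:

```python
# Sketch of the slimmed-down entry point after this change. The exact
# signature of run_utils.main in rel-2.3.0 is an assumption to verify.
from cerebras.modelzoo.common.run_utils import main

if __name__ == "__main__":
    main()
```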
Do we need to add the Cerebras modelzoo dependency to requirements.txt? Please note that this will be a new dependency. cc @gokulr-cerebras
In rel-2.3.0, a new trainer flow and YAML structure were introduced. The corresponding changes were made to the repo.