Update templates after v0.5.8 llmforge release
#391
Signed-off-by: SumanthRH <sumanthrh@anyscale.com>
Screenshots after testing Liger (from @erictang000):
@kouroshHakha just want to highlight that this is a direct edit to the existing config.
I think having a separate config with Liger enabled is also doable, but given that we've tested Liger extensively for correctness, I'm fine with having it in the defaults to squeeze out more performance. A lot of optionality is also confusing to the user.
rope: True
swiglu: True
cross_entropy: True
fused_linear_cross_entropy: False
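For reference, a minimal sketch of checking the Liger flags above before a job is launched. The `LigerFlags` class and its `validate` method are hypothetical and not part of llmforge; the four field names are simply the keys from the diff. The one rule it enforces mirrors Liger-Kernel's own model-patching helpers (for example `apply_liger_kernel_to_llama`), which refuse to enable `cross_entropy` and `fused_linear_cross_entropy` at the same time. That makes `fused_linear_cross_entropy: False` consistent with `cross_entropy: True`, though the thread itself does not state the authors' reason for the setting.

```python
from dataclasses import dataclass


@dataclass
class LigerFlags:
    """Hypothetical container for the Liger flags shown in the config diff."""

    rope: bool = True
    swiglu: bool = True
    cross_entropy: bool = True
    fused_linear_cross_entropy: bool = False

    def validate(self) -> None:
        # Liger-Kernel does not allow both cross-entropy variants at once,
        # so reject that combination up front instead of failing at patch time.
        if self.cross_entropy and self.fused_linear_cross_entropy:
            raise ValueError(
                "cross_entropy and fused_linear_cross_entropy cannot both be True"
            )


# The defaults match the values in the diff and pass validation.
LigerFlags().validate()

# Enabling the fused variant on top of plain cross_entropy is rejected.
try:
    LigerFlags(fused_linear_cross_entropy=True).validate()
except ValueError as err:
    print(f"rejected: {err}")
```

Switching to the fused variant would mean setting `cross_entropy=False` and `fused_linear_cross_entropy=True` together.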
Add a comment explaining why flc (fused_linear_cross_entropy) is False, or why rms norm is False.
@@ -1400,6 +1402,25 @@
"This plot illustrates that as we relax the cost constraints (i.e., increase the percentage of GPT-4 calls), the performance improves. While the performance of a random router improves linearly with cost, our router achieves significantly better results at each cost level."
]
},
{
what is this?
I updated the router template to use the new 0.5.8 image. I noticed that the cell execution numbers are all messed up in the notebook, so I copied over some cleanup code from the E2E LLM Workflows template to clean up cell numbers and cached checkpoints.
fused_linear_cross_entropy: False
@erictang000 umm, did this value change? Why was this False again?
lgtm
…m/anyscale/templates into sumanthrh/update-templates-v0.5.8
What does this PR do?
Updates workspace templates after the v0.5.8 release of llmforge. The product release has already happened, so this PR can be safely merged. Some important changes in this version:
- checkpoint_every_n_epochs is deprecated in favour of checkpoint_and_evaluation_frequency.
- max_num_checkpoints is deprecated.
- awsv2 -> aws in RemoteStoragePath. This is because of the awsv2 CLI deprecation. Since RemoteStoragePath is a bleeding-edge feature, this is a hard deprecation.
- TorchCompileConfig is now available.
Also, we added Liger support in the previous release (0.5.7), but it was not added to the templates until now. This PR also adds Liger to our configs.
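A rough sketch of how an older config dict could be migrated across the deprecations listed above. Everything in it is illustrative: `migrate_config` is a hypothetical helper, not an llmforge API; the nested `{"unit": "epochs", "frequency": n}` shape for checkpoint_and_evaluation_frequency is an assumption, not taken from this PR; and because the exact RemoteStoragePath fields are not shown here, the awsv2 rename is applied to any string value that is exactly "awsv2".

```python
import copy
from typing import Any


def _rename_awsv2(node: Any) -> Any:
    """Recursively replace the string "awsv2" with "aws" (field layout assumed)."""
    if isinstance(node, dict):
        return {key: _rename_awsv2(value) for key, value in node.items()}
    if isinstance(node, list):
        return [_rename_awsv2(item) for item in node]
    return "aws" if node == "awsv2" else node


def migrate_config(old: dict) -> dict:
    """Hypothetical helper: rewrite a pre-0.5.8 config dict for the deprecations."""
    # Work on a copy so the caller's dict is left untouched.
    config = copy.deepcopy(old)

    # checkpoint_every_n_epochs -> checkpoint_and_evaluation_frequency
    # (the unit/frequency shape below is an assumption).
    if "checkpoint_every_n_epochs" in config:
        n = config.pop("checkpoint_every_n_epochs")
        config["checkpoint_and_evaluation_frequency"] = {"unit": "epochs", "frequency": n}

    # max_num_checkpoints is deprecated and the PR names no replacement, so drop it.
    config.pop("max_num_checkpoints", None)

    # awsv2 -> aws is a hard deprecation, so rewrite it everywhere.
    return _rename_awsv2(config)


# Hypothetical old-style config; "remote_storage" and "cli_name" are made-up keys.
old = {
    "checkpoint_every_n_epochs": 2,
    "max_num_checkpoints": 3,
    "remote_storage": {"cli_name": "awsv2", "path": "s3://bucket/ckpts"},
}
new = migrate_config(old)
```

Since the awsv2 change is a hard deprecation, a rewrite like this (or a manual edit) would be needed before running old configs on 0.5.8.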
FWIW, torch compile + Liger has some subtleties around compatibility; the best configuration for performance is turning on all Liger flags, so this PR only adds Liger (not torch compile) to the configs.
We've added Liger only to the LoRA configs, since it has been hard to test with A100s for full-parameter fine-tuning.
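The rollout rule above (Liger only in the LoRA configs) can be pictured as a small config transform. This is a hypothetical sketch: `add_liger_to_lora_configs` is not an llmforge function, and both the top-level `lora_config` key used to detect a LoRA config and the `liger_kernel` block layout (`enabled` plus `kwargs`) are assumptions. Only the flag values come from the diff in this PR.

```python
import copy
from typing import Dict

# Flag values taken from the config diff in this PR.
LIGER_KWARGS = {
    "rope": True,
    "swiglu": True,
    "cross_entropy": True,
    "fused_linear_cross_entropy": False,
}


def add_liger_to_lora_configs(configs: Dict[str, dict]) -> Dict[str, dict]:
    """Hypothetical: enable Liger only for configs that define a LoRA section."""
    updated = {}
    for name, config in configs.items():
        # Copy each config so the inputs are not mutated.
        config = copy.deepcopy(config)
        # Assumption: LoRA configs are identified by a top-level "lora_config" key.
        if "lora_config" in config:
            config["liger_kernel"] = {"enabled": True, "kwargs": dict(LIGER_KWARGS)}
        # Full-parameter configs pass through unchanged.
        updated[name] = config
    return updated


configs = {"lora.yaml": {"lora_config": {}}, "full_param.yaml": {}}
updated = add_liger_to_lora_configs(configs)
```

Keeping full-parameter configs untouched matches the PR's reasoning that those setups have not been tested with Liger on A100s.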