
Update templates after v0.5.8 llmforge release #391

Merged: 11 commits into main, Nov 20, 2024

Conversation

@SumanthRH (Member) commented Nov 8, 2024:

What does this PR do?

Updates the workspace templates after the v0.5.8 release of llmforge. The product release has already happened, so this PR can be safely merged.

Some important changes in this version:

  • checkpoint_every_n_epochs is deprecated in favour of checkpoint_and_evaluation_frequency.
  • max_num_checkpoints is deprecated.
  • awsv2 -> aws in RemoteStoragePath, following the awsv2 CLI deprecation. Since RemoteStoragePath is a bleeding-edge feature, this is a hard deprecation.
  • TorchCompileConfig has been added. (A hedged migration sketch for these changes follows this list.)
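
As a hedged migration sketch (the shape of checkpoint_and_evaluation_frequency, the remote_storage_path and torch_compile_config key names, and all values here are illustrative assumptions, not confirmed against the actual llmforge schema):

# v0.5.7-era config (now deprecated):
# checkpoint_every_n_epochs: 1
# max_num_checkpoints: 3                         # deprecated, no direct replacement shown
# remote_storage_path: awsv2://my-bucket/ckpts   # awsv2 scheme is hard-deprecated
#
# v0.5.8-era config (key names/shape assumed):
checkpoint_and_evaluation_frequency: 1           # replaces checkpoint_every_n_epochs
remote_storage_path: aws://my-bucket/ckpts       # aws scheme replaces awsv2
torch_compile_config: {}                         # TorchCompileConfig is new; fields not shown here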

Also, Liger support was added in the previous release (0.5.7), but it was not reflected in our configs until now. This PR adds Liger to our configs as well.

FWIW, torch.compile + Liger has some subtleties around compatibility, and the best configuration for perf is turning on all the Liger flags, so this PR adds only Liger (not torch.compile) to the configs.

We've added Liger only to the LoRA configs, since it's been hard to test with A100s for full-param training. A rough sketch of what this looks like is below.
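
As a rough sketch, enabling Liger in a LoRA config might look like the following (the liger_kernel key and its nesting are assumptions about the llmforge schema; the individual flags mirror the diff snippet discussed in the review below):

liger_kernel:
  enabled: True
  kwargs:
    rope: True
    swiglu: True
    cross_entropy: True
    fused_linear_cross_entropy: False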

SumanthRH and others added 6 commits November 8, 2024 13:38
@SumanthRH SumanthRH marked this pull request as ready for review November 13, 2024 23:47
@SumanthRH (Member, Author):

Screenshots after testing Liger (from @erictang000 ) :

[Liger test screenshots]

@SumanthRH (Member, Author):

@kouroshHakha just wanna highlight that this is a direct edit to the existing config.

I think having a separate config with Liger enabled is also doable, but given that we've tested Liger extensively for correctness, I'm fine with having it in the defaults to squeeze out more performance. A lot of optionality is also confusing to the user.

@kouroshHakha (Contributor) left a comment:

Let's make the Liger versions a separate config, mostly because there is initialization time involved and it's not that much faster either. So overall it's slower, for 70B at least.

[70B benchmark screenshot]

I also couldn't find where the compile params are. Where are they?

rope: True
swiglu: True
cross_entropy: True
fused_linear_cross_entropy: False
Contributor:

Add a comment on why flc (fused_linear_cross_entropy) is false, or why rms_norm is false.

@@ -1400,6 +1402,25 @@
"This plot illustrates that as we relax the cost constraints (i.e., increase the percentage of GPT-4 calls), the performance improves. While the performance of a random router improves linearly with cost, our router achieves significantly better results at each cost level."
]
},
{
Contributor:

what is this?

@SumanthRH (Member, Author):

I updated the router template to use the new 0.5.8 image. I noticed that the cell execution numbers were all messed up in the notebook, so I copied over some cleanup code from the E2E LLM Workflows template to clean up cell numbers and cached checkpoints.

Comment on lines 52 to 53
fused_linear_cross_entropy: False

@SumanthRH (Member, Author):

@erictang000 umm did this value change? why was this false again?

Contributor:

Discussed on Slack: for lower context length + batch size, regular cross-entropy can be faster and memory is similar. An example with this toggled for llama-3.2-1b is in the screenshots below. But it's easier to just use the defaults with Liger, so we can turn fused linear cross entropy on in our default configs. (A hedged sketch of the resulting defaults follows the screenshots.)
[Benchmark screenshots: llama-3.2-1b with fused_linear_cross_entropy toggled]
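
Putting that conclusion into config form, a hedged sketch of the commented defaults (same assumed nesting as the earlier sketch; the comments paraphrase this thread, and the two cross-entropy flags being mutually exclusive is an assumption):

liger_kernel:
  enabled: True
  kwargs:
    rope: True
    swiglu: True
    cross_entropy: False               # assumed off when the fused variant is on
    fused_linear_cross_entropy: True   # on by default: simpler to follow Liger's defaults;
                                       # at small context + batch, plain cross_entropy can be
                                       # faster with similar memory (per the benchmarks above)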

@erictang000 (Contributor) left a comment:

lgtm

@SumanthRH SumanthRH merged commit b631ed8 into main Nov 20, 2024
2 checks passed