Update templates after v0.5.8 llmforge release #391

Merged 11 commits on Nov 20, 2024
4 changes: 2 additions & 2 deletions templates/e2e-dspy-workflow/README.ipynb
@@ -863,7 +863,7 @@
" <span style=\"color: #008000; text-decoration-color: #008000\">'name'</span>: <span style=\"color: #008000; text-decoration-color: #008000\">'dspy-llmforge-fine-tuning-job'</span>,\n",
" <span style=\"color: #008000; text-decoration-color: #008000\">'entrypoint'</span>: <span style=\"color: #008000; text-decoration-color: #008000\">'llmforge anyscale finetune configs/training/lora/llama-3-8b.yaml'</span>,\n",
" <span style=\"color: #008000; text-decoration-color: #008000\">'working_dir'</span>: <span style=\"color: #008000; text-decoration-color: #008000\">'.'</span>,\n",
" <span style=\"color: #008000; text-decoration-color: #008000\">'image_uri'</span>: <span style=\"color: #008000; text-decoration-color: #008000\">'localhost:5555/anyscale/llm-forge:0.5.7'</span>\n",
" <span style=\"color: #008000; text-decoration-color: #008000\">'image_uri'</span>: <span style=\"color: #008000; text-decoration-color: #008000\">'localhost:5555/anyscale/llm-forge:0.5.8'</span>\n",
"<span style=\"font-weight: bold\">}</span>\n",
"</pre>\n"
],
@@ -872,7 +872,7 @@
" \u001b[32m'name'\u001b[0m: \u001b[32m'dspy-llmforge-fine-tuning-job'\u001b[0m,\n",
" \u001b[32m'entrypoint'\u001b[0m: \u001b[32m'llmforge anyscale finetune configs/training/lora/llama-3-8b.yaml'\u001b[0m,\n",
" \u001b[32m'working_dir'\u001b[0m: \u001b[32m'.'\u001b[0m,\n",
" \u001b[32m'image_uri'\u001b[0m: \u001b[32m'localhost:5555/anyscale/llm-forge:0.5.7'\u001b[0m\n",
" \u001b[32m'image_uri'\u001b[0m: \u001b[32m'localhost:5555/anyscale/llm-forge:0.5.8'\u001b[0m\n",
"\u001b[1m}\u001b[0m\n"
]
},
2 changes: 1 addition & 1 deletion templates/e2e-dspy-workflow/README.md
@@ -519,7 +519,7 @@ rich.print(yaml.safe_load(open(job_config_path)))
<span style="color: #008000; text-decoration-color: #008000">'name'</span>: <span style="color: #008000; text-decoration-color: #008000">'dspy-llmforge-fine-tuning-job'</span>,
<span style="color: #008000; text-decoration-color: #008000">'entrypoint'</span>: <span style="color: #008000; text-decoration-color: #008000">'llmforge anyscale finetune configs/training/lora/llama-3-8b.yaml'</span>,
<span style="color: #008000; text-decoration-color: #008000">'working_dir'</span>: <span style="color: #008000; text-decoration-color: #008000">'.'</span>,
<span style="color: #008000; text-decoration-color: #008000">'image_uri'</span>: <span style="color: #008000; text-decoration-color: #008000">'localhost:5555/anyscale/llm-forge:0.5.7'</span>
<span style="color: #008000; text-decoration-color: #008000">'image_uri'</span>: <span style="color: #008000; text-decoration-color: #008000">'localhost:5555/anyscale/llm-forge:0.5.8'</span>
<span style="font-weight: bold">}</span>
</pre>

2 changes: 1 addition & 1 deletion templates/e2e-dspy-workflow/configs/job.yaml
@@ -1,4 +1,4 @@
name: "dspy-llmforge-fine-tuning-job"
entrypoint: "llmforge anyscale finetune configs/training/lora/llama-3-8b.yaml"
working_dir: "."
image_uri: "localhost:5555/anyscale/llm-forge:0.5.7"
image_uri: "localhost:5555/anyscale/llm-forge:0.5.8"
2 changes: 1 addition & 1 deletion templates/e2e-llm-workflows/deploy/jobs/ft.yaml
@@ -1,6 +1,6 @@
name: e2e-llm-workflows
entrypoint: llmforge anyscale finetune configs/training/lora/llama-3-8b.yaml
image_uri: localhost:5555/anyscale/llm-forge:0.5.7
image_uri: localhost:5555/anyscale/llm-forge:0.5.8
requirements: []
max_retries: 1
excludes: ["assets"]
@@ -32,7 +32,7 @@ num_checkpoints_to_keep: 1

# Deepspeed configuration; you can provide your own deepspeed setup
deepspeed:
  config_path: deepspeed_configs/zero_3_hpz.json
  config_path: deepspeed_configs/zero_3_offload_optim+param.json

# Accelerator type; the value of 0.001 is not important, as long as it is
# between 0 and 1. This ensures that the accelerator type is used per trainer
Member Author:
@kouroshHakha just want to highlight that this is a direct edit to the existing config.

I think having a separate config with Liger enabled is also doable, but given that we've tested Liger extensively for correctness, I'm fine with having this in the defaults to squeeze out more performance; a lot of optionality is also confusing to the user.
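As a sanity check on what enabling this by default means at runtime, here is a minimal sketch of how a `liger_kernel` block like the one added in the hunks below could be applied before the model is instantiated. `apply_liger_kernel_to_llama` is the patching helper exposed by the liger-kernel package; the config-loading glue around it is an assumption for illustration, not llmforge internals.

```python
# Hedged sketch: consuming a `liger_kernel` config block at model-setup time.
# The dict mirrors the YAML added in this PR; the wiring is illustrative only.
from liger_kernel.transformers import apply_liger_kernel_to_llama

liger_cfg = {
    "enabled": True,
    "kwargs": {
        "rms_norm": True,
        "rope": True,
        "swiglu": True,
        "cross_entropy": True,
        "fused_linear_cross_entropy": False,
    },
}

if liger_cfg["enabled"]:
    # Monkey-patches the Hugging Face Llama modules with Liger's Triton kernels,
    # so the fine-tuning loop picks them up without any further code changes.
    apply_liger_kernel_to_llama(**liger_cfg["kwargs"])
```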

@@ -40,6 +40,16 @@ deepspeed:
worker_resources:
  accelerator_type:A100-80G: 0.001

# Liger kernel configuration
liger_kernel:
  enabled: True
  kwargs:
    rms_norm: True
    rope: True
    swiglu: True
    cross_entropy: True
    fused_linear_cross_entropy: False

# Lora configuration
lora_config:
  r: 8
@@ -50,6 +50,16 @@ logger:
worker_resources:
  anyscale/accelerator_shape:4xA10G: 0.001

# Liger kernel configuration
liger_kernel:
  enabled: True
  kwargs:
    rms_norm: False
    rope: True
    swiglu: True
    cross_entropy: True
    fused_linear_cross_entropy: False

# Lora configuration
lora_config:
  r: 8
@@ -45,6 +45,16 @@ logger:
worker_resources:
  anyscale/accelerator_shape:4xA10G: 0.001

# Liger kernel configuration
liger_kernel:
  enabled: True
  kwargs:
    rms_norm: False
    rope: True
    swiglu: True
    cross_entropy: True
    fused_linear_cross_entropy: False
Contributor:
Add a comment explaining why fused_linear_cross_entropy is false, or why rms_norm is false.


# Lora configuration
lora_config:
  r: 8
@@ -41,6 +41,16 @@ deepspeed:
worker_resources:
  anyscale/accelerator_shape:4xA10G: 0.001

# Liger kernel configuration
liger_kernel:
  enabled: True
  kwargs:
    rms_norm: True
    rope: True
    swiglu: True
    cross_entropy: True
    fused_linear_cross_entropy: False

Member Author:
@erictang000 umm did this value change? why was this false again?

Contributor:
Discussed on Slack: for lower context lengths and batch sizes, regular cross entropy can be faster and memory usage is similar; see the attached example with this toggled for llama-3.2-1b. But it's easier to just use the defaults with Liger, so we can turn fused linear cross entropy on in our default configs.
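For anyone following the thread, here is a rough illustrative sketch in plain PyTorch (not Liger or llmforge internals; the sizes are made up) of why fused linear cross entropy pays off mainly at larger context lengths and vocab sizes: the regular path materializes a full `(tokens, vocab)` logits tensor before computing the loss, while a fused kernel produces the same scalar loss in chunks without ever storing that tensor.

```python
# Illustrative only: memory behavior of the regular cross-entropy path.
import torch
import torch.nn.functional as F

tokens, hidden_dim, vocab = 2048, 512, 32_000   # hypothetical sizes
hidden = torch.randn(tokens, hidden_dim)        # final hidden states
lm_head = torch.randn(vocab, hidden_dim)        # output projection weights
labels = torch.randint(0, vocab, (tokens,))

# Regular path: a (tokens, vocab) logits tensor lives in memory (~260 MB in
# fp32 here) and grows linearly with batch size * context length.
logits = hidden @ lm_head.T
loss = F.cross_entropy(logits, labels)
print(loss)

# A fused linear + cross-entropy kernel computes the same loss chunk by chunk,
# never materializing the full logits tensor; that saving is large at long
# context lengths and mostly irrelevant at short ones, where the regular path
# can even be faster.
```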

# Lora configuration
lora_config:
  r: 8
59 changes: 40 additions & 19 deletions templates/llm-router/README.ipynb
@@ -111,7 +111,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": null,
"metadata": {},
"outputs": [
{
@@ -306,7 +306,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@@ -330,7 +330,7 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": null,
"metadata": {},
"outputs": [
{
@@ -461,7 +461,7 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": null,
"metadata": {},
"outputs": [
{
@@ -578,7 +578,7 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": null,
"metadata": {},
"outputs": [
{
@@ -644,7 +644,7 @@
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": null,
"metadata": {},
"outputs": [
{
@@ -761,7 +761,7 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": null,
"metadata": {},
"outputs": [
{
@@ -904,7 +904,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": null,
"metadata": {},
"outputs": [
{
@@ -948,7 +948,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": null,
"metadata": {},
"outputs": [
{
@@ -988,7 +988,7 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": null,
"metadata": {},
"outputs": [
{
@@ -1025,7 +1025,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": null,
"metadata": {},
"outputs": [
{
@@ -1055,7 +1055,7 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@@ -1079,7 +1079,7 @@
},
{
"cell_type": "code",
"execution_count": 20,
"execution_count": null,
"metadata": {},
"outputs": [
{
@@ -1092,7 +1092,9 @@
"context_length: 1024\n",
"num_devices: 8\n",
"num_epochs: 5\n",
"checkpoint_every_n_epochs: 5\n",
"checkpoint_and_evaluation_frequency: \n",
" unit: epochs\n",
" frequency: 5\n",
"train_batch_size_per_device: 4\n",
"eval_batch_size_per_device: 4\n",
"lr_scheduler_type: constant\n",
@@ -1120,7 +1122,7 @@
},
{
"cell_type": "code",
"execution_count": 21,
"execution_count": null,
"metadata": {},
"outputs": [
{
@@ -1206,7 +1208,7 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": null,
"metadata": {},
"outputs": [
{
@@ -1273,7 +1275,7 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": null,
"metadata": {},
"outputs": [
{
@@ -1371,7 +1373,7 @@
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": null,
"metadata": {},
"outputs": [
{
@@ -1400,6 +1402,25 @@
"This plot illustrates that as we relax the cost constraints (i.e., increase the percentage of GPT-4 calls), the performance improves. While the performance of a random router improves linearly with cost, our router achieves significantly better results at each cost level."
]
},
{
Contributor:
what is this?

Member Author:
I updated the router template to use the new 0.5.8 image. I noticed that the cell execution numbers were all messed up in the notebook, so I copied over some cleanup code from the E2E LLM Workflows template to clean up cell numbers and cached checkpoints.

"cell_type": "markdown",
"metadata": {},
"source": [
"## Cleanup"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Cleanup\n",
"!python src/clear_cell_nums.py\n",
"!find . | grep -E \".ipynb_checkpoints\" | xargs rm -rf\n",
"!find . | grep -E \"(__pycache__|\\.pyc|\\.pyo)\" | xargs rm -rf"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -1425,7 +1446,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.8"
"version": "3.11.9"
}
},
"nbformat": 4,
14 changes: 13 additions & 1 deletion templates/llm-router/README.md
@@ -715,7 +715,7 @@ For this tutorial, we will perform full-parameter finetuning of Llama3-8B on the
context_length: 1024
num_devices: 8
num_epochs: 5
checkpoint_every_n_epochs: 5
checkpoint_and_evaluation_frequency:
  unit: epochs
  frequency: 5
train_batch_size_per_device: 4
eval_batch_size_per_device: 4
lr_scheduler_type: constant
@@ -912,5 +914,15 @@ display(Image(filename=image_path))

This plot illustrates that as we relax the cost constraints (i.e., increase the percentage of GPT-4 calls), the performance improves. While the performance of a random router improves linearly with cost, our router achieves significantly better results at each cost level.

## Cleanup


```python
# Cleanup
!python src/clear_cell_nums.py
!find . | grep -E ".ipynb_checkpoints" | xargs rm -rf
!find . | grep -E "(__pycache__|\.pyc|\.pyo)" | xargs rm -rf
```

# Conclusion
In this tutorial, we have successfully built and evaluated a finetuned-LLM router. We generated synthetic labeled data using the LLM-as-a-judge method to train the model, finetuned an LLM classifier using Anyscale's API, and conducted an offline evaluation on a standard benchmark, demonstrating that our model generalizes effectively out of domain.
4 changes: 3 additions & 1 deletion templates/llm-router/configs/ft_config_a10.yaml
@@ -4,7 +4,9 @@ valid_path: /mnt/user_storage/train_data_sample.jsonl
context_length: 1024
num_devices: 8
num_epochs: 5
checkpoint_every_n_epochs: 5
checkpoint_and_evaluation_frequency:
  unit: epochs
  frequency: 5
train_batch_size_per_device: 4
eval_batch_size_per_device: 4
lr_scheduler_type: constant
23 changes: 23 additions & 0 deletions templates/llm-router/src/clear_cell_nums.py
@@ -0,0 +1,23 @@
from pathlib import Path

import nbformat


def clear_execution_numbers(nb_path):
    with open(nb_path, "r", encoding="utf-8") as f:
        nb = nbformat.read(f, as_version=4)
    for cell in nb["cells"]:
        if cell["cell_type"] == "code":
            cell["execution_count"] = None
            for output in cell["outputs"]:
                if "execution_count" in output:
                    output["execution_count"] = None
    with open(nb_path, "w", encoding="utf-8") as f:
        nbformat.write(nb, f)


if __name__ == "__main__":
    ROOT_DIR = Path(__file__).parent.parent
    notebook_fps = list(ROOT_DIR.glob("**/*.ipynb"))
    for fp in notebook_fps:
        clear_execution_numbers(fp)