[Docs] Various polishing. #4002

Merged · 3 commits · Sep 26, 2024

Changes from 2 commits
27 changes: 14 additions & 13 deletions README.md
@@ -26,27 +26,27 @@

----
:fire: *News* :fire:
-- [Sep, 2024] Point, Lanuch and Serve **Llama 3.2** on on Kubernetes or Any Cloud: [**example**](./llm/llama-3_2/)
-- [Sep, 2024] Run and deploy [Pixtral](./llm/pixtral), the first open-source multimodal model from Mistral AI.
-- [Jul, 2024] [Finetune](./llm/llama-3_1-finetuning/) and [serve](./llm/llama-3_1/) **Llama 3.1** on your infra
+- [Sep, 2024] Point, Launch and Serve **Llama 3.2** on Kubernetes or Any Cloud: [**example**](./llm/llama-3_2/)
+- [Sep, 2024] Run and deploy [**Pixtral**](./llm/pixtral), the first open-source multimodal model from Mistral AI.
+- [Jul, 2024] [**Finetune**](./llm/llama-3_1-finetuning/) and [**serve**](./llm/llama-3_1/) **Llama 3.1** on your infra
- [Jun, 2024] Reproduce **GPT** with [llm.c](https://github.com/karpathy/llm.c/discussions/481) on any cloud: [**guide**](./llm/gpt-2/)
-- [Apr, 2024] Serve [**Qwen-110B**](https://qwenlm.github.io/blog/qwen1.5-110b/) on your infra: [**example**](./llm/qwen/)
-- [Apr, 2024] Using [**Ollama**](https://github.com/ollama/ollama) to deploy quantized LLMs on CPUs and GPUs: [**example**](./llm/ollama/)
-- [Feb, 2024] Deploying and scaling [**Gemma**](https://blog.google/technology/developers/gemma-open-models/) with SkyServe: [**example**](./llm/gemma/)
-- [Feb, 2024] Serving [**Code Llama 70B**](https://ai.meta.com/blog/code-llama-large-language-model-coding/) with vLLM and SkyServe: [**example**](./llm/codellama/)
-- [Dec, 2023] [**Mixtral 8x7B**](https://mistral.ai/news/mixtral-of-experts/), a high quality sparse mixture-of-experts model, was released by Mistral AI! Deploy via SkyPilot on any cloud: [**example**](./llm/mixtral/)
-- [Nov, 2023] Using [**Axolotl**](https://github.com/OpenAccess-AI-Collective/axolotl) to finetune Mistral 7B on the cloud (on-demand and spot): [**example**](./llm/axolotl/)
-- [Sep, 2023] Case study: [**Covariant**](https://covariant.ai/) transformed AI development on the cloud using SkyPilot, delivering models 4x faster cost-effectively: [**read the case study**](https://blog.skypilot.co/covariant/)
-- [Aug, 2023] **Finetuning Cookbook**: Finetuning Llama 2 in your own cloud environment, privately: [**example**](./llm/vicuna-llama-2/), [**blog post**](https://blog.skypilot.co/finetuning-llama2-operational-guide/)
+- [Apr, 2024] Serve **Qwen-110B** on your infra: [**example**](./llm/qwen/)
+- [Apr, 2024] Using **Ollama** to deploy quantized LLMs on CPUs and GPUs: [**example**](./llm/ollama/)
+- [Feb, 2024] Deploying and scaling **Gemma** with SkyServe: [**example**](./llm/gemma/)
+- [Feb, 2024] Serving **Code Llama 70B** with vLLM and SkyServe: [**example**](./llm/codellama/)
+- [Dec, 2023] **Mixtral 8x7B**, a high-quality sparse mixture-of-experts model, was released by Mistral AI! Deploy via SkyPilot on any cloud: [**example**](./llm/mixtral/)
+- [Nov, 2023] Using **Axolotl** to finetune Mistral 7B on the cloud (on-demand and spot): [**example**](./llm/axolotl/)

<details>
<summary>Archived</summary>

- [Apr, 2024] Serve and finetune [**Llama 3**](https://skypilot.readthedocs.io/en/latest/gallery/llms/llama-3.html) on any cloud or Kubernetes: [**example**](./llm/llama-3/)
- [Mar, 2024] Serve and deploy [**Databricks DBRX**](https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm) on your infra: [**example**](./llm/dbrx/)
- [Feb, 2024] Speed up your LLM deployments with [**SGLang**](https://github.com/sgl-project/sglang) for 5x throughput on SkyServe: [**example**](./llm/sglang/)
- [Dec, 2023] Using [**LoRAX**](https://github.com/predibase/lorax) to serve 1000s of finetuned LLMs on a single instance in the cloud: [**example**](./llm/lorax/)
- [Sep, 2023] [**Mistral 7B**](https://mistral.ai/news/announcing-mistral-7b/), a high-quality open LLM, was released! Deploy via SkyPilot on any cloud: [**Mistral docs**](https://docs.mistral.ai/self-deployment/skypilot)
+- [Sep, 2023] Case study: [**Covariant**](https://covariant.ai/) transformed AI development on the cloud using SkyPilot, delivering models 4x faster cost-effectively: [**read the case study**](https://blog.skypilot.co/covariant/)
+- [Aug, 2023] **Finetuning Cookbook**: Finetuning Llama 2 in your own cloud environment, privately: [**example**](./llm/vicuna-llama-2/), [**blog post**](https://blog.skypilot.co/finetuning-llama2-operational-guide/)
- [July, 2023] Self-Hosted **Llama-2 Chatbot** on Any Cloud: [**example**](./llm/llama-2/)
- [June, 2023] Serving LLM 24x Faster On the Cloud [**with vLLM**](https://vllm.ai/) and SkyPilot: [**example**](./llm/vllm/), [**blog post**](https://blog.skypilot.co/serving-llm-24x-faster-on-the-cloud-with-vllm-and-skypilot/)
- [April, 2023] [SkyPilot YAMLs](./llm/vicuna/) for finetuning & serving the [Vicuna LLM](https://lmsys.org/blog/2023-03-30-vicuna/) with a single command!
@@ -158,6 +158,7 @@ To learn more, see our [Documentation](https://skypilot.readthedocs.io/en/latest
<!-- Keep this section in sync with index.rst in SkyPilot Docs -->
Runnable examples:
> **Collaborator:** How about adding our blog and community integration page link to the sentence above as well?

> **Member (Author):** Done.

- LLMs on SkyPilot
- [Llama 3.2: lightweight and vision models](./llm/llama-3_2/)
- [Pixtral](./llm/pixtral/)
- [Llama 3.1 finetuning](./llm/llama-3_1-finetuning/) and [serving](./llm/llama-3_1/)
- [GPT-2 via `llm.c`](./llm/gpt-2/)
@@ -203,4 +204,4 @@ We are excited to hear your feedback!
For general discussions, join us on the [SkyPilot Slack](http://slack.skypilot.co).

## Contributing
-We welcome and value all contributions to the project! Please refer to [CONTRIBUTING](CONTRIBUTING.md) for how to get involved.
+We welcome all contributions to the project! See [CONTRIBUTING](CONTRIBUTING.md) for how to get involved.
14 changes: 7 additions & 7 deletions docs/source/_gallery_original/index.rst
@@ -34,17 +34,17 @@ Contents
:maxdepth: 1
:caption: LLM Models

+   Vision Llama 3.2 (Meta) <llms/llama-3_2>
+   Llama 3.1 (Meta) <llms/llama-3_1>
+   Llama 3 (Meta) <llms/llama-3>
+   Llama 2 (Meta) <llms/llama-2>
+   CodeLlama (Meta) <llms/codellama>
    Pixtral (Mistral AI) <llms/pixtral>
    Mixtral (Mistral AI) <llms/mixtral>
    Mistral 7B (Mistral AI) <https://docs.mistral.ai/self-deployment/skypilot/>
-   DBRX (Databricks) <llms/dbrx>
-   Llama-2 (Meta) <llms/llama-2>
-   Llama-3 (Meta) <llms/llama-3>
-   Llama-3.1 (Meta) <llms/llama-3_1>
-   Vision Llama-3.2 (Meta) <llms/llama-3_2>
-   Qwen (Alibaba) <llms/qwen>
-   CodeLlama (Meta) <llms/codellama>
+   Qwen 2.5 (Alibaba) <llms/qwen>
    Gemma (Google) <llms/gemma>
+   DBRX (Databricks) <llms/dbrx>

.. toctree::
:maxdepth: 1
4 changes: 2 additions & 2 deletions docs/source/_static/custom.js
@@ -27,11 +27,11 @@ document.addEventListener('DOMContentLoaded', () => {
  const newItems = [
    { selector: '.caption-text', text: 'SkyServe: Model Serving' },
    { selector: '.toctree-l1 > a', text: 'Managed Jobs' },
-   { selector: '.toctree-l1 > a', text: 'Llama-3.1 (Meta)' },
+   { selector: '.toctree-l1 > a', text: 'Pixtral (Mistral AI)' },
    { selector: '.toctree-l1 > a', text: 'Many Parallel Jobs' },
    { selector: '.toctree-l1 > a', text: 'Reserved, Capacity Blocks, DWS' },
-   { selector: '.toctree-l1 > a', text: 'Llama-3.2 (Meta)' },
+   { selector: '.toctree-l1 > a', text: 'Llama 3.2 (Meta)' },
    { selector: '.toctree-l1 > a', text: 'Admin Policy Enforcement' },
  ];
  // For each entry above, find sidebar links whose text matches and mark them (e.g., with a "New" badge).
  newItems.forEach(({ selector, text }) => {
    document.querySelectorAll(selector).forEach((el) => {
6 changes: 2 additions & 4 deletions docs/source/docs/index.rst
@@ -80,27 +80,25 @@ Runnable examples:

* **LLMs on SkyPilot**

+* `Llama 3.2: lightweight and vision models <https://github.com/skypilot-org/skypilot/tree/master/llm/llama-3_2>`_
+* `Pixtral <https://github.com/skypilot-org/skypilot/tree/master/llm/pixtral>`_
* `Llama 3.1 finetuning <https://github.com/skypilot-org/skypilot/tree/master/llm/llama-3_1-finetuning>`_ and `serving <https://github.com/skypilot-org/skypilot/tree/master/llm/llama-3_1>`_
* `GPT-2 via llm.c <https://github.com/skypilot-org/skypilot/tree/master/llm/gpt-2>`_
* `Llama 3 <https://github.com/skypilot-org/skypilot/tree/master/llm/llama-3>`_
* `Qwen <https://github.com/skypilot-org/skypilot/tree/master/llm/qwen>`_
* `Databricks DBRX <https://github.com/skypilot-org/skypilot/tree/master/llm/dbrx>`_
* `Gemma <https://github.com/skypilot-org/skypilot/tree/master/llm/gemma>`_
* `Mixtral 8x7B <https://github.com/skypilot-org/skypilot/tree/master/llm/mixtral>`_; `Mistral 7B <https://docs.mistral.ai/self-deployment/skypilot>`_ (from official Mistral team)
* `Code Llama <https://github.com/skypilot-org/skypilot/tree/master/llm/codellama/>`_
* `vLLM: Serving LLM 24x Faster On the Cloud <https://github.com/skypilot-org/skypilot/tree/master/llm/vllm>`_ (from official vLLM team)
* `SGLang: Fast and Expressive LLM Serving On the Cloud <https://github.com/skypilot-org/skypilot/tree/master//llm/sglang/>`_ (from official SGLang team)
* `Vicuna chatbots: Training & Serving <https://github.com/skypilot-org/skypilot/tree/master/llm/vicuna>`_ (from official Vicuna team)
* `Train your own Vicuna on Llama-2 <https://github.com/skypilot-org/skypilot/blob/master/llm/vicuna-llama-2>`_
* `Self-Hosted Llama-2 Chatbot <https://github.com/skypilot-org/skypilot/tree/master/llm/llama-2>`_
* `Ollama: Quantized LLMs on CPUs <https://github.com/skypilot-org/skypilot/tree/master/llm/ollama>`_
* `LoRAX <https://github.com/skypilot-org/skypilot/tree/master/llm/lorax/>`_
* `QLoRA <https://github.com/artidoro/qlora/pull/132>`_
* `LLaMA-LoRA-Tuner <https://github.com/zetavg/LLaMA-LoRA-Tuner#run-on-a-cloud-service-via-skypilot>`_
* `Tabby: Self-hosted AI coding assistant <https://github.com/TabbyML/tabby/blob/bed723fcedb44a6b867ce22a7b1f03d2f3531c1e/experimental/eval/skypilot.yaml>`_
* `LocalGPT <https://github.com/skypilot-org/skypilot/tree/master/llm/localgpt>`_
* `Falcon <https://github.com/skypilot-org/skypilot/tree/master/llm/falcon>`_
* Add yours here & see more in `llm/ <https://github.com/skypilot-org/skypilot/tree/master/llm>`_!

* Framework examples: `PyTorch DDP <https://github.com/skypilot-org/skypilot/blob/master/examples/resnet_distributed_torch.yaml>`_, `DeepSpeed <https://github.com/skypilot-org/skypilot/blob/master/examples/deepspeed-multinode/sky.yaml>`_, `JAX/Flax on TPU <https://github.com/skypilot-org/skypilot/blob/master/examples/tpu/tpuvm_mnist.yaml>`_, `Stable Diffusion <https://github.com/skypilot-org/skypilot/tree/master/examples/stable_diffusion>`_, `Detectron2 <https://github.com/skypilot-org/skypilot/blob/master/examples/detectron2_docker.yaml>`_, `Distributed <https://github.com/skypilot-org/skypilot/blob/master/examples/resnet_distributed_tf_app.py>`_ `TensorFlow <https://github.com/skypilot-org/skypilot/blob/master/examples/resnet_app_storage.yaml>`_, `NeMo <https://github.com/skypilot-org/skypilot/blob/master/examples/nemo/nemo_gpt_train.yaml>`_, `programmatic grid search <https://github.com/skypilot-org/skypilot/blob/master/examples/huggingface_glue_imdb_grid_search_app.py>`_, `Docker <https://github.com/skypilot-org/skypilot/blob/master/examples/docker/echo_app.yaml>`_, `Cog <https://github.com/skypilot-org/skypilot/blob/master/examples/cog/>`_, `Unsloth <https://github.com/skypilot-org/skypilot/blob/master/examples/unsloth/unsloth.yaml>`_, `Ollama <https://github.com/skypilot-org/skypilot/blob/master/llm/ollama>`_, `llm.c <https://github.com/skypilot-org/skypilot/tree/master/llm/gpt-2>`__, `Airflow <https://github.com/skypilot-org/skypilot/blob/master/examples/airflow/training_workflow>`_ and `many more <https://github.com/skypilot-org/skypilot/tree/master/examples>`_.
@@ -202,7 +200,7 @@ Read the research:
../cloud-setup/cloud-auth
../cloud-setup/quota
../cloud-setup/policy

.. toctree::
:hidden:
:maxdepth: 1
4 changes: 2 additions & 2 deletions llm/llama-2/README.md
@@ -1,7 +1,7 @@
<!-- $REMOVE -->
-# Self-Hosted Llama-2 Chatbot on Any Cloud
+# Self-Hosted Llama 2 Chatbot on Any Cloud
<!-- $END_REMOVE -->
-<!-- $UNCOMMENT# Llama-2: Open LLM from Meta -->
+<!-- $UNCOMMENT# Llama 2: Open LLM from Meta -->

[Llama-2](https://github.com/facebookresearch/llama/tree/main) is the top open-source model on the [Open LLM leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) today. It has been released with a license that authorizes commercial use. You can deploy a private Llama-2 chatbot with SkyPilot in your own cloud with just one simple command.

4 changes: 2 additions & 2 deletions llm/llama-3/README.md
@@ -1,7 +1,7 @@
<!-- $REMOVE -->
-# Scale Serving Llama-3 on Any Cloud or Kubernetes with SkyPilot
+# Scale Serving Llama 3 on Any Cloud or Kubernetes with SkyPilot
<!-- $END_REMOVE -->
-<!-- $UNCOMMENT# Llama-3: Open LLM from Meta -->
+<!-- $UNCOMMENT# Llama 3: Open LLM from Meta -->


<p align="center">
58 changes: 29 additions & 29 deletions llm/llama-3_2/README.md
@@ -2,7 +2,7 @@
<!-- $REMOVE -->
# Point, Launch, and Serve Vision Llama 3.2 on Kubernetes or Any Cloud
<!-- $END_REMOVE -->
-<!-- $UNCOMMENT# Vision Llama-3.2 (Meta) -->
+<!-- $UNCOMMENT# Vision Llama 3.2 (Meta) -->


The [Llama 3.2](https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices/) family was released by Meta on Sep 25, 2024. It includes not only the latest improved (and smaller) LLMs for chat, but also multimodal vision-language models. Let's _point and launch_ it with SkyPilot.
@@ -90,22 +90,22 @@ $ HF_TOKEN=xxx sky launch llama3_2.yaml -c llama3_2 --env HF_TOKEN
```console
...
------------------------------------------------------------------------------------------------------------------
 CLOUD        INSTANCE                       vCPUs   Mem(GB)   ACCELERATORS   REGION/ZONE     COST ($)   CHOSEN
------------------------------------------------------------------------------------------------------------------
 Kubernetes   4CPU--16GB--1L4                4       16        L4:1           kubernetes      0.00          ✔
 RunPod       1x_L4_SECURE                   4       24        L4:1           CA              0.44
 GCP          g2-standard-4                  4       16        L4:1           us-east4-a      0.70
 AWS          g6.xlarge                      4       16        L4:1           us-east-1       0.80
 AWS          g5.xlarge                      4       16        A10G:1         us-east-1       1.01
 RunPod       1x_L40_SECURE                  16      48        L40:1          CA              1.14
 Fluidstack   L40_48GB::1                    32      60        L40:1          CANADA          1.15
 AWS          g6e.xlarge                     4       32        L40S:1         us-east-1       1.86
 Cudo         sapphire-rapids-h100_1x4v8gb   4       8         H100:1         ca-montreal-3   2.86
 Fluidstack   H100_PCIE_80GB::1              28      180       H100:1         CANADA          2.89
 Azure        Standard_NV36ads_A10_v5        36      440       A10:1          eastus          3.20
 GCP          a2-highgpu-1g                  12      85        A100:1         us-central1-a   3.67
 RunPod       1x_H100_SECURE                 16      80        H100:1         CA              4.49
 Azure        Standard_NC40ads_H100_v5       40      320       H100:1         eastus          6.98
------------------------------------------------------------------------------------------------------------------
```
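For context, the task being launched above is a plain SkyPilot YAML file. The sketch below is a hypothetical minimal version for illustration only; the model name, port, and vLLM-based `run` command are assumptions, not the actual contents of `llama3_2.yaml` (see [./llm/llama-3_2/](./llm/llama-3_2/) for the real task).

```yaml
# Hypothetical minimal sketch; the real task lives in ./llm/llama-3_2/.
envs:
  HF_TOKEN:  # Required; supplied at launch time via `--env HF_TOKEN`.

resources:
  accelerators: {L4:1, A10G:1, L40:1, A100:1, H100:1}  # Any one of these; the optimizer picks the cheapest offering.
  ports: 8081

setup: |
  pip install vllm

run: |
  vllm serve meta-llama/Llama-3.2-3B-Instruct --port 8081
```

Given a spec like this, the table above is exactly what the optimizer produces: every cloud and instance type satisfying the `accelerators` request, sorted by hourly cost, with the cheapest feasible option chosen.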

@@ -185,20 +185,20 @@ $ HF_TOKEN=xxx sky launch llama3_2-vision-11b.yaml -c llama3_2-vision --env HF_T

```console
------------------------------------------------------------------------------------------------------------------
 CLOUD        INSTANCE                       vCPUs   Mem(GB)   ACCELERATORS   REGION/ZONE     COST ($)   CHOSEN
------------------------------------------------------------------------------------------------------------------
 Kubernetes   2CPU--8GB--1H100               2       8         H100:1         kubernetes      0.00          ✔
 RunPod       1x_L40_SECURE                  16      48        L40:1          CA              1.14
 Fluidstack   L40_48GB::1                    32      60        L40:1          CANADA          1.15
 AWS          g6e.xlarge                     4       32        L40S:1         us-east-1       1.86
 RunPod       1x_A100-80GB_SECURE            8       80        A100-80GB:1    CA              1.99
 Cudo         sapphire-rapids-h100_1x2v4gb   2       4         H100:1         ca-montreal-3   2.83
 Fluidstack   H100_PCIE_80GB::1              28      180       H100:1         CANADA          2.89
 GCP          a2-highgpu-1g                  12      85        A100:1         us-central1-a   3.67
 Azure        Standard_NC24ads_A100_v4       24      220       A100-80GB:1    eastus          3.67
 RunPod       1x_H100_SECURE                 16      80        H100:1         CA              4.49
 GCP          a2-ultragpu-1g                 12      170       A100-80GB:1    us-central1-a   5.03
 Azure        Standard_NC40ads_H100_v5       40      320       H100:1         eastus          6.98
------------------------------------------------------------------------------------------------------------------
```
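The vision launch follows the same pattern. Again as a hypothetical sketch (the accelerator set and vLLM flags below are assumptions consistent with the candidates listed above, not the exact contents of `llama3_2-vision-11b.yaml`):

```yaml
# Hypothetical sketch; the real task lives in ./llm/llama-3_2/.
envs:
  HF_TOKEN:  # Required; supplied at launch time via `--env HF_TOKEN`.

resources:
  accelerators: {L40:1, A100:1, A100-80GB:1, H100:1}  # The 11B vision model needs more GPU memory than the 3B chat model.
  ports: 8081

setup: |
  pip install vllm

run: |
  vllm serve meta-llama/Llama-3.2-11B-Vision-Instruct --port 8081 --max-model-len 4096
```

Note how the candidate list shifts accordingly: the single L4 and A10G options from the text-model launch drop out, and higher-memory GPUs (L40, A100-80GB, H100) take their place.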
