[Docs] Various polishing. #4002

Merged · 3 commits · Sep 26, 2024

Changes from 2 commits
27 changes: 14 additions & 13 deletions README.md
@@ -26,27 +26,27 @@

----
:fire: *News* :fire:
-- [Sep, 2024] Point, Lanuch and Serve **Llama 3.2** on on Kubernetes or Any Cloud: [**example**](./llm/llama-3_2/)
-- [Sep, 2024] Run and deploy [Pixtral](./llm/pixtral), the first open-source multimodal model from Mistral AI.
-- [Jul, 2024] [Finetune](./llm/llama-3_1-finetuning/) and [serve](./llm/llama-3_1/) **Llama 3.1** on your infra
+- [Sep, 2024] Point, Launch and Serve **Llama 3.2** on Kubernetes or Any Cloud: [**example**](./llm/llama-3_2/)
+- [Sep, 2024] Run and deploy [**Pixtral**](./llm/pixtral), the first open-source multimodal model from Mistral AI.
+- [Jul, 2024] [**Finetune**](./llm/llama-3_1-finetuning/) and [**serve**](./llm/llama-3_1/) **Llama 3.1** on your infra
- [Jun, 2024] Reproduce **GPT** with [llm.c](https://github.com/karpathy/llm.c/discussions/481) on any cloud: [**guide**](./llm/gpt-2/)
-- [Apr, 2024] Serve [**Qwen-110B**](https://qwenlm.github.io/blog/qwen1.5-110b/) on your infra: [**example**](./llm/qwen/)
-- [Apr, 2024] Using [**Ollama**](https://github.com/ollama/ollama) to deploy quantized LLMs on CPUs and GPUs: [**example**](./llm/ollama/)
-- [Feb, 2024] Deploying and scaling [**Gemma**](https://blog.google/technology/developers/gemma-open-models/) with SkyServe: [**example**](./llm/gemma/)
-- [Feb, 2024] Serving [**Code Llama 70B**](https://ai.meta.com/blog/code-llama-large-language-model-coding/) with vLLM and SkyServe: [**example**](./llm/codellama/)
-- [Dec, 2023] [**Mixtral 8x7B**](https://mistral.ai/news/mixtral-of-experts/), a high quality sparse mixture-of-experts model, was released by Mistral AI! Deploy via SkyPilot on any cloud: [**example**](./llm/mixtral/)
-- [Nov, 2023] Using [**Axolotl**](https://github.com/OpenAccess-AI-Collective/axolotl) to finetune Mistral 7B on the cloud (on-demand and spot): [**example**](./llm/axolotl/)
-- [Sep, 2023] Case study: [**Covariant**](https://covariant.ai/) transformed AI development on the cloud using SkyPilot, delivering models 4x faster cost-effectively: [**read the case study**](https://blog.skypilot.co/covariant/)
-- [Aug, 2023] **Finetuning Cookbook**: Finetuning Llama 2 in your own cloud environment, privately: [**example**](./llm/vicuna-llama-2/), [**blog post**](https://blog.skypilot.co/finetuning-llama2-operational-guide/)
+- [Apr, 2024] Serve **Qwen-110B** on your infra: [**example**](./llm/qwen/)
+- [Apr, 2024] Using **Ollama** to deploy quantized LLMs on CPUs and GPUs: [**example**](./llm/ollama/)
+- [Feb, 2024] Deploying and scaling **Gemma** with SkyServe: [**example**](./llm/gemma/)
+- [Feb, 2024] Serving **Code Llama 70B** with vLLM and SkyServe: [**example**](./llm/codellama/)
+- [Dec, 2023] **Mixtral 8x7B**, a high-quality sparse mixture-of-experts model, was released by Mistral AI! Deploy via SkyPilot on any cloud: [**example**](./llm/mixtral/)
+- [Nov, 2023] Using **Axolotl** to finetune Mistral 7B on the cloud (on-demand and spot): [**example**](./llm/axolotl/)

<details>
<summary>Archived</summary>

- [Apr, 2024] Serve and finetune [**Llama 3**](https://skypilot.readthedocs.io/en/latest/gallery/llms/llama-3.html) on any cloud or Kubernetes: [**example**](./llm/llama-3/)
- [Mar, 2024] Serve and deploy [**Databricks DBRX**](https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm) on your infra: [**example**](./llm/dbrx/)
- [Feb, 2024] Speed up your LLM deployments with [**SGLang**](https://github.com/sgl-project/sglang) for 5x throughput on SkyServe: [**example**](./llm/sglang/)
- [Dec, 2023] Using [**LoRAX**](https://github.com/predibase/lorax) to serve 1000s of finetuned LLMs on a single instance in the cloud: [**example**](./llm/lorax/)
- [Sep, 2023] [**Mistral 7B**](https://mistral.ai/news/announcing-mistral-7b/), a high-quality open LLM, was released! Deploy via SkyPilot on any cloud: [**Mistral docs**](https://docs.mistral.ai/self-deployment/skypilot)
+- [Sep, 2023] Case study: [**Covariant**](https://covariant.ai/) transformed AI development on the cloud using SkyPilot, delivering models 4x faster cost-effectively: [**read the case study**](https://blog.skypilot.co/covariant/)
+- [Aug, 2023] **Finetuning Cookbook**: Finetuning Llama 2 in your own cloud environment, privately: [**example**](./llm/vicuna-llama-2/), [**blog post**](https://blog.skypilot.co/finetuning-llama2-operational-guide/)
- [July, 2023] Self-Hosted **Llama-2 Chatbot** on Any Cloud: [**example**](./llm/llama-2/)
- [June, 2023] Serving LLM 24x Faster On the Cloud [**with vLLM**](https://vllm.ai/) and SkyPilot: [**example**](./llm/vllm/), [**blog post**](https://blog.skypilot.co/serving-llm-24x-faster-on-the-cloud-with-vllm-and-skypilot/)
- [April, 2023] [SkyPilot YAMLs](./llm/vicuna/) for finetuning & serving the [Vicuna LLM](https://lmsys.org/blog/2023-03-30-vicuna/) with a single command!
@@ -158,6 +158,7 @@ To learn more, see our [Documentation](https://skypilot.readthedocs.io/en/latest
<!-- Keep this section in sync with index.rst in SkyPilot Docs -->
Runnable examples:
> **Collaborator:** How about adding our blog and community integration page link to the sentence above as well?

> **Member (Author):** Done.

- LLMs on SkyPilot
- [Llama 3.2: lightweight and vision models](./llm/llama-3_2/)
- [Pixtral](./llm/pixtral/)
- [Llama 3.1 finetuning](./llm/llama-3_1-finetuning/) and [serving](./llm/llama-3_1/)
- [GPT-2 via `llm.c`](./llm/gpt-2/)
@@ -203,4 +204,4 @@ We are excited to hear your feedback!
For general discussions, join us on the [SkyPilot Slack](http://slack.skypilot.co).

## Contributing
-We welcome and value all contributions to the project! Please refer to [CONTRIBUTING](CONTRIBUTING.md) for how to get involved.
+We welcome all contributions to the project! See [CONTRIBUTING](CONTRIBUTING.md) for how to get involved.
14 changes: 7 additions & 7 deletions docs/source/_gallery_original/index.rst
@@ -34,17 +34,17 @@ Contents
:maxdepth: 1
:caption: LLM Models

+   Vision Llama 3.2 (Meta) <llms/llama-3_2>
+   Llama 3.1 (Meta) <llms/llama-3_1>
+   Llama 3 (Meta) <llms/llama-3>
+   Llama 2 (Meta) <llms/llama-2>
+   CodeLlama (Meta) <llms/codellama>
    Pixtral (Mistral AI) <llms/pixtral>
    Mixtral (Mistral AI) <llms/mixtral>
    Mistral 7B (Mistral AI) <https://docs.mistral.ai/self-deployment/skypilot/>
-   DBRX (Databricks) <llms/dbrx>
-   Llama-2 (Meta) <llms/llama-2>
-   Llama-3 (Meta) <llms/llama-3>
-   Llama-3.1 (Meta) <llms/llama-3_1>
-   Vision Llama-3.2 (Meta) <llms/llama-3_2>
-   Qwen (Alibaba) <llms/qwen>
-   CodeLlama (Meta) <llms/codellama>
+   Qwen 2.5 (Alibaba) <llms/qwen>
    Gemma (Google) <llms/gemma>
+   DBRX (Databricks) <llms/dbrx>

.. toctree::
:maxdepth: 1
4 changes: 2 additions & 2 deletions docs/source/_static/custom.js
@@ -27,11 +27,11 @@ document.addEventListener('DOMContentLoaded', () => {
  const newItems = [
    { selector: '.caption-text', text: 'SkyServe: Model Serving' },
    { selector: '.toctree-l1 > a', text: 'Managed Jobs' },
-   { selector: '.toctree-l1 > a', text: 'Llama-3.1 (Meta)' },
+   { selector: '.toctree-l1 > a', text: 'Pixtral (Mistral AI)' },
    { selector: '.toctree-l1 > a', text: 'Many Parallel Jobs' },
    { selector: '.toctree-l1 > a', text: 'Reserved, Capacity Blocks, DWS' },
-   { selector: '.toctree-l1 > a', text: 'Llama-3.2 (Meta)' },
+   { selector: '.toctree-l1 > a', text: 'Llama 3.2 (Meta)' },
    { selector: '.toctree-l1 > a', text: 'Admin Policy Enforcement' },
  ];
  // For each entry above, find sidebar links whose text matches and mark them (e.g., with a "New" badge).
  newItems.forEach(({ selector, text }) => {
    document.querySelectorAll(selector).forEach((el) => {
6 changes: 2 additions & 4 deletions docs/source/docs/index.rst
@@ -80,27 +80,25 @@ Runnable examples:

* **LLMs on SkyPilot**

+* `Llama 3.2: lightweight and vision models <https://github.com/skypilot-org/skypilot/tree/master/llm/llama-3_2>`_
+* `Pixtral <https://github.com/skypilot-org/skypilot/tree/master/llm/pixtral>`_
* `Llama 3.1 finetuning <https://github.com/skypilot-org/skypilot/tree/master/llm/llama-3_1-finetuning>`_ and `serving <https://github.com/skypilot-org/skypilot/tree/master/llm/llama-3_1>`_
* `GPT-2 via llm.c <https://github.com/skypilot-org/skypilot/tree/master/llm/gpt-2>`_
* `Llama 3 <https://github.com/skypilot-org/skypilot/tree/master/llm/llama-3>`_
* `Qwen <https://github.com/skypilot-org/skypilot/tree/master/llm/qwen>`_
* `Databricks DBRX <https://github.com/skypilot-org/skypilot/tree/master/llm/dbrx>`_
* `Gemma <https://github.com/skypilot-org/skypilot/tree/master/llm/gemma>`_
* `Mixtral 8x7B <https://github.com/skypilot-org/skypilot/tree/master/llm/mixtral>`_; `Mistral 7B <https://docs.mistral.ai/self-deployment/skypilot>`_ (from official Mistral team)
* `Code Llama <https://github.com/skypilot-org/skypilot/tree/master/llm/codellama/>`_
* `vLLM: Serving LLM 24x Faster On the Cloud <https://github.com/skypilot-org/skypilot/tree/master/llm/vllm>`_ (from official vLLM team)
* `SGLang: Fast and Expressive LLM Serving On the Cloud <https://github.com/skypilot-org/skypilot/tree/master//llm/sglang/>`_ (from official SGLang team)
* `Vicuna chatbots: Training & Serving <https://github.com/skypilot-org/skypilot/tree/master/llm/vicuna>`_ (from official Vicuna team)
* `Train your own Vicuna on Llama-2 <https://github.com/skypilot-org/skypilot/blob/master/llm/vicuna-llama-2>`_
* `Self-Hosted Llama-2 Chatbot <https://github.com/skypilot-org/skypilot/tree/master/llm/llama-2>`_
* `Ollama: Quantized LLMs on CPUs <https://github.com/skypilot-org/skypilot/tree/master/llm/ollama>`_
* `LoRAX <https://github.com/skypilot-org/skypilot/tree/master/llm/lorax/>`_
* `QLoRA <https://github.com/artidoro/qlora/pull/132>`_
* `LLaMA-LoRA-Tuner <https://github.com/zetavg/LLaMA-LoRA-Tuner#run-on-a-cloud-service-via-skypilot>`_
* `Tabby: Self-hosted AI coding assistant <https://github.com/TabbyML/tabby/blob/bed723fcedb44a6b867ce22a7b1f03d2f3531c1e/experimental/eval/skypilot.yaml>`_
* `LocalGPT <https://github.com/skypilot-org/skypilot/tree/master/llm/localgpt>`_
* `Falcon <https://github.com/skypilot-org/skypilot/tree/master/llm/falcon>`_
* Add yours here & see more in `llm/ <https://github.com/skypilot-org/skypilot/tree/master/llm>`_!

* Framework examples: `PyTorch DDP <https://github.com/skypilot-org/skypilot/blob/master/examples/resnet_distributed_torch.yaml>`_, `DeepSpeed <https://github.com/skypilot-org/skypilot/blob/master/examples/deepspeed-multinode/sky.yaml>`_, `JAX/Flax on TPU <https://github.com/skypilot-org/skypilot/blob/master/examples/tpu/tpuvm_mnist.yaml>`_, `Stable Diffusion <https://github.com/skypilot-org/skypilot/tree/master/examples/stable_diffusion>`_, `Detectron2 <https://github.com/skypilot-org/skypilot/blob/master/examples/detectron2_docker.yaml>`_, `Distributed <https://github.com/skypilot-org/skypilot/blob/master/examples/resnet_distributed_tf_app.py>`_ `TensorFlow <https://github.com/skypilot-org/skypilot/blob/master/examples/resnet_app_storage.yaml>`_, `NeMo <https://github.com/skypilot-org/skypilot/blob/master/examples/nemo/nemo_gpt_train.yaml>`_, `programmatic grid search <https://github.com/skypilot-org/skypilot/blob/master/examples/huggingface_glue_imdb_grid_search_app.py>`_, `Docker <https://github.com/skypilot-org/skypilot/blob/master/examples/docker/echo_app.yaml>`_, `Cog <https://github.com/skypilot-org/skypilot/blob/master/examples/cog/>`_, `Unsloth <https://github.com/skypilot-org/skypilot/blob/master/examples/unsloth/unsloth.yaml>`_, `Ollama <https://github.com/skypilot-org/skypilot/blob/master/llm/ollama>`_, `llm.c <https://github.com/skypilot-org/skypilot/tree/master/llm/gpt-2>`__, `Airflow <https://github.com/skypilot-org/skypilot/blob/master/examples/airflow/training_workflow>`_ and `many more <https://github.com/skypilot-org/skypilot/tree/master/examples>`_.
@@ -202,7 +200,7 @@ Read the research:
../cloud-setup/cloud-auth
../cloud-setup/quota
../cloud-setup/policy

.. toctree::
:hidden:
:maxdepth: 1
4 changes: 2 additions & 2 deletions llm/llama-2/README.md
@@ -1,7 +1,7 @@
<!-- $REMOVE -->
-# Self-Hosted Llama-2 Chatbot on Any Cloud
+# Self-Hosted Llama 2 Chatbot on Any Cloud
<!-- $END_REMOVE -->
-<!-- $UNCOMMENT# Llama-2: Open LLM from Meta -->
+<!-- $UNCOMMENT# Llama 2: Open LLM from Meta -->

[Llama-2](https://github.com/facebookresearch/llama/tree/main) is the top open-source model on the [Open LLM leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) today. It has been released with a license that authorizes commercial use. You can deploy a private Llama-2 chatbot with SkyPilot in your own cloud with just one simple command.

4 changes: 2 additions & 2 deletions llm/llama-3/README.md
@@ -1,7 +1,7 @@
<!-- $REMOVE -->
-# Scale Serving Llama-3 on Any Cloud or Kubernetes with SkyPilot
+# Scale Serving Llama 3 on Any Cloud or Kubernetes with SkyPilot
<!-- $END_REMOVE -->
-<!-- $UNCOMMENT# Llama-3: Open LLM from Meta -->
+<!-- $UNCOMMENT# Llama 3: Open LLM from Meta -->


<p align="center">
58 changes: 29 additions & 29 deletions llm/llama-3_2/README.md
@@ -2,7 +2,7 @@
<!-- $REMOVE -->
# Point, Launch, and Serve Vision Llama 3.2 on Kubernetes or Any Cloud
<!-- $END_REMOVE -->
-<!-- $UNCOMMENT# Vision Llama-3.2 (Meta) -->
+<!-- $UNCOMMENT# Vision Llama 3.2 (Meta) -->


The [Llama 3.2](https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices/) family was released by Meta on Sep 25, 2024. It includes not only the latest improved (and smaller) LLMs for chat, but also multimodal vision-language models. Let's _point and launch_ it with SkyPilot.
@@ -90,22 +90,22 @@ $ HF_TOKEN=xxx sky launch llama3_2.yaml -c llama3_2 --env HF_TOKEN
```console
...
------------------------------------------------------------------------------------------------------------------
 CLOUD        INSTANCE                       vCPUs   Mem(GB)   ACCELERATORS   REGION/ZONE     COST ($)   CHOSEN
------------------------------------------------------------------------------------------------------------------
 Kubernetes   4CPU--16GB--1L4                4       16        L4:1           kubernetes      0.00          ✔
 RunPod       1x_L4_SECURE                   4       24        L4:1           CA              0.44
 GCP          g2-standard-4                  4       16        L4:1           us-east4-a      0.70
 AWS          g6.xlarge                      4       16        L4:1           us-east-1       0.80
 AWS          g5.xlarge                      4       16        A10G:1         us-east-1       1.01
 RunPod       1x_L40_SECURE                  16      48        L40:1          CA              1.14
 Fluidstack   L40_48GB::1                    32      60        L40:1          CANADA          1.15
 AWS          g6e.xlarge                     4       32        L40S:1         us-east-1       1.86
 Cudo         sapphire-rapids-h100_1x4v8gb   4       8         H100:1         ca-montreal-3   2.86
 Fluidstack   H100_PCIE_80GB::1              28      180       H100:1         CANADA          2.89
 Azure        Standard_NV36ads_A10_v5        36      440       A10:1          eastus          3.20
 GCP          a2-highgpu-1g                  12      85        A100:1         us-central1-a   3.67
 RunPod       1x_H100_SECURE                 16      80        H100:1         CA              4.49
 Azure        Standard_NC40ads_H100_v5       40      320       H100:1         eastus          6.98
------------------------------------------------------------------------------------------------------------------
```
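For context, the task being launched above is a plain SkyPilot YAML file. The sketch below is a hypothetical minimal version for illustration only; the model name, port, and vLLM-based `run` command are assumptions, not the actual contents of `llama3_2.yaml` (see [./llm/llama-3_2/](./llm/llama-3_2/) for the real task).

```yaml
# Hypothetical minimal sketch; the real task lives in ./llm/llama-3_2/.
envs:
  HF_TOKEN:  # Required; supplied at launch time via `--env HF_TOKEN`.

resources:
  accelerators: {L4:1, A10G:1, L40:1, A100:1, H100:1}  # Any one of these; the optimizer picks the cheapest offering.
  ports: 8081

setup: |
  pip install vllm

run: |
  vllm serve meta-llama/Llama-3.2-3B-Instruct --port 8081
```

Given a spec like this, the table above is exactly what the optimizer produces: every cloud and instance type satisfying the `accelerators` request, sorted by hourly cost, with the cheapest feasible option chosen.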

@@ -185,20 +185,20 @@ $ HF_TOKEN=xxx sky launch llama3_2-vision-11b.yaml -c llama3_2-vision --env HF_T

```console
------------------------------------------------------------------------------------------------------------------
 CLOUD        INSTANCE                       vCPUs   Mem(GB)   ACCELERATORS   REGION/ZONE     COST ($)   CHOSEN
------------------------------------------------------------------------------------------------------------------
 Kubernetes   2CPU--8GB--1H100               2       8         H100:1         kubernetes      0.00          ✔
 RunPod       1x_L40_SECURE                  16      48        L40:1          CA              1.14
 Fluidstack   L40_48GB::1                    32      60        L40:1          CANADA          1.15
 AWS          g6e.xlarge                     4       32        L40S:1         us-east-1       1.86
 RunPod       1x_A100-80GB_SECURE            8       80        A100-80GB:1    CA              1.99
 Cudo         sapphire-rapids-h100_1x2v4gb   2       4         H100:1         ca-montreal-3   2.83
 Fluidstack   H100_PCIE_80GB::1              28      180       H100:1         CANADA          2.89
 GCP          a2-highgpu-1g                  12      85        A100:1         us-central1-a   3.67
 Azure        Standard_NC24ads_A100_v4       24      220       A100-80GB:1    eastus          3.67
 RunPod       1x_H100_SECURE                 16      80        H100:1         CA              4.49
 GCP          a2-ultragpu-1g                 12      170       A100-80GB:1    us-central1-a   5.03
 Azure        Standard_NC40ads_H100_v5       40      320       H100:1         eastus          6.98
------------------------------------------------------------------------------------------------------------------
```
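The vision launch follows the same pattern. Again as a hypothetical sketch (the accelerator set and vLLM flags below are assumptions consistent with the candidates listed above, not the exact contents of `llama3_2-vision-11b.yaml`):

```yaml
# Hypothetical sketch; the real task lives in ./llm/llama-3_2/.
envs:
  HF_TOKEN:  # Required; supplied at launch time via `--env HF_TOKEN`.

resources:
  accelerators: {L40:1, A100:1, A100-80GB:1, H100:1}  # The 11B vision model needs more GPU memory than the 3B chat model.
  ports: 8081

setup: |
  pip install vllm

run: |
  vllm serve meta-llama/Llama-3.2-11B-Vision-Instruct --port 8081 --max-model-len 4096
```

Note how the candidate list shifts accordingly: the single L4 and A10G options from the text-model launch drop out, and higher-memory GPUs (L40, A100-80GB, H100) take their place.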
