merge with mainline
asaiacai committed Sep 14, 2024
2 parents 22f8ebf + c1464e1 commit 910d355
Showing 83 changed files with 2,236 additions and 525 deletions.
2 changes: 2 additions & 0 deletions README.md
Expand Up @@ -26,6 +26,7 @@

----
:fire: *News* :fire:
- [Sep, 2024] Run and deploy [Pixtral](./llm/pixtral), the first open-source multimodal model from Mistral AI.
- [Jul, 2024] [Finetune](./llm/llama-3_1-finetuning/) and [serve](./llm/llama-3_1/) **Llama 3.1** on your infra
- [Jun, 2024] Reproduce **GPT** with [llm.c](https://github.com/karpathy/llm.c/discussions/481) on any cloud: [**guide**](./llm/gpt-2/)
- [Apr, 2024] Serve and finetune [**Llama 3**](https://skypilot.readthedocs.io/en/latest/gallery/llms/llama-3.html) on any cloud or Kubernetes: [**example**](./llm/llama-3/)
Expand Down Expand Up @@ -156,6 +157,7 @@ To learn more, see our [Documentation](https://skypilot.readthedocs.io/en/latest
<!-- Keep this section in sync with index.rst in SkyPilot Docs -->
Runnable examples:
- LLMs on SkyPilot
- [Pixtral](./llm/pixtral/)
- [Llama 3.1 finetuning](./llm/llama-3_1-finetuning/) and [serving](./llm/llama-3_1/)
- [GPT-2 via `llm.c`](./llm/gpt-2/)
- [Llama 3](./llm/llama-3/)
Expand Down
1 change: 1 addition & 0 deletions docs/source/_gallery_original/index.rst
Expand Up @@ -34,6 +34,7 @@ Contents
:maxdepth: 1
:caption: LLM Models

Pixtral (Mistral AI) <llms/pixtral>
Mixtral (Mistral AI) <llms/mixtral>
Mistral 7B (Mistral AI) <https://docs.mistral.ai/self-deployment/skypilot/>
DBRX (Databricks) <llms/dbrx>
Expand Down
1 change: 1 addition & 0 deletions docs/source/_gallery_original/llms/pixtral.md
2 changes: 1 addition & 1 deletion docs/source/_static/custom.css
Expand Up @@ -115,7 +115,7 @@ html[data-theme="dark"] {
padding: 2px 5px; /* Reduced padding for a more compact label */
margin-left: 6px; /* Space between the text and the label */

vertical-align: middle;
vertical-align: text-bottom;
line-height: 1; /* Adjust line height to ensure vertical alignment */
}

Expand Down
4 changes: 3 additions & 1 deletion docs/source/_static/custom.js
Expand Up @@ -27,8 +27,10 @@ document.addEventListener('DOMContentLoaded', () => {
const newItems = [
{ selector: '.caption-text', text: 'SkyServe: Model Serving' },
{ selector: '.toctree-l1 > a', text: 'Managed Jobs' },
{ selector: '.toctree-l1 > a', text: 'Running on Kubernetes' },
{ selector: '.toctree-l1 > a', text: 'Llama-3.1 (Meta)' },
{ selector: '.toctree-l1 > a', text: 'Pixtral (Mistral AI)' },
{ selector: '.toctree-l1 > a', text: 'Many Parallel Jobs' },
{ selector: '.toctree-l1 > a', text: 'Reserved, Capacity Blocks, DWS' },
];
newItems.forEach(({ selector, text }) => {
document.querySelectorAll(selector).forEach((el) => {
Expand Down
8 changes: 8 additions & 0 deletions docs/source/developers/index.rst
@@ -0,0 +1,8 @@
Developer Guides
=================

.. toctree::
:maxdepth: 1

../developers/CONTRIBUTING
Guide: Adding a New Cloud <https://docs.google.com/document/d/1oWox3qb3Kz3wXXSGg9ZJWwijoa99a3PIQUHBR8UgEGs/edit?usp=sharing>
22 changes: 12 additions & 10 deletions docs/source/docs/index.rst
Expand Up @@ -80,6 +80,7 @@ Runnable examples:
* **LLMs on SkyPilot**

* `Pixtral <https://github.com/skypilot-org/skypilot/tree/master/llm/pixtral>`_
* `Llama 3.1 finetuning <https://github.com/skypilot-org/skypilot/tree/master/llm/llama-3_1-finetuning>`_ and `serving <https://github.com/skypilot-org/skypilot/tree/master/llm/llama-3_1>`_
* `GPT-2 via llm.c <https://github.com/skypilot-org/skypilot/tree/master/llm/gpt-2>`_
* `Llama 3 <https://github.com/skypilot-org/skypilot/tree/master/llm/llama-3>`_
Expand Down Expand Up @@ -129,8 +130,8 @@ Read the research:

../getting-started/installation
../getting-started/quickstart
../getting-started/tutorial
../examples/interactive-development
../getting-started/tutorial


.. toctree::
Expand All @@ -141,8 +142,16 @@ Read the research:
../examples/managed-jobs
../reference/job-queue
../examples/auto-failover
../reference/kubernetes/index
../running-jobs/distributed-jobs
../running-jobs/many-jobs

.. toctree::
:hidden:
:maxdepth: 1
:caption: Reserved & Existing Clusters

../reservations/reservations
../reference/kubernetes/index

.. toctree::
:hidden:
Expand Down Expand Up @@ -184,14 +193,6 @@ Read the research:
SkyPilot vs. Other Systems <../reference/comparison>


.. toctree::
:hidden:
:maxdepth: 1
:caption: Developer Guides

../developers/CONTRIBUTING
Guide: Adding a New Cloud <https://docs.google.com/document/d/1oWox3qb3Kz3wXXSGg9ZJWwijoa99a3PIQUHBR8UgEGs/edit?usp=sharing>

.. toctree::
:hidden:
:maxdepth: 1
Expand All @@ -210,4 +211,5 @@ Read the research:
../reference/cli
../reference/api
../reference/config
../developers/index

2 changes: 1 addition & 1 deletion docs/source/examples/auto-failover.rst
Expand Up @@ -206,7 +206,7 @@ If a task would like to specify multiple candidate resources (not only GPUs), th

The regions specified that does not have the accelerator will be ignored automatically.

This will genereate the following output:
This will generate the following output:

.. code-block:: console
Expand Down
2 changes: 1 addition & 1 deletion docs/source/getting-started/quickstart.rst
Expand Up @@ -219,7 +219,7 @@ Congratulations! In this quickstart, you have launched a cluster, run a task, a

Next steps:

- Adapt :ref:`Tutorial: DNN Training <dnn-training>` to start running your own project on SkyPilot!
- Adapt :ref:`Tutorial: AI Training <ai-training>` to start running your own project on SkyPilot!
- See the :ref:`Task YAML reference <yaml-spec>`, :ref:`CLI reference <cli>`, and `more examples <https://github.com/skypilot-org/skypilot/tree/master/examples>`_
- To learn more, try out `SkyPilot Tutorials <https://github.com/skypilot-org/skypilot-tutorial>`_ in Jupyter notebooks

Expand Down
4 changes: 2 additions & 2 deletions docs/source/getting-started/tutorial.rst
@@ -1,6 +1,6 @@
.. _dnn-training:
.. _ai-training:

Tutorial: DNN Training
Tutorial: AI Training
======================
This example uses SkyPilot to train a Transformer-based language model from HuggingFace.

Expand Down
2 changes: 1 addition & 1 deletion docs/source/reference/config.rst
Expand Up @@ -90,7 +90,7 @@ Available fields and semantics:
# Advanced AWS configurations (optional).
# Apply to all new instances but not existing ones.
aws:
# Tags to assign to all instances launched by SkyPilot (optional).
# Tags to assign to all instances and buckets created by SkyPilot (optional).
#
# Example use case: cost tracking by user/team/project.
#
Expand Down
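
A minimal sketch of how the tags described in the config.rst hunk above might be set. The hunk is cut off before the field itself appears, so the `labels` key, the config file location, and both tag entries below are assumptions for illustration rather than content from this diff; the full `config.rst` is the place to confirm the exact field name.

```yaml
# Hypothetical SkyPilot config fragment (typically ~/.sky/config.yaml).
# The `labels` key is an assumption; this diff does not show the field name.
aws:
  labels:
    # Placeholder tags for cost tracking by team/project.
    team: ml-research
    project: pixtral-serving
```

Under the wording introduced by this commit, tags set this way would be applied to buckets created by SkyPilot as well as to instances.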
2 changes: 1 addition & 1 deletion docs/source/reference/job-queue.rst
Expand Up @@ -160,7 +160,7 @@ SkyPilot's scheduler serves two goals:
2. **Minimizing resource idleness**: If a resource is idle, SkyPilot will schedule a
queued job that can utilize that resource.

We illustrate the scheduling behavior by revisiting :ref:`Tutorial: DNN Training <dnn-training>`.
We illustrate the scheduling behavior by revisiting :ref:`Tutorial: AI Training <ai-training>`.
In that tutorial, we have a task YAML that specifies these resource requirements:

.. code-block:: yaml
Expand Down
6 changes: 3 additions & 3 deletions docs/source/reference/kubernetes/index.rst
@@ -1,7 +1,7 @@
.. _kubernetes-overview:

Running on Kubernetes
=============================
Using Kubernetes
================

SkyPilot tasks can be run on your private on-prem or cloud Kubernetes clusters.
The Kubernetes cluster gets added to the list of "clouds" in SkyPilot and SkyPilot
Expand Down Expand Up @@ -116,4 +116,4 @@ Kubernetes support is under active development. Some features are in progress an
* Multi-node tasks - ✅ Available
* Custom images - ✅ Available
* Opening ports and exposing services - ✅ Available
* Multiple Kubernetes Clusters - 🚧 In progress
* Multiple Kubernetes Clusters - 🚧 In progress
20 changes: 19 additions & 1 deletion docs/source/reference/kubernetes/kubernetes-deployment.rst
Expand Up @@ -35,6 +35,13 @@ Below we include minimal guides to set up a new Kubernetes cluster in different

Amazon's hosted Kubernetes service.

.. grid-item-card:: On-demand Cloud VMs
:link: kubernetes-setup-ondemand
:link-type: ref
:text-align: center

We provide scripts to deploy k8s on on-demand cloud VMs.

.. _kubernetes-setup-kind:


Expand Down Expand Up @@ -267,4 +274,15 @@ After the GPU operator is installed, create the nvidia RuntimeClass required by
metadata:
name: nvidia
handler: nvidia
EOF
EOF
.. _kubernetes-setup-ondemand:

Deploying on cloud VMs
^^^^^^^^^^^^^^^^^^^^^^

You can also spin up on-demand cloud VMs and deploy Kubernetes on them.

We provide scripts to take care of provisioning VMs, installing Kubernetes, setting up GPU support and configuring your local kubeconfig.
Refer to our `Deploying Kubernetes on VMs guide <https://github.com/skypilot-org/skypilot/tree/master/examples/k8s_cloud_deploy>`_ for more details.
14 changes: 8 additions & 6 deletions docs/source/reference/yaml-spec.rst
Expand Up @@ -113,12 +113,14 @@ Available fields:
disk_size: 256
# Disk tier to use for OS (optional).
# Could be one of 'low', 'medium', 'high' or 'best' (default: 'medium').
# Could be one of 'low', 'medium', 'high', 'ultra' or 'best' (default: 'medium').
# if 'best' is specified, use the best disk tier enabled.
# Rough performance estimate:
# low: 500 IOPS; read 20MB/s; write 40 MB/s
# medium: 3000 IOPS; read 220 MB/s; write 200 MB/s
# high: 6000 IOPS; 340 MB/s; write 250 MB/s
# low: 1000 IOPS; read 90 MB/s; write 90 MB/s
# medium: 3000 IOPS; read 220 MB/s; write 220 MB/s
# high: 6000 IOPS; read 400 MB/s; write 400 MB/s
# ultra: 60000 IOPS; read 4000 MB/s; write 3000 MB/s
# Measured by examples/perf/storage_rawperf.yaml
disk_tier: medium
# Ports to expose (optional).
Expand Down Expand Up @@ -335,8 +337,8 @@ Available fields:
.. _task-yaml-experimental:

Experimental
------------
Experimental Configurations
---------------------------

.. note::

Expand Down
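
The `disk_tier` values documented in the yaml-spec.rst hunk above could be used in a task YAML roughly as follows. The `disk_size` and `disk_tier` fields and their values come from the diff; nesting them under a `resources:` block is an assumption, since the hunk does not show the parent key.

```yaml
# Sketch of a task YAML fragment; the `resources:` parent key is assumed.
resources:
  # Disk size value copied from the spec excerpt.
  disk_size: 256
  # 'ultra' is the tier added in this commit; the spec's rough estimate is
  # 60000 IOPS, read 4000 MB/s, write 3000 MB/s.
  disk_tier: ultra
```

Setting `disk_tier: best` instead would, per the spec comment, select the best disk tier that is enabled.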