From 0faece20591743baf9c1020fa13547dcc099ee7f Mon Sep 17 00:00:00 2001
From: Philip Kiely - Baseten <98474633+philipkiely-baseten@users.noreply.github.com>
Date: Sun, 19 Mar 2023 03:23:08 +0000
Subject: [PATCH] update truss init doc

---
 docs/create/manual.md | 203 ++++++++++++++++++++++++++++++------------
 1 file changed, 146 insertions(+), 57 deletions(-)

diff --git a/docs/create/manual.md b/docs/create/manual.md
index d8b9fe70d..d2fa44908 100644
--- a/docs/create/manual.md
+++ b/docs/create/manual.md
@@ -1,93 +1,182 @@
 # Manually
 
-Creating a Truss manually, from a serialized model, works with any model-building framework, including from-scratch bespoke models.
+You can package any model as a Truss. `truss.create()` is a convenient shortcut for packaging in-memory models built in supported frameworks, but the manual approach gives you control and flexibility throughout the entire model packaging and deployment process.
+
+This doc walks through the process of manually creating a Truss, using Stable Diffusion v1.5 as an example. To get started, initialize the Truss with the following command in the CLI:
 
 ```
-truss init my_truss
+truss init sd_truss
 ```
 
-### Truss structure
+This will create the following file structure:
 
-To build a Truss manually, you have to understand the package in much more detail than using it with a supported framework. Fortunately, that's what this doc is for!
+```
+sd_truss/          # Truss root
+  data/            # Stores serialized models/weights/binaries
+  model/
+    __init__.py
+    model.py       # Implements Model class
+  packages/        # Stores utility code for model.py
+  config.yaml      # Config for model serving environment
+  examples.yaml    # Invocation examples
+```
 
-To familiarize yourself with the structure of Truss, review the [structure reference](../reference/structure.md). A Truss only has a few files that you need to interact with, and this tutorial is an opinionated guide to working through them.
+Most of our development work will happen in `model/model.py` and `config.yaml`.
 
-### Adding the model binary
+### Loading your model
 
-First, you'll need to add a model binary to your new Truss. On supported frameworks, this is provided automatically by the `create` command. For a custom Truss, it can come from many sources, such as:
+In `model/model.py`, the first function you'll need to implement is `load()`.
 
-* Pickling your model
-* Serializing your model
-* Downloading a serialized model from the internet
+When the model is spun up to receive requests, `load()` is called exactly once and is guaranteed to finish before any predictions are attempted.
 
-This file should be put in the folder `data/model/` as, for example, `model.joblib` (replace `joblib` with the appropriate extension for your serialized model).
+The purpose of `load()` is to set a value for `self._model`. This requires deserializing your model or otherwise loading in your model weights.
 
-This model binary must be de-serialized in the model class.
+**Example: Stable Diffusion 1.5**
 
-### Building the model
+The exact code you'll need will depend on your model and framework. In this example, the model weights for Stable Diffusion 1.5 come from the HuggingFace `diffusers` package.
 
-The model file implements the following functions, in order of execution:
+This requires a couple of imports (don't worry, we'll cover adding Python requirements in a bit).
-* A constructor `__init__` to initiate the class
-* A function called `load`, called **only** once, and that call is guaranteed to happen before **any** predictions are run
-* A function `preprocess`, called once before **each** prediction
-* A function `predict` that actually runs the model to make a prediction
-* A function `postprocess`, called once after **each** prediction
+```python
+from dataclasses import asdict
+from typing import Dict
 
-Having both a constructor and a load function means you have flexibility on when you download and/or deserialize your model. There are three possibilities here, and we strongly recommend the first one:
+import torch
+from diffusers import EulerDiscreteScheduler, StableDiffusionPipeline
+```
 
-1. Load in the load function
-2. Load model in the constructor, but it's not a good idea to block constructor
-3. Load lazily on first prediction, but this gives your model service a cold start issue
+The load function looks like:
+
+```python
+def load(self):
+    scheduler = EulerDiscreteScheduler.from_pretrained(
+        "runwayml/stable-diffusion-v1-5",
+        subfolder="scheduler",
+    )
+    self._model = StableDiffusionPipeline.from_pretrained(
+        "runwayml/stable-diffusion-v1-5",
+        scheduler=scheduler,
+        torch_dtype=torch.float16,
+    )
+    self._model.unet.set_use_memory_efficient_attention_xformers(True)
+    self._model = self._model.to("cuda")
+```
+
+`self._model` could be set using weights from anywhere. If you have custom weights, you can load them from your Truss' `data/` directory by [following this guide](https://github.com/basetenlabs/truss/blob/main/examples/stable-diffusion-1-5/data/README.md).
 
-Also, your model gets access to certain values, including the `config.yaml` file for configuration and the `data` folder where you previously put the serialized model.
+### Implement model invocation
 
-## Example code
+The other key function in your Truss is `predict()`, which handles model invocation.
 
-While XGBoost is a supported framework — you can make a Truss from an XGBoost model with `create` — we'll use the manual method here for demonstration.
+As our loaded model is a `StableDiffusionPipeline` object, model invocation is pretty simple:
 
-If you haven't already, create a Truss by running:
-
-```
-truss init my_truss
-```
+```python
+def predict(self, model_input: Dict):
+    response = self._model(**model_input)
+    return response
+```
 
-This is the part you want to replace with your own code. Build a machine learning model and keep it in-memory.
+All we have to do is pass the model input to the model.
+
+But how do we make sure the model input is in a valid format, and that the model output is usable?
+
+### Implement processing functions
+
+By default, pre- and post-processing functions are passthroughs. But if needed, you can implement these functions to make your model input and output match the specification of whatever app or API you're building.
+
+There are [more in-depth docs on processing functions here](../develop/processing.md), but here's sample code for the Stable Diffusion example, which needs a post-processing function but not a pre-processing function:
+
+```python
+def postprocess(self, model_output: Dict) -> Dict:
+    # Convert each output image to base64
+    model_output["images"] = [pil_to_b64(img) for img in model_output["images"]]
+    return asdict(model_output)
+```
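+
+Before moving on, it's worth seeing what a valid `model_input` actually looks like. Because `predict()` passes its input straight through to the pipeline, any keyword argument the pipeline accepts can be a key in the input dictionary. As a sketch (these keys are standard `StableDiffusionPipeline` arguments from `diffusers`, not something Truss itself defines), an invocation payload might look like:
+
+```python
+# Hypothetical example payload: each key becomes a keyword argument
+# to the StableDiffusionPipeline when predict() unpacks the dict.
+model_input = {
+    "prompt": "a watercolor painting of a lighthouse at dawn",
+    "num_inference_steps": 50,  # more steps: higher quality, slower
+    "guidance_scale": 7.5,      # how closely to follow the prompt
+}
+```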
+
+Eagle-eyed readers will note that `pil_to_b64()` is not a function that has been defined anywhere. How can we use it?
+
+### Call upon shared packages
+
+Here's that `pil_to_b64()` function from the last step:
 
 ```python
-import xgboost as xgb
-from sklearn.datasets import make_classification
-from sklearn.model_selection import train_test_split
-
-def create_data():
-    X, y = make_classification(n_samples=100,
-                               n_informative=5,
-                               n_classes=2)
-    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
-    train = xgb.DMatrix(X_train, y_train)
-    test = xgb.DMatrix(X_test, y_test)
-    return train, test
-
-train, test = create_data()
-
-params = {
-    "learning_rate": 0.01,
-    "max_depth": 3
-}
-
-# training, we set the early stopping rounds parameter
-model = xgb.train(params,
-                  train, evals=[(train, "train"), (test, "validation")],
-                  num_boost_round=100, early_stopping_rounds=20)
+import base64
+from io import BytesIO
+
+from PIL import Image
+
+def pil_to_b64(pil_img):
+    buffered = BytesIO()
+    pil_img.save(buffered, format="PNG")
+    img_str = base64.b64encode(buffered.getvalue())
+    return "data:image/png;base64," + str(img_str)[2:-1]
 ```
 
-Now, we'll serialize and save the model:
+You could just paste this into `model/model.py` and call it a day. But it's better to factor out helper functions and utilities so that they can be re-used between multiple Trusses.
+
+Let's create a folder `shared` at the same level as our root `sd_truss` directory (don't create it inside the Truss directory). Then create a file `shared/base64_utils.py`. It should look like this:
+
+```
+shared/
+  base64_utils.py
+sd_truss/
+  ...
+```
+
+Paste the code from above into `shared/base64_utils.py`.
+
+Let your Truss know where to look for external packages with the following lines in `config.yaml`:
+
+```yaml
+external_package_dirs:
+- ../shared/
+```
+
+Note that this is an array in YAML; your Truss can depend on multiple external directories for packages.
+
+Finally, at the top of `sd_truss/model/model.py`, add:
 
 ```python
-import os
-model.save_model(os.path.join("my_truss", "data", "model", "xgboost.json"))
+from base64_utils import pil_to_b64
 ```
+
+This will import your function from your external directory.
+
+For more details on bundled and shared packages, see [this demo repository](https://github.com/bolasim/truss-packages-example) and the [bundled packages docs](../develop/bundled-packages.md).
+
+### Set Python and system requirements
+
+Now, we switch our attention to `config.yaml`. You can use this file to customize a great deal about your packaged model ([here's a complete reference](../develop/configuration.md)), but right now we just care about setting up our Python requirements so the model can run.
+
+For that, find `requirements:` in the config file. In the Stable Diffusion 1.5 example, we set it to:
+
+```yaml
+requirements:
+- diffusers
+- transformers
+- accelerate
+- scipy
+- safetensors
+- xformers
+- triton
+```
+
+These requirements work just like `requirements.txt` in a Python project, and you can pin versions with `package==1.2.3`.
+
+### Set hardware requirements
+
+Large models like Stable Diffusion require powerful hardware to run invocations. Set your packaged model's hardware requirements in `config.yaml`:
+
+```yaml
+resources:
+  accelerator: A10G # Type of GPU required
+  cpu: "8"          # Number of vCPU cores required
+  memory: 30Gi      # Mebibytes (Mi) or gibibytes (Gi) of RAM required
+  use_gpu: true     # If false, set accelerator: null
+```
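+
+Putting it all together, here's roughly how the parts of `config.yaml` touched in this walkthrough fit into one file. This is a partial sketch for orientation only; the config generated by `truss init` contains additional fields, which can be left at their defaults.
+
+```yaml
+# Partial config.yaml sketch assembling the values from the steps above
+external_package_dirs:
+- ../shared/
+requirements:
+- diffusers
+- transformers
+- accelerate
+- scipy
+- safetensors
+- xformers
+- triton
+resources:
+  accelerator: A10G
+  cpu: "8"
+  memory: 30Gi
+  use_gpu: true
+```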
+You've successfully packaged a model! If you have the required hardware, you can [test it locally](../develop/localhost.md), or [deploy it to Baseten](https://docs.baseten.co/models/deploying-models/client#stage-2-deploying-a-draft) to get a draft model for rapid iteration in a production environment.