# Manually

You can package any model as a Truss. `truss.create()` is a convenient shortcut for packaging in-memory models built in supported frameworks, but the manual approach gives you control and flexibility throughout the entire model packaging and deployment process.

This doc walks through the process of manually creating a Truss, using Stable Diffusion v1.5 as an example.

To get started, initialize the Truss with the following command in the CLI:

```
truss init sd_truss
```

### Truss structure

This will create the following file structure:

```
sd_truss/           # Truss root
    data/           # Stores serialized models/weights/binaries
    model/
        __init__.py
        model.py    # Implements Model class
    packages/       # Stores utility code for model.py
    config.yaml     # Config for model serving environment
    examples.yaml   # Invocation examples
```

Most of our development work will happen in `model/model.py` and `config.yaml`.

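Before filling anything in, it helps to see the overall shape of the `Model` class that `model/model.py` implements. Here's a minimal sketch of that structure; the exact stub that `truss init` generates may differ:

```python
from typing import Dict

class Model:
    def __init__(self, **kwargs) -> None:
        # Truss passes context (e.g., config and data directory) as kwargs
        self._model = None

    def load(self):
        # Called once, before any predictions; deserialize the model here
        pass

    def predict(self, model_input: Dict):
        # Called per request; run inference here
        return model_input
```
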
### Loading your model

In `model/model.py`, the first function you'll need to implement is `load()`.

When the model is spun up to receive requests, `load()` is called exactly once and is guaranteed to finish before any predictions are attempted.

The purpose of `load()` is to set a value for `self._model`. This requires deserializing your model or otherwise loading your model weights.

**Example: Stable Diffusion 1.5**

The exact code you'll need depends on your model and framework. In this example, the model weights for Stable Diffusion 1.5 come from the Hugging Face `diffusers` package.

This requires a couple of imports (don't worry, we'll cover adding Python requirements in a bit).

```python
from dataclasses import asdict
from typing import Dict

import torch
from diffusers import EulerDiscreteScheduler, StableDiffusionPipeline
```

The load function looks like:

```python
def load(self):
    scheduler = EulerDiscreteScheduler.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        subfolder="scheduler",
    )
    self._model = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        scheduler=scheduler,
        torch_dtype=torch.float16,
    )
    self._model.unet.set_use_memory_efficient_attention_xformers(True)
    self._model = self._model.to("cuda")
```

`self._model` could be set using weights from anywhere. If you have custom weights, you can load them from your Truss' `data/` directory by [following this guide](https://github.com/basetenlabs/truss/blob/main/examples/stable-diffusion-1-5/data/README.md).

Your model class also gets access to certain values, including the `config.yaml` file for configuration and the `data/` folder that holds any serialized model or weights you bundle with the Truss.

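As an illustration, here's one way custom weights bundled in `data/` might be loaded. This is a hedged sketch that assumes the constructor receives the data directory path as a `data_dir` keyword argument; check the configuration docs for the exact contract:

```python
import torch
from diffusers import StableDiffusionPipeline

class Model:
    def __init__(self, **kwargs) -> None:
        # Assumption: Truss passes the data directory path as "data_dir"
        self._data_dir = kwargs["data_dir"]
        self._model = None

    def load(self):
        # Load local weights from data/ instead of the Hugging Face hub
        self._model = StableDiffusionPipeline.from_pretrained(
            str(self._data_dir),
            torch_dtype=torch.float16,
        ).to("cuda")
```
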
### Implement model invocation

The other key function in your Truss is `predict()`, which handles model invocation.

As our loaded model is a `StableDiffusionPipeline` object, model invocation is pretty simple:

```python
def predict(self, model_input: Dict):
    response = self._model(**model_input)
    return response
```

All we have to do is pass the model input to the model.

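Since `predict()` unpacks the input dict directly into the pipeline, a request can carry any keyword arguments that `StableDiffusionPipeline` accepts. For instance, an illustrative input (the values here are examples, not defaults):

```python
# Illustrative input for predict(); StableDiffusionPipeline accepts
# keyword arguments like prompt and num_inference_steps
model_input = {
    "prompt": "a photograph of an astronaut riding a horse",
    "num_inference_steps": 50,
}
```
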
But how do we make sure the model input is in a valid format, and that the model output is usable?

### Implement processing functions

By default, the pre- and post-processing functions are passthroughs. But if needed, you can implement these functions to make your model input and output match the specification of whatever app or API you're building.

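For instance, a pre-processing function could normalize inputs or fill in defaults before they reach the model. Here's a hypothetical sketch (Stable Diffusion doesn't actually need this):

```python
def preprocess(self, model_input: Dict) -> Dict:
    # Hypothetical example: default the step count if the caller omitted it
    model_input.setdefault("num_inference_steps", 50)
    return model_input
```
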
There are [more in-depth docs on processing functions here](../develop/processing.md), but here's sample code for the Stable Diffusion example, which needs a post-processing function but not a pre-processing function:

```python
def postprocess(self, model_output: Dict) -> Dict:
    # Convert to base64
    model_output["images"] = [pil_to_b64(img) for img in model_output["images"]]
    return asdict(model_output)
```

Eagle-eyed readers will note that `pil_to_b64()` hasn't been defined anywhere. How can we use it?

### Call upon shared packages

Here's that `pil_to_b64()` function from the last step:

```python
import base64
from io import BytesIO

from PIL import Image

def pil_to_b64(pil_img):
    # Encode a PIL image as a base64 data URL
    buffered = BytesIO()
    pil_img.save(buffered, format="PNG")
    img_str = base64.b64encode(buffered.getvalue())
    # str(img_str) looks like "b'...'"; [2:-1] strips the b'' wrapper
    return "data:image/png;base64," + str(img_str)[2:-1]
```

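Going the other way, a client consuming the API could decode the data URL back into an image. Here's a small sketch of that inverse (a hypothetical helper for the client side, not part of the Truss):

```python
import base64
from io import BytesIO

from PIL import Image

def b64_to_pil(b64_str: str) -> Image.Image:
    # Split off the "data:image/png;base64," prefix and decode the rest
    _, encoded = b64_str.split(",", 1)
    return Image.open(BytesIO(base64.b64decode(encoded)))
```
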
You could just paste this into `model/model.py` and call it a day. But it's better to factor out helper functions and utilities so that they can be re-used across multiple Trusses.

Let's create a folder `shared` at the same level as our root `sd_truss` directory (don't create it inside the Truss directory). Then create a file `shared/base64_utils.py`. The layout should look like this:

```
shared/
    base64_utils.py
sd_truss/
    ...
```

Paste the code from above into `shared/base64_utils.py`.

Let your Truss know where to look for external packages with the following lines in `config.yaml`:

```yaml
external_package_dirs:
- ../shared/
```

Note that this is an array in YAML; your Truss can depend on multiple external directories for packages.
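
For example, a Truss depending on two package directories might be configured like this (the second directory is hypothetical):

```yaml
external_package_dirs:
- ../shared/
- ../image_utils/   # hypothetical second shared package directory
```
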
Finally, at the top of `sd_truss/model/model.py`, add:

```python
from base64_utils import pil_to_b64
```

This will import your function from your external directory.

For more details on bundled and shared packages, see [this demo repository](https://github.com/bolasim/truss-packages-example) and the [bundled packages docs](../develop/bundled-packages.md).

### Set Python and system requirements

Now we switch our attention to `config.yaml`. You can use this file to customize a great deal about your packaged model — [here's a complete reference](../develop/configuration.md) — but right now we just care about setting up our Python requirements so the model can run.

For that, find `requirements:` in the config file. For the Stable Diffusion 1.5 example, we set it to:

```yaml
requirements:
- diffusers
- transformers
- accelerate
- scipy
- safetensors
- xformers
- triton
```

These requirements work just like `requirements.txt` in a Python project, and you can pin versions with `package==1.2.3`.

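For example, a pinned version of the list above might look like this (the version numbers are illustrative; pin the versions you've actually tested):

```yaml
requirements:
- diffusers==0.15.1     # illustrative pins, not tested recommendations
- transformers==4.28.1
- accelerate==0.18.0
```
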
### Set hardware requirements

Large models like Stable Diffusion require powerful hardware to run invocations. Set your packaged model's hardware requirements in `config.yaml`:

```yaml
resources:
  accelerator: A10G # Type of GPU required
  cpu: "8"          # Number of vCPU cores required
  memory: 30Gi      # RAM in mebibytes (Mi) or gibibytes (Gi)
  use_gpu: true     # If false, set accelerator: null
```

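By contrast, a small CPU-only model might use a block like this (illustrative values):

```yaml
resources:
  accelerator: null
  cpu: "4"
  memory: 8Gi
  use_gpu: false
```
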
You've successfully packaged a model! If you have the required hardware, you can [test it locally](../develop/localhost.md), or [deploy it to Baseten](https://docs.baseten.co/models/deploying-models/client#stage-2-deploying-a-draft) to get a draft model for rapid iteration in a production environment.