Adding Triton backend support #537

Merged
merged 21 commits into from
Aug 24, 2023

@aspctu aspctu commented Aug 10, 2023

Overview

This PR adds support for Triton as a backend for Truss. Specifically, it contains logic for:

  1. Building Docker images that run user code within Triton
  2. Generating the appropriate configs for Triton based on a user's config.yaml and model.py
  3. Handling the conversion between Triton types and Python types
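
As a rough illustration of items 2 and 3, the sketch below derives Triton-style tensor specs from a class's type annotations. It is not the code in this PR: the `triton_fields` helper, the plain `Output` class standing in for a Pydantic model, and the choice of which Python type maps to which `TYPE_*` value are all assumptions made for the example. Only the `TYPE_*` names themselves come from Triton's model configuration.

```python
from typing import Dict, List, get_args, get_origin, get_type_hints

# Assumed mapping from Python scalar types to Triton config data types.
# The TYPE_* names are Triton's; which Python type maps to which is a guess.
PYTHON_TO_TRITON: Dict[type, str] = {
    str: "TYPE_STRING",
    int: "TYPE_INT64",
    float: "TYPE_FP64",
    bool: "TYPE_BOOL",
}


def triton_fields(cls: type) -> List[Dict[str, object]]:
    """Describe each annotated field of `cls` as a Triton-style tensor spec."""
    specs = []
    for name, annotation in get_type_hints(cls).items():
        if get_origin(annotation) is list:
            # List[T] becomes a variable-length 1-D tensor of T.
            (element_type,) = get_args(annotation)
            dims = [-1]
        else:
            # A bare scalar becomes a single-element tensor.
            element_type = annotation
            dims = [1]
        specs.append(
            {"name": name, "data_type": PYTHON_TO_TRITON[element_type], "dims": dims}
        )
    return specs


# Hypothetical stand-in for the Output class in a user's model.py.
class Output:
    text: str
    embedding: List[float]


for spec in triton_fields(Output):
    print(spec)
```

A generated config.pbtxt would need one such entry per input and output field; a field type missing from the mapping raises a `KeyError` here, which is one place where explicit error handling would be needed.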

Logic around testing this flow will be in a follow-up PR. It's a significant testing suite and requires running tests within the Triton docker container.

Quickstart

Quickstart repo: https://github.com/aspctu/bert-triton-truss

  1. git clone https://github.com/aspctu/bert-triton-truss
  2. truss image build-context ./bert-truss-context ./bert-truss
  3. cd ./bert-truss-context
  4. docker build ./
  5. docker run --gpus=all -p8080:8080 -p8000:8000 -it (image id)

Follow the README.md in the repo above to invoke the model.

Introduction

Triton is a high-performance model serving backend developed by NVIDIA. For most models (outside of LLMs), it's advantageous to use Triton as the backend server, because its server features help maximize GPU utilization and make better use of GPU memory.

This PR introduces a simplified developer experience that lets users tap into some of this functionality within Truss. It's worth noting that a lot of Triton functionality is not supported here (such as decoupled mode or ensemble models).

To enable Triton, a user needs to do three things:

  1. Update their config.yaml to contain the following (done automatically if the truss is created via truss init):

     build:
       model_server: TRITON

  2. Define two Pydantic classes in their model.py that correspond to the Input and Output of their model, for example:

     from typing import List

     from pydantic import BaseModel, conlist

     class Input(BaseModel):
         text: str

     class Output(BaseModel):
         text: str
         # Fixed-length embedding: exactly 768 floats
         embedding: conlist(float, min_length=768, max_length=768)

     ...

  3. Update their predict function to accept a List[Input] and produce a List[Output]:

     def predict(inputs: List[Input]) -> List[Output]:
         ...
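
Put together, the three steps above might look roughly like the following minimal sketch. The `embed` function is a hypothetical placeholder for a real model call. To keep the example short, the embedding is an unconstrained `List[float]` instead of the fixed-length `conlist` shown earlier.

```python
from typing import List

from pydantic import BaseModel


class Input(BaseModel):
    text: str


class Output(BaseModel):
    text: str
    embedding: List[float]


def embed(text: str) -> List[float]:
    """Hypothetical placeholder for a real model call; returns a toy 3-float vector."""
    return [float(len(text)), float(text.count(" ")), 0.0]


def predict(inputs: List[Input]) -> List[Output]:
    # One Output per Input, in the same order, so batched results stay aligned.
    return [Output(text=item.text, embedding=embed(item.text)) for item in inputs]


batch = [Input(text="hello world"), Input(text="triton")]
results = predict(batch)
```

Returning exactly one Output per Input, in order, is what lets a batched request be split back into per-request responses.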

TODOs

  • Parametrize GPU / CPU deployment in config.pbtxt
  • Look into truss image build failing to do anything
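
For the first TODO, one possible approach is to render Triton's `instance_group` stanza from a flag. The sketch below is an assumption about how that could look, not what this PR does: `render_instance_group` is a hypothetical helper, and it uses plain string formatting where the PR uses a Jinja template. `KIND_GPU` and `KIND_CPU` are Triton's own instance group kinds.

```python
def render_instance_group(use_gpu: bool, count: int = 1) -> str:
    """Render a Triton instance_group stanza for GPU or CPU execution."""
    kind = "KIND_GPU" if use_gpu else "KIND_CPU"
    return (
        "instance_group [\n"
        "  {\n"
        f"    count: {count}\n"
        f"    kind: {kind}\n"
        "  }\n"
        "]\n"
    )


print(render_instance_group(use_gpu=False))
```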

@bolasim bolasim left a comment


lgtm. just a few comments/questions. Lemme know when it's ready for review.

truss/templates/triton/config.pbtxt.jinja (outdated, resolved)
truss/templates/triton/config.pbtxt.jinja (outdated, resolved)
truss/templates/triton/model/1/model_wrapper.py (outdated, resolved)
truss/templates/triton/model/1/model_wrapper.py (outdated, resolved)
truss/templates/triton/proxy.conf (outdated, resolved)
truss/templates/triton/proxy.conf (outdated, resolved)
@aspctu aspctu marked this pull request as ready for review August 16, 2023 06:30
@aspctu aspctu changed the title (WIP) Triton Adding Triton backend support Aug 16, 2023
@squidarth squidarth left a comment


Took a first pass. Overall, I like the structure that we went with here!

Nits aside, main comments here are around error-handling (and my own questions about what kind of assumptions are fair to make about the input)

truss/templates/triton/model/triton_model_wrapper.py (outdated, resolved)
truss/templates/triton/model/utils/pydantic.py (outdated, resolved)
truss/templates/triton/model/transform.py (outdated, resolved)
truss/templates/triton/model/transform.py (resolved)
truss/templates/triton/model/triton_model_wrapper.py (outdated, resolved)
truss/templates/triton/model/utils/pydantic.py (outdated, resolved)
truss/templates/triton/model/utils/triton.py (outdated, resolved)
truss/templates/triton/model/transform.py (outdated, resolved)
truss/templates/triton/model/transform.py (resolved)
truss/templates/triton/model/utils/triton.py (outdated, resolved)
truss/templates/triton/root/generate_config.py (outdated, resolved)
truss/templates/triton/root/generate_config.py (outdated, resolved)
@aspctu aspctu requested a review from squidarth August 18, 2023 16:14
@aspctu aspctu requested review from joostinyi and bolasim August 18, 2023 16:14
@bolasim bolasim left a comment


let's merge this and use it then iterate on smaller PRs

@aspctu aspctu enabled auto-merge (squash) August 24, 2023 15:52
@aspctu aspctu disabled auto-merge August 24, 2023 15:52
@aspctu aspctu merged commit 8b9d302 into main Aug 24, 2023
@aspctu aspctu deleted the abuqader/adding-triton-support branch August 24, 2023 16:22
amiruci commented Aug 24, 2023

🤩 🤩 🤩

5 participants