[RFC]: Kernel Library Restructure / Packaging Split (addressing long build times) #17419

@seemethere

Motivation.

Reasonable Build Time?

vLLM local builds take too long.

On CI, the wall-clock time for a build with no cache exceeds 5 hours. Yet over the last year, about 95% of the commits to the vllm directory did not modify the csrc directory, which effectively makes them Python-only changes that should not require any extra build time.

The install-from-source docs also acknowledge how long the build takes, and point most people towards Python-only builds as the way to get a reasonable build time for most of their work. (link)

I also think it's worth considering how we structure the code to make it clearer what is kernel specific vs. what is engine specific. This could benefit the project by clearly identifying when we need to run things like performance testing vs. unit testing (i.e. kernel vs. engine).
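As a rough illustration of that kernel vs. engine distinction, a small path-based router could decide which test suites a change needs. This is only a sketch under assumptions: the suite names (`kernel-perf`, `engine-unit`) are hypothetical, and the prefixes combine the current `csrc/` and `vllm/` directories with the directory names floated in this RFC.

```python
from typing import Iterable, Set

# Hypothetical path prefixes; csrc/ and vllm/ exist today, the other two
# are the directory names proposed in this RFC and may change.
KERNEL_PREFIXES = ("csrc/", "vllm-kernels/")
ENGINE_PREFIXES = ("vllm/", "vllm-engine/")


def suites_for_change(changed_paths: Iterable[str]) -> Set[str]:
    """Map changed file paths to the (hypothetical) test suites they require."""
    suites: Set[str] = set()
    for path in changed_paths:
        if path.startswith(KERNEL_PREFIXES):
            # Kernel code changed: performance testing is warranted.
            suites.add("kernel-perf")
        elif path.startswith(ENGINE_PREFIXES):
            # Engine-only change: unit tests are enough.
            suites.add("engine-unit")
    return suites
```

With a clean split, a Python-only commit would map to the engine suite alone and never trigger a kernel build.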

Proposed Change.

The vLLM package is currently made up of two pretty distinct paths:

  • csrc: Kernels / PyTorch Custom Operators
    • To also include vllm-flash-attn source builds
  • vllm: Engine code that sometimes utilizes things from csrc

My proposal is to split csrc / vllm into two separate Python packages with matching directory structures (names are flexible here):

  • vllm: The current package that users install, with a dependency on vllm-kernels
  • vllm-kernels: Made up of vLLM Kernels + Custom Ops + dependencies like vllm-flash-attn

Potential directory structure:

vllm/
├─ vllm-engine/            <-- Current Package
│  ├─ vllm/
│  ├─ pyproject.toml
├─ vllm-kernels/           <-- New package 
│  ├─ vllm_kernels/        <-- Python frontend
│  ├─ csrc/                <-- Kernel code
│  ├─ pyproject.toml

The change will happen in two steps:

  • Step 1: refactor into two packages.
  • Step 2: polish the kernel APIs, and separate the libraries based on the backend type.

Along with this change, I recommend that we restructure the kernels and the main engine code into clearly separate directories, so that it is obvious which directories build which packages.

Feedback Period.

No formal feedback period! I'd like to start this work ASAP while my time isn't allocated to another project so ideally within the next month or so!

CC List.

cc @simon-mo, @tlrmchlsmth, @youkaichao, @comaniac

Any Other Things.

Other potential benefits

Package size could be spread across 2 packages

We all understand how much it sucks dealing with space limits on PyPI. By effectively splitting our package size across two different packages on PyPI, we can increase the number of releases we can do in the future.

Decoupling release cycles

*Potentially* splitting out the release cadence of the kernels and the engine would let us roll out kernel updates without a full engine upgrade, and vice versa. Things like performance upgrades could then ship without any update to the engine at all.

Backends could be more swappable

Another potential idea is to make the vllm-kernels package swappable between different accelerator types (cpu, rocm, cuda, etc.), so that we can actually publish accelerator backends (like rocm) to PyPI and have everything installable from there. This also opens up the potential for vendors to manage their own backend packages, freeing up vLLM core maintainers to focus on just the core set of backends.
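To make the swappable-backend idea a bit more concrete, here is a minimal sketch of how the engine side might pick whichever backend package happens to be installed. The module names (`vllm_kernels_cuda`, `vllm_kernels_rocm`, `vllm_kernels_cpu`) are hypothetical; this RFC does not define how backend packages would be named or discovered.

```python
import importlib.util
from typing import Optional, Sequence

# Hypothetical module names, listed in preference order. Nothing in this
# RFC fixes these names; they are placeholders for illustration.
DEFAULT_CANDIDATES = ("vllm_kernels_cuda", "vllm_kernels_rocm", "vllm_kernels_cpu")


def pick_backend(candidates: Sequence[str] = DEFAULT_CANDIDATES) -> Optional[str]:
    """Return the first candidate module that is installed, or None."""
    for name in candidates:
        # find_spec returns None when a top-level module cannot be found.
        if importlib.util.find_spec(name) is not None:
            return name
    return None
```

The caller would then import the returned module with `importlib.import_module`. A vendor-maintained backend would only need to ship a package exposing one of the agreed module names.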

Some caveats

Note: I’m not necessarily proposing that other people depend on vllm-kernels, just that we split the packaging up for easier organization / caching, both for our local developers and for CI.

Ensuring compatibility between the two packages could be a challenge

Having two separate packages does open us up to issues where users could install incompatible versions of the packages together.

Example:

Imagine a user has vllm-kernels==1.0.0 installed alongside vllm==2.0.0. These could be wildly different and should be treated as incompatible.

Potential solution(s):

Hard-lock dependencies

We could hard-lock the engine package to the kernels by explicitly making each version of vllm depend on the matching vllm-kernels version (i.e. vllm==2.0.0 depends on vllm-kernels==2.0.0).

Soft-lock dependencies

We could instead make the vllm package depend on a specific minor version of the vllm-kernels package (i.e. have vllm==2.0.0 depend on vllm-kernels>=2.0.0,<2.1.0).
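The two locking policies can be compared with a small compatibility check. This is a sketch, not a proposal for the actual mechanism: `is_compatible` and its `policy` argument are hypothetical, versions are assumed to be plain `MAJOR.MINOR.PATCH` strings, and the soft lock is interpreted here as "same major and minor version".

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse a plain 'MAJOR.MINOR.PATCH' string (no pre-release or local tags)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def is_compatible(engine: str, kernels: str, policy: str = "soft") -> bool:
    """Hypothetical check of a vllm version against a vllm-kernels version.

    'hard' requires identical versions; 'soft' (as assumed here) requires
    the same major and minor version.
    """
    e, k = parse_version(engine), parse_version(kernels)
    if policy == "hard":
        return e == k
    if policy == "soft":
        return e[:2] == k[:2]
    raise ValueError(f"unknown policy: {policy!r}")
```

In practice the pin would more likely live in the engine's package metadata as a requirement string, and a check like this would only be a runtime safety net that fails early on a mismatched install.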

Open Discussion Points

  • Directory / project structure: How should we represent the two packages within the repository?
    • Does vllm-kernels potentially move to a separate repository?
    • Are we interested in maintaining a top-level repository structure to represent the two packages (i.e. having top-level vllm-engine and vllm-kernels directories)?
  • Compatibility: Do we always want to release these packages together? Or should we decouple them to get some of the benefits listed above?
