[RFC]: Kernel Library Restructure / Packaging Split (addressing long build times) #17419

### Motivation.

![Reasonable Build Time?](https://github.com/user-attachments/assets/5631421a-9a02-4fd8-ac23-c79f1bca1c07)

*vLLM local builds take too long.*

On CI, the wall-clock time for a build with no cache exceeds [5 hours](https://buildkite.com/vllm/ci/builds/18094/steps?jid=01965b50-76d2-4a70-9d55-e517017e0a48). Yet most commits to the repository (about 95% over the last year) did *not* modify the csrc directory. That makes them Python-only changes that should not require any extra build time.

The install-from-source docs also acknowledge how long the build takes, and they point most people toward Python-only builds as the way to get a reasonable build time for day-to-day work. ([link](https://docs.vllm.ai/en/latest/getting_started/installation/gpu.html#build-wheel-from-source))

<img width="813" alt="Image" src="https://github.com/user-attachments/assets/374314fb-93a5-4bc2-ab34-33c2b8dec351" />

I also think it’s worth reconsidering how we structure the code so that it is clear what is kernel-specific and what is engine-specific. This would help the project identify when we need to run things like performance tests vs. unit tests (i.e. kernel vs. engine changes).

### Proposed Change.

The vLLM package is currently made up of two fairly distinct parts:

* *csrc*: Kernels / PyTorch Custom Operators
    * Also includes the vllm-flash-attn source builds
* *vllm*: Engine code that sometimes uses things from csrc

My proposal is to split csrc and vllm into two separate Python packages with matching directory structures (names are flexible here):

* *vllm*: The current package that users install, with a dependency on vllm-kernels
* *vllm-kernels*: vLLM kernels + custom ops + dependencies like vllm-flash-attn

Potential directory structure:
```
vllm/
├─ vllm-engine/            <-- Current Package
│  ├─ vllm/
│  ├─ pyproject.toml
├─ vllm-kernels/           <-- New package 
│  ├─ vllm_kernels/        <-- Python frontend
│  ├─ csrc/                <-- Kernel code
│  ├─ pyproject.toml
```

The change would happen in two steps:
* Step 1: refactor into two packages.
* Step 2: polish the kernel APIs and split the libraries by backend type.

Along with this change, I recommend restructuring the kernels and the main engine code into clearly separate directories, so it is obvious which directories build which package.
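To make the "which directories build which package" split concrete, here is a minimal sketch of a CI change filter. It assumes the hypothetical `vllm-engine/` and `vllm-kernels/` top-level layout shown above; the `affected_targets` helper and the test-suite names are illustrative only, not existing vLLM CI code. The point it shows is that engine-only commits (the ~95% case) would never trigger a kernel build.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from a top-level directory in the proposed layout to
# (package it builds, test suite it should trigger). Names are illustrative.
PACKAGE_ROOTS = {
    "vllm-kernels": ("vllm-kernels", "kernel performance tests"),
    "vllm-engine": ("vllm-engine", "engine unit tests"),
}


def affected_targets(changed_files):
    """Return (packages_to_build, test_suites) for a list of changed paths."""
    packages, suites = set(), set()
    for path in changed_files:
        parts = PurePosixPath(path).parts
        if not parts:
            continue
        target = PACKAGE_ROOTS.get(parts[0])
        if target is None:
            # Files outside both package roots (docs, CI config, ...) are
            # ignored here; a real filter would need its own policy for them.
            continue
        package, suite = target
        packages.add(package)
        suites.add(suite)
    return sorted(packages), sorted(suites)


if __name__ == "__main__":
    changes = ["vllm-engine/vllm/engine/llm_engine.py", "docs/index.md"]
    print(affected_targets(changes))
    # prints (['vllm-engine'], ['engine unit tests'])
```

A real policy would probably also run engine tests when kernels change, since the engine consumes them; the sketch only captures the direction that saves build time.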

### Feedback Period.

No formal feedback period! I'd like to start this work ASAP, while my time isn't allocated to another project, so ideally within the next month or so!

### CC List.

cc @simon-mo, @tlrmchlsmth, @youkaichao, @comaniac  

### Any Other Things.

### Other potential benefits


#### Package size could be spread across 2 packages

We all know how much it sucks to deal with space limits on PyPI. By splitting our footprint across two different PyPI packages, we can increase the number of releases we can publish before running into those limits.


#### Decoupling release cycles

*Potentially* splitting out the release cadence of the kernels and the engine would let us roll out kernel updates without a full engine upgrade, and vice versa. For example, performance improvements in the kernels could ship without any update to the engine.


#### Backends could be more swappable

Another idea is to make the *vllm-kernels* package swappable between accelerator types (cpu, rocm, cuda, etc.), so that we can publish accelerator backends (like rocm) to PyPI and have everything installable from there. This also opens up the possibility for vendors to manage their own backend packages, freeing vLLM core maintainers to focus on the core set of backends.

#### Some caveats

> [!NOTE]
> I’m not necessarily proposing that other projects depend on vllm-kernels, just that we split up the packaging for easier organization and caching, both for our local developers and for CI.


##### Ensuring compatibility between the two packages could be a challenge

Having two separate packages does open us up to issues where users could install incompatible versions of the packages together.

###### Example:

Imagine a user has *vllm-kernels==1.0.0* installed alongside *vllm==2.0.0*. These could be wildly different and would likely be incompatible.

###### Potential solution(s):

**Hard-lock dependencies**

We could hard-lock the engine package to the kernels by making each version of vllm explicitly depend on the matching vllm-kernels version (i.e. *vllm==2.0.0* depends on *vllm-kernels==2.0.0*).

**Soft-lock dependencies**

We could make the vLLM package depend on a specific minor version of the vllm-kernels package (i.e. have *vllm==2.0.0* depend on *vllm-kernels>=2.0.0,<2.1.0*).

### Open Discussion Points
* **Directory / project structure**: How should we represent the two packages within the repository?
    * Should vllm-kernels move to a separate repository?
    * Are we interested in maintaining a top-level repository structure that represents the two packages (i.e. top-level vllm-engine and vllm-kernels directories)?
* **Compatibility**: Do we always want to release these packages together, or should we decouple them to get some of the benefits listed above?

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.
