Call for Help: Proper Build System (CMake, Bazel, etc). #2654

simon-mo · 2024-01-29T20:50:09Z

Currently, vLLM's build uses PyTorch's extension builders, which call Ninja under the hood. This works okay but has the following issues:

Only supports NVIDIA and AMD GPUs.
Slow sequential builds. This is amplified by adding quantization kernels and LoRA kernels.
No caching or incremental builds.

We would like to ask for the community's help in recommending a technology, prototyping it, and implementing it. Ideally something like CMake or Bazel could work, but it requires some careful thinking.

The requirements:

Must support multiple hardware architectures (NVIDIA, AMD, Intel, etc.).
Must support incremental builds, which also implies caching.
Must support parallel builds.
Good to have: editor support (by generating a compilation database).
Ideally it would not OOM like the current setup. Currently, due to the rigid structure, we have to carefully set MAX_JOBS and NVCC_THREADS to keep the compiler from running out of memory. I think this is because nvcc spawns threads for each SM architecture we are compiling for.
Vaguely, "future proof".
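
To make the MAX_JOBS / NVCC_THREADS trade-off concrete, here is a minimal sketch of how the two values could be chosen together from a memory budget. It is not what setup.py does today. The function name `pick_build_parallelism` and the default of 4 GiB per nvcc thread are assumptions for illustration only; real memory use depends on the kernels and the SM architectures being compiled.

```python
import os


def pick_build_parallelism(total_mem_gib, cpu_count, nvcc_threads=1,
                           mem_per_nvcc_thread_gib=4.0):
    """Pick (max_jobs, nvcc_threads) so the build stays inside a memory budget.

    Hypothetical helper. mem_per_nvcc_thread_gib is an assumed figure,
    not a measured one; tune it for your own kernels and target archs.
    """
    if nvcc_threads < 1:
        raise ValueError("nvcc_threads must be >= 1")
    if mem_per_nvcc_thread_gib <= 0:
        raise ValueError("mem_per_nvcc_thread_gib must be positive")

    # Each parallel job may run nvcc_threads compiler threads at once,
    # so the worst-case memory per job scales with that count.
    mem_per_job_gib = nvcc_threads * mem_per_nvcc_thread_gib

    jobs_by_memory = int(total_mem_gib // mem_per_job_gib)
    jobs_by_cpu = cpu_count // nvcc_threads

    # Always allow at least one job, even on very small machines.
    max_jobs = max(1, min(jobs_by_memory, jobs_by_cpu))
    return max_jobs, nvcc_threads


if __name__ == "__main__":
    # Example: a 64 GiB machine with 16 cores and 2 nvcc threads per job.
    jobs, threads = pick_build_parallelism(64, 16, nvcc_threads=2)
    os.environ["MAX_JOBS"] = str(jobs)
    os.environ["NVCC_THREADS"] = str(threads)
    print(f"MAX_JOBS={jobs} NVCC_THREADS={threads}")
```

The point of the sketch is that the two knobs multiply: raising NVCC_THREADS without lowering MAX_JOBS raises peak memory, which is why they currently have to be tuned by hand.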

Currently, the "build system" is all in here https://github.com/vllm-project/vllm/blob/main/setup.py


lroberts7 · 2024-01-29T22:54:41Z

@rgommers would meson-python support this?

It checks most of the boxes, but I'm not sure about support for multiple accelerator hardware targets. If it doesn't, I'm hoping you might have some experience and opinions you'd be willing to share.

rgommers · 2024-01-30T10:33:03Z

The question here isn't very clear to me; I'm missing context, I guess. Reading all the requirements, it sounds like you need a regular build system (CMake or Meson are the most commonly used and best general-purpose options). However, if you're already using the PyTorch extension builder, it sounds like that is something you do on the fly (maybe exposed to end users?), which is a very different use case.

simon-mo · 2024-01-30T19:21:29Z

Ah, good to clarify here. We are really just looking for a regular build system to replace the current usage of Torch extension builders.

robertgshaw2-neuralmagic · 2024-02-01T15:16:19Z

@simon-mo, the team from Neural Magic is going to work on this

cc @tlrmchlsmth @bnellnm

bnellnm · 2024-02-19T20:23:36Z

Hi all, I wanted to give an update on this project. So far, I've got a CUDA build working (see PR #2830). The PR has a detailed description of the CMake system. I'm still working on the AMD/ROCm build, which is a little trickier because of the "hipify" preprocessor that PyTorch uses on the CUDA sources.

WoosukKwon added the help wanted Extra attention is needed label Jan 29, 2024

simon-mo changed the title ~~Call for Help: Compilation Build Tool~~ Call for Help: Proper Build System (CMake, Bazel, etc). Jan 30, 2024

simon-mo mentioned this issue Feb 1, 2024: [Roadmap] vLLM Roadmap Q1 2024 #2681 (Closed, 30 tasks)

bnellnm mentioned this issue Feb 11, 2024: Cmake based build system #2830 (Merged)

simon-mo closed this as completed in #2830 Mar 18, 2024
