
[CI/Build] CMakeLists: build all extensions' cmake targets at the same time #5034

Merged: 1 commit into vllm-project:main, Jun 1, 2024

Conversation

dtrifiro
Contributor

Instead of building one target at a time, we can pass multiple --target flags to a single cmake --build invocation; this should speed up the build process a little.
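A minimal sketch of the idea (target names and job count are illustrative; CMake >= 3.15 accepts multiple --target flags in one build invocation):

import subprocess

targets = ["_C", "_moe_C"]  # example extension target names

# Before: one `cmake --build` invocation per target, with idle time
# between finishing one extension and starting the next.
for target in targets:
    subprocess.check_call(
        ["cmake", "--build", ".", "-j", "8", "--target", target])

# After: a single invocation that knows about every target, so the
# build system can keep all cores busy across target boundaries.
cmd = ["cmake", "--build", ".", "-j", "8"]
for target in targets:
    cmd += ["--target", target]
subprocess.check_call(cmd)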

@mgoin
Member

mgoin commented May 24, 2024

This seems straightforward to me, thanks! My only concern is that build memory requirements could increase, since more targets are in flight. Any thoughts @bnellnm?

Edit: make sure to run format.sh so CI can run

@youkaichao
Member

I don't think this should be turned on by default. We tried very hard not to overwhelm users' machines:

vllm/setup.py (lines 71 to 105 at commit 9197709):

#
# Determine number of compilation jobs and optionally nvcc compile threads.
#
def compute_num_jobs(self):
    # `num_jobs` is either the value of the MAX_JOBS environment variable
    # (if defined) or the number of CPUs available.
    num_jobs = envs.MAX_JOBS
    if num_jobs is not None:
        num_jobs = int(num_jobs)
        logger.info("Using MAX_JOBS=%d as the number of jobs.", num_jobs)
    else:
        try:
            # os.sched_getaffinity() isn't universally available, so fall
            # back to os.cpu_count() if we get an error here.
            num_jobs = len(os.sched_getaffinity(0))
        except AttributeError:
            num_jobs = os.cpu_count()

    nvcc_threads = None
    if _is_cuda() and get_nvcc_cuda_version() >= Version("11.2"):
        # `nvcc_threads` is either the value of the NVCC_THREADS
        # environment variable (if defined) or 1.
        # when it is set, we reduce `num_jobs` to avoid
        # overloading the system.
        nvcc_threads = envs.NVCC_THREADS
        if nvcc_threads is not None:
            nvcc_threads = int(nvcc_threads)
            logger.info(
                "Using NVCC_THREADS=%d as the number of nvcc threads.",
                nvcc_threads)
        else:
            nvcc_threads = 1
        num_jobs = max(1, num_jobs // nvcc_threads)

    return num_jobs, nvcc_threads

If we compile all the targets by default, it will violate this.

We can make this an opt-in feature, and users can turn it on if they know what they are doing.
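
If this were made opt-in, it could look roughly like the sketch below (the environment variable name is hypothetical, not something vllm defines):

import os
import subprocess

TARGETS = ["_C", "_moe_C"]  # example extension target names

def cmake_build(targets: list[str], jobs: int = 8) -> None:
    # One `cmake --build` invocation covering the given targets.
    cmd = ["cmake", "--build", ".", "-j", str(jobs)]
    for target in targets:
        cmd += ["--target", target]
    subprocess.check_call(cmd)

# Hypothetical opt-in flag; not an actual vllm environment variable.
if os.environ.get("VLLM_BUILD_ALL_TARGETS", "0") == "1":
    cmake_build(TARGETS)  # everything in one build invocation
else:
    for target in TARGETS:
        cmake_build([target])  # default: one target at a time, as before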

@bnellnm
Contributor

bnellnm commented May 25, 2024

> This seems straightforward to me, thanks! My only concern is that build memory requirements could increase, since more targets are in flight. Any thoughts @bnellnm?
>
> Edit: make sure to run format.sh so CI can run

I'm not too worried about overloading machines. Even with multiple targets, the maximum number of jobs should still be honored. The only problem I see is that each extension/target could conceivably have a different configuration. Currently that is not the case, but this change could cause problems if targets ever had different configs (see the sketch below).
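
To illustrate that concern: a single cmake --build call operates on one configured build tree, so targets needing different configure-time settings could no longer share an invocation. A sketch, with a made-up -DSOME_OPTION flag:

import subprocess

# Today: all extensions share one build tree and one configuration,
# so a single build invocation can cover every target.
subprocess.check_call(
    ["cmake", "--build", "build", "--target", "_C", "--target", "_moe_C"])

# If targets ever needed different configurations, each would need its
# own configure + build pass again, reintroducing the idle gaps:
subprocess.check_call(["cmake", "-S", ".", "-B", "build_a", "-DSOME_OPTION=ON"])
subprocess.check_call(["cmake", "--build", "build_a", "--target", "_C"])
subprocess.check_call(["cmake", "-S", ".", "-B", "build_b", "-DSOME_OPTION=OFF"])
subprocess.check_call(["cmake", "--build", "build_b", "--target", "_moe_C"])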

@dtrifiro
Contributor Author

dtrifiro commented May 27, 2024

@youkaichao

I'm not sure I follow. This still uses at most max_jobs processes with the given number of nvcc threads; the only difference is that cmake --build is invoked only once, aware of all the targets that need to be built, so there are no "idle times" between finishing all the pieces of one extension and starting to compile the next.

@mgoin

Regarding memory usage: I noticed a very slight increase in peak memory usage, but overall I think it's negligible, and it is still bounded by setting proper values for nvcc_threads and max_jobs. Empirically, I saw at most ~3.5GB of memory per nvcc thread, so for something like MAX_JOBS=8 and NVCC_THREADS=2 (which compute_num_jobs reduces to -j 4), I saw a peak of ~21GB of memory:

Here is an example of what a build looks like with these settings:
[screenshot: CPU and memory usage graph during the build]

Without this patch, CPU usage (and memory as well) rises, stays at the maximum for a while, then dwindles until one extension finishes compiling, then ramps up again for the next extension, and so on (this looks like the tail of the CPU usage graph above), resulting in intervals during which not all cores are used. If I remember correctly, this patch improves build times by up to 20 minutes in some cases.
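
A rough sanity check of those numbers, treating the ~3.5GB-per-nvcc-thread figure as an empirical worst case rather than a guarantee:

# Back-of-the-envelope peak-memory estimate from the figures above.
GB_PER_NVCC_THREAD = 3.5  # empirical observation from this discussion

def estimate_peak_gb(max_jobs: int, nvcc_threads: int) -> float:
    # Mirrors compute_num_jobs: job count is reduced by the thread count.
    num_jobs = max(1, max_jobs // nvcc_threads)
    # Worst case: every job's nvcc threads peak at the same time.
    return num_jobs * nvcc_threads * GB_PER_NVCC_THREAD

print(estimate_peak_gb(8, 2))  # 28.0 GB worst case; ~21GB was observed,
                               # since not all jobs peak simultaneously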

dtrifiro force-pushed the setup-build-all-targets branch from 05f5e78 to 7249850 on May 27, 2024.
@dtrifiro
Contributor Author

Any updates on this?


mgoin (Member) left a comment:


Thank you @dtrifiro for the clear justification on why this patch improves CPU utilization. It is clear that if we run into peak memory consumption issues, we should continue to tune MAX_JOBS and NVCC_THREADS independently of compiling all targets at once.

mgoin merged commit a360ff8 into vllm-project:main on Jun 1, 2024 (63 checks passed).
dtrifiro deleted the setup-build-all-targets branch on June 3, 2024.