Describe the bug
I'm performing matrix multiplication with multiple workers on a single GPU, using Distributed. I limit CUDA's memory usage via the environment variables described in the documentation, but oddly, these variables only seem to take effect with large matrices (1024x1024). The code below appears to ignore the memory restriction, uses all available GPU memory, and eventually OOMs, due to what I believe is a race condition. Changing the 128 to 1024 in the code below causes each process to restrict itself to roughly 2x the memory limit (around 10%), which prevents the OOM on my machine.
To reproduce
using Distributed
env = [
"JULIA_CUDA_HARD_MEMORY_LIMIT"=>"5%",
"JULIA_CUDA_MEMORY_POOL"=>"none"
]
n_workers = 6
addprocs(n_workers, env=env)

@everywhere begin
    using CUDA

    function matrix_multiply_on_gpu(worker_id)
        A = CUDA.rand(Float32, 128, 128)
        B = CUDA.rand(Float32, 128, 128)
        C = A * B
        return sum(C)
    end
end

for i in 1:100_000
    pmap(matrix_multiply_on_gpu, 1:n_workers)
end
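While the loop runs, the memory growth is visible from any worker by querying the device's free memory. Here is a minimal sketch I use to watch it (the report_memory helper is mine, not part of the reproducer; note CUDA.available_memory() reports device-wide free memory, not a per-process figure):

@everywhere function report_memory(worker_id)
    # Device-wide free/total memory as seen by this worker (not per-process).
    free = CUDA.available_memory()
    total = CUDA.total_memory()
    return (worker_id, 100 * (1 - free / total))
end

for (id, pct) in pmap(report_memory, 1:n_workers)
    println("worker $id sees the device at $(round(pct; digits=1))% used")
end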
Manifest.toml
Using an environment with only CUDA.jl#master installed; I will update with a Manifest.toml if needed.
Expected behavior
I expect each process to limit itself to 5% of the GPU memory, or at least to some fixed maximum amount.
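For scale: each 128x128 Float32 matrix is only 64 KiB, so even a 5% cap should comfortably hold thousands of these buffers at once. A quick back-of-the-envelope check (my sketch, not part of the reproducer):

limit_bytes = 0.05 * CUDA.total_memory()    # the 5% hard limit, in bytes
buf_bytes = 128 * 128 * sizeof(Float32)     # 65536 bytes = 64 KiB per matrix
println("buffers that fit under the limit: ", floor(Int, limit_bytes / buf_bytes))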
Version info
Details on Julia:
Julia Version 1.11.1
Commit 8f5b7ca12ad (2024-10-16 10:53 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 64 × AMD Ryzen Threadripper PRO 5975WX 32-Cores
WORD_SIZE: 64
LLVM: libLLVM-16.0.6 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 64 virtual cores)
Environment:
LD_LIBRARY_PATH =
Additional context
This is for a research project where I want to distribute a workload across many processes on multiple machines, with each process using a small slice of one of the GPUs.
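The eventual setup would look roughly like this (hostnames and worker counts are placeholders, not the actual cluster; env is documented to be forwarded to the remote workers):

using Distributed
env = ["JULIA_CUDA_HARD_MEMORY_LIMIT" => "5%", "JULIA_CUDA_MEMORY_POOL" => "none"]
# One tuple per machine: (hostname, number of workers to start there).
addprocs([("gpu-node-1", 6), ("gpu-node-2", 6)], env=env)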