Is there an issue with Conda environments? #720

Denizdius · 2024-07-30T17:26:59Z

I am struggling to compile and run llm.c inside a conda environment.
I am using this environment.yml file:
name: my-env2
channels:
- conda-forge
dependencies:
- cuda-libraries # this is the cuda metapackage
- cudnn # this is specifically for cudnn
- cuda-nvcc # ensures that a compatible nvidia C compiler is available!
# This may be sufficient, but it's probably safer to specify the CUDA build
# variant explicitly to make the conda solver's job easier.
#- jaxlib
- jaxlib=*=*cuda*
- cuda-version=12.4
- jax
- python=3.10
and I am getting errors:
(my-env2) pars@pars-Precision-5540:~/Documents/deniz/llm.c$ make train_gpt2cu

→ cuDNN is manually disabled by default, run make with `USE_CUDNN=1` to try to enable
✓ OpenMP found
✓ NCCL found, OK to train with multiple GPUs
✓ MPI enabled
✓ nvcc found, including GPU/CUDA support

/home/pars/miniconda3/envs/my-env2/bin/nvcc -O3 -t=0 --use_fast_math -std=c++17 --generate-code arch=compute_75,code=[compute_75,sm_75] -DMULTI_GPU -DUSE_MPI -DENABLE_BF16 train_gpt2.cu -lcublas -lcublasLt -L/usr/lib/x86_64-linux-gnu/openmpi/lib/ -I/usr/lib/x86_64-linux-gnu/openmpi/include/ -lnccl -lmpi -o train_gpt2cu
In file included from train_gpt2.cu:37:
llmc/cuda_common.h:13:10: fatal error: nvtx3/nvToolsExt.h: No such file or directory
13 | #include <nvtx3/nvToolsExt.h>
| ^~~~~~~~~~~~~~~~~~~~
compilation terminated.
In file included from train_gpt2.cu:37:
llmc/cuda_common.h:13:10: fatal error: nvtx3/nvToolsExt.h: No such file or directory
13 | #include <nvtx3/nvToolsExt.h>
| ^~~~~~~~~~~~~~~~~~~~
compilation terminated.
make: *** [Makefile:268: train_gpt2cu] Error 255
How can I solve this issue? Or does anyone have a working environment.yml file for llm.c?
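One way to narrow this down is to check whether `nvtx3/nvToolsExt.h` exists anywhere inside the active environment, since the compiler error only says it is not on the include path. The following is a minimal sketch, not a confirmed fix: `find_nvtx_header` is a hypothetical helper name, and it assumes the environment is activated so that `CONDA_PREFIX` is set.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of llm.c): list every copy of the NVTX header
# found below the given prefix. Empty output means the header is not there.
find_nvtx_header() {
    local prefix="$1"
    find "$prefix" -path '*/nvtx3/nvToolsExt.h' 2>/dev/null
}

# Only run the check when a conda environment is active.
if [ -n "${CONDA_PREFIX:-}" ]; then
    hits="$(find_nvtx_header "$CONDA_PREFIX")"
    if [ -n "$hits" ]; then
        # The header exists; the directory that contains nvtx3/ is the one
        # nvcc would need on its include path (-I).
        echo "found: $hits"
    else
        echo "nvtx3/nvToolsExt.h not found under $CONDA_PREFIX"
    fi
fi
```

If the header turns up in a directory nvcc does not search, adding that directory with `-I` is one thing to try. If it is missing entirely, the environment probably lacks the package that ships the NVTX headers. On conda-forge I believe that is `cuda-nvtx-dev`, but that package name is an assumption worth verifying before adding it to environment.yml.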
