Troubleshooting help? #27

kmatzen · 2016-06-07T18:38:10Z

I followed the instructions from the readme, but I can't get the tests to run. Is there any additional advice someone can give me?

# make CUDA_HOME=/usr/local/cuda test
Compiling src/libwrap.cu            > build/obj/libwrap.o
Compiling src/core.cu               > build/obj/core.o
Compiling src/all_gather.cu         > build/obj/all_gather.o
Compiling src/all_reduce.cu         > build/obj/all_reduce.o
Compiling src/broadcast.cu          > build/obj/broadcast.o
Compiling src/reduce.cu             > build/obj/reduce.o
Compiling src/reduce_scatter.cu     > build/obj/reduce_scatter.o
Linking   build/lib/libnccl.so.1.2.2
Grabbing  src/nccl.h                > build/include/nccl.h
Building  test/single/all_gather_test.cu > build/test/single/all_gather_test
Building  test/single/all_reduce_test.cu > build/test/single/all_reduce_test
Building  test/single/broadcast_test.cu > build/test/single/broadcast_test
Building  test/single/reduce_test.cu > build/test/single/reduce_test
Building  test/single/reduce_scatter_test.cu > build/test/single/reduce_scatter_test
# export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:./build/lib
# ./build/test/single/all_reduce_test
Error: must specify at least data size in bytes!

Tests nccl AllReduce with user supplied arguments.
    Usage: all_reduce_test <data size in bytes> [number of GPUs] [GPU 0] [GPU 1] ...

# ./build/test/single/all_reduce_test 10000000
NCCL failure test/single/all_reduce_test.cu:259 'unhandled cuda error'

# nvidia-smi
Tue Jun  7 18:35:23 2016
+------------------------------------------------------+
| NVIDIA-SMI 361.42     Driver Version: 361.42         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TIT...  Off  | 0000:02:00.0     Off |                  N/A |
| 22%   35C    P8    15W / 250W |     23MiB / 12287MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX TIT...  Off  | 0000:04:00.0     Off |                  N/A |
| 22%   34C    P8    14W / 250W |     23MiB / 12287MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX TIT...  Off  | 0000:83:00.0     Off |                  N/A |
| 22%   34C    P8    16W / 250W |     23MiB / 12287MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX TIT...  Off  | 0000:84:00.0     Off |                  N/A |
| 22%   32C    P8    15W / 250W |     23MiB / 12287MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+


sjeaugey · 2016-06-07T18:48:59Z

Hi Kevin,

Could you please relaunch the test with the environment variable NCCL_DEBUG set to "WARN"? NCCL should display a clear error message before returning the error.

Thanks,
Sylvain

kmatzen · 2016-06-07T18:59:30Z

Here's the output.

# NCCL_DEBUG=WARN ./build/test/single/all_reduce_test 10000000
WARN src/libwrap.cu:217 cuInternalIpcGetMemHandle() failed: initialization error
WARN src/core.cu:303 rank 0 failed to open CUDA IPC handle
WARN src/core.cu:808 rank 0 failed to obtain rank info
NCCL failure test/single/all_reduce_test.cu:259 'unhandled cuda error'

sjeaugey · 2016-06-07T20:01:30Z

That's strange. It looks like it cannot find the cuIpcGetMemHandle symbol in libcuda.so. But I think the test is linked against libcuda.so, so the library should be there.

sjeaugey · 2016-06-07T21:41:35Z

Sorry, I read that wrong. It is not a symbol problem; cuIpcGetMemHandle itself is returning an error.

sjeaugey · 2016-06-07T23:29:32Z

I didn't want to close this right away. Re-opening. @kmatzen, can you check whether that commit (d5e507f) fixes your problem?

Thanks.

svanschalkwyk · 2017-01-26T22:00:25Z

NCCL_DEBUG=WARN ./build/test/single/all_reduce_test 10000000
NCCL version 1.3.2 compiled with CUDA 8.0
Using devices
Rank 0 uses device 0 [0x04] GeForce GTX 960
Rank 1 uses device 1 [0x83] GeForce GTX 760
out-of-place in-place
bytes N type op time algbw busbw res time algbw busbw res
WARN src/all_reduce.cu:212 Cuda failure 'invalid device function'
NCCL failure test/single/all_reduce_test.cu:51 'unhandled cuda error'

Environment: CUDA 8, gcc-5, Linux Mint 18

sjeaugey · 2017-01-26T22:38:59Z

The message you get ("invalid device function") means that one of the GPUs cannot execute the compiled code.

Indeed, the GTX760 has a compute capability of 3.0 (https://developer.nvidia.com/cuda-gpus), and the default NVCC_GENCODE in the Makefile only compiles for 3.5 and later.

Can you try to recompile with:

NVCC_GENCODE="-gencode=arch=compute_30,code=sm_30 -gencode=arch=compute_52,code=sm_52"

(compute capability 5.2 is for the GTX960, 3.0 for the GTX760)
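
The diagnosis above can be checked mechanically. The following is a minimal sketch, not part of NCCL: `covers` and `DEFAULT_GENCODE` are hypothetical names, and the default list is copied from the Makefile posted later in this thread without the added compute_30 entry. It assumes the usual CUDA compatibility rules: a `code=sm_XY` binary runs on devices with the same major version and an equal or higher minor version, while `code=compute_XY` PTX can be JIT-compiled for any device of capability X.Y or later.

```shell
#!/bin/sh
# Hypothetical helper (not part of NCCL): succeed if a device of compute
# capability $1 (e.g. "3.0") can run code built with the gencode flags in $2.
covers() {
    dev=$(printf '%s' "$1" | tr -d '.')          # "3.0" -> "30"
    for flag in $2; do                           # unquoted on purpose: split flags
        code=${flag##*code=}                     # e.g. "sm_35" or "compute_52"
        num=${code##*_}                          # e.g. "35"
        case $code in
            sm_*)
                # Binary code: same major version, device minor not lower.
                if [ "${num%?}" = "${dev%?}" ] && [ "$num" -le "$dev" ]; then
                    return 0
                fi ;;
            compute_*)
                # PTX: forward compatible with newer architectures.
                if [ "$num" -le "$dev" ]; then
                    return 0
                fi ;;
        esac
    done
    return 1
}

# Assumed default list, taken from the Makefile quoted in this thread.
DEFAULT_GENCODE="-gencode=arch=compute_35,code=sm_35 -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_52,code=compute_52"

# GTX 760 is 3.0, GTX 960 is 5.2.
for cc in 3.0 5.2; do
    if covers "$cc" "$DEFAULT_GENCODE"; then
        echo "$cc: covered"
    else
        echo "$cc: not covered"
    fi
done
```

Under these assumptions the 3.0 device is reported as not covered, which matches the "invalid device function" failure, while the 5.2 device is covered.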

svanschalkwyk · 2017-01-26T23:01:22Z

Thank you!
I didn't notice the omission.
Have added it to the Makefile:

#
# Copyright (c) 2015-2016, NVIDIA CORPORATION. All rights reserved.
#
# See LICENCE.txt for license information
#

CUDA_HOME ?= /usr/local/cuda
PREFIX ?= /usr/local
VERBOSE ?= 0
KEEP ?= 0
DEBUG ?= 0
PROFAPI ?= 0
BUILDDIR ?= build
BUILDDIR := $(abspath $(BUILDDIR))

CUDA_LIB ?= $(CUDA_HOME)/lib64
CUDA_INC ?= $(CUDA_HOME)/include
NVCC ?= $(CUDA_HOME)/bin/nvcc

# Added compute_30 (first entry) for the GTX 760.
NVCC_GENCODE ?= -gencode=arch=compute_30,code=sm_30 \
                -gencode=arch=compute_35,code=sm_35 \
                -gencode=arch=compute_50,code=sm_50 \
                -gencode=arch=compute_52,code=sm_52 \
                -gencode=arch=compute_52,code=compute_52

CXXFLAGS   := -I$(CUDA_INC) -fPIC -fvisibility=hidden
NVCUFLAGS  := -ccbin $(CXX) $(NVCC_GENCODE) -lineinfo -std=c++11 -maxrregcount 96
# Use addprefix so that we can specify more than one path
LDFLAGS    := $(addprefix -L,${CUDA_LIB}) -lcudart -lrt

ifeq ($(DEBUG), 0)
NVCUFLAGS += -O3
CXXFLAGS  += -O3
else
NVCUFLAGS += -O0 -G
CXXFLAGS  += -O0 -g -ggdb3
endif

ifneq ($(VERBOSE), 0)
NVCUFLAGS += -Xptxas -v -Xcompiler -Wall,-Wextra
CXXFLAGS  += -Wall -Wextra
else
.SILENT:
endif

ifneq ($(KEEP), 0)
NVCUFLAGS += -keep
endif

ifneq ($(PROFAPI), 0)
CXXFLAGS += -DPROFAPI
endif

NCCL_MAJOR   := 1
NCCL_MINOR   := 3
NCCL_PATCH   := 2
CXXFLAGS  += -DNCCL_MAJOR=$(NCCL_MAJOR) -DNCCL_MINOR=$(NCCL_MINOR) -DNCCL_PATCH=$(NCCL_PATCH)

CUDA_VERSION ?= $(shell ls $(CUDA_LIB)/libcudart.so.* | head -1 | rev | cut -d "." -f -2 | rev)
CUDA_MAJOR = $(shell echo $(CUDA_VERSION) | cut -d "." -f 1)
CUDA_MINOR = $(shell echo $(CUDA_VERSION) | cut -d "." -f 2)
CXXFLAGS  += -DCUDA_MAJOR=$(CUDA_MAJOR) -DCUDA_MINOR=$(CUDA_MINOR)

.PHONY : all lib staticlib clean test mpitest install deb debian debclean forlib fortest forclean
.DEFAULT : all

INCEXPORTS  := nccl.h
LIBSRCFILES := libwrap.cu core.cu all_gather.cu all_reduce.cu broadcast.cu reduce.cu reduce_scatter.cu
LIBNAME     := libnccl.so
STATICLIBNAME := libnccl_static.a

INCDIR := $(BUILDDIR)/include
LIBDIR := $(BUILDDIR)/lib
OBJDIR := $(BUILDDIR)/obj

INCTARGETS := $(patsubst %, $(INCDIR)/%, $(INCEXPORTS))
LIBSONAME  := $(patsubst %,%.$(NCCL_MAJOR),$(LIBNAME))
LIBTARGET  := $(patsubst %,%.$(NCCL_MAJOR).$(NCCL_MINOR).$(NCCL_PATCH),$(LIBNAME))
STATICLIBTARGET := $(STATICLIBNAME)
LIBLINK    := $(patsubst lib%.so, -l%, $(LIBNAME))
LIBOBJ     := $(patsubst %.cu, $(OBJDIR)/%.o, $(filter %.cu, $(LIBSRCFILES)))
DEPFILES   := $(patsubst %.o, %.d, $(LIBOBJ)) $(patsubst %, %.d, $(TESTBINS)) $(patsubst %, %.d, $(MPITESTBINS))

all : lib staticlib

lib : $(INCTARGETS) $(LIBDIR)/$(LIBTARGET)

staticlib : $(INCTARGETS) $(LIBDIR)/$(STATICLIBTARGET)

-include $(DEPFILES)

$(LIBDIR)/$(LIBTARGET) : $(LIBOBJ)
	@printf "Linking   %-35s > %s\n" $(LIBTARGET) $@
	mkdir -p $(LIBDIR)
	$(CXX) $(CXXFLAGS) -shared -Wl,--no-as-needed -Wl,-soname,$(LIBSONAME) -o $@ $(LDFLAGS) $(LIBOBJ)
	ln -sf $(LIBSONAME) $(LIBDIR)/$(LIBNAME)
	ln -sf $(LIBTARGET) $(LIBDIR)/$(LIBSONAME)

$(LIBDIR)/$(STATICLIBTARGET) : $(LIBOBJ)
	@printf "Archiving %-35s > %s\n" $(STATICLIBTARGET) $@
	mkdir -p $(LIBDIR)
	ar cr $@ $(LIBOBJ)

$(INCDIR)/%.h : src/%.h
	@printf "Grabbing  %-35s > %s\n" $< $@
	mkdir -p $(INCDIR)
	cp -f $< $@

$(OBJDIR)/%.o : src/%.cu
	@printf "Compiling %-35s > %s\n" $< $@
	mkdir -p $(OBJDIR)
	$(NVCC) -c $(NVCUFLAGS) --compiler-options "$(CXXFLAGS)" $< -o $@
	@$(NVCC) -M $(NVCUFLAGS) --compiler-options "$(CXXFLAGS)" $< > $(@:%.o=%.d.tmp)
	@sed "0,/^.*:/s//$(subst /,\/,$@):/" $(@:%.o=%.d.tmp) > $(@:%.o=%.d)
	@sed -e 's/.*://' -e 's/\\$$//' < $(@:%.o=%.d.tmp) | fmt -1 | \
                sed -e 's/^ *//' -e 's/$$/:/' >> $(@:%.o=%.d)
	@rm -f $(@:%.o=%.d.tmp)

clean :
	rm -rf $(BUILDDIR)

install : lib
	mkdir -p $(PREFIX)/lib
	mkdir -p $(PREFIX)/include
	cp -P -v $(BUILDDIR)/lib/* $(PREFIX)/lib/
	cp -v $(BUILDDIR)/include/* $(PREFIX)/include/


#### TESTS ####

TEST_ONLY ?= 0

# Tests depend on lib, except in TEST_ONLY mode.
ifeq ($(TEST_ONLY), 0)
TSTDEP = $(INCTARGETS) $(LIBDIR)/$(LIBTARGET)
endif

NCCL_LIB ?= $(LIBDIR)
NCCL_INC ?= $(INCDIR)

MPI_HOME ?= /usr
MPI_INC ?= $(MPI_HOME)/include
MPI_LIB ?= $(MPI_HOME)/lib
MPIFLAGS   := -I$(MPI_INC) -L$(MPI_LIB) -lmpi

TESTS       := all_gather_test     all_gather_scan \
               all_reduce_test     all_reduce_scan \
               broadcast_test      broadcast_scan \
               reduce_test         reduce_scan \
               reduce_scatter_test reduce_scatter_scan
MPITESTS    := mpi_test

TSTINC     := -I$(NCCL_INC) -Itest/include
TSTLIB     := -L$(NCCL_LIB) $(LIBLINK) $(LDFLAGS)
TSTDIR     := $(BUILDDIR)/test/single
MPITSTDIR  := $(BUILDDIR)/test/mpi
TESTBINS   := $(patsubst %, $(TSTDIR)/%, $(TESTS))
MPITESTBINS:= $(patsubst %, $(MPITSTDIR)/%, $(MPITESTS))

test : $(TESTBINS)

$(TSTDIR)/% : test/single/%.cu test/include/*.h $(TSTDEP)
	@printf "Building  %-35s > %s\n" $< $@
	mkdir -p $(TSTDIR)
	$(NVCC) $(TSTINC) $(NVCUFLAGS) --compiler-options "$(CXXFLAGS)" -o $@ $< $(TSTLIB) -lcuda -lcurand -lnvToolsExt
	@$(NVCC) -M $(TSTINC) $(NVCUFLAGS) --compiler-options "$(CXXFLAGS)" $< $(TSTLIB) -lcuda -lcurand -lnvToolsExt > $(@:%=%.d.tmp)
	@sed "0,/^.*:/s//$(subst /,\/,$@):/" $(@:%=%.d.tmp) > $(@:%=%.d)
	@sed -e 's/.*://' -e 's/\\$$//' < $(@:%=%.d.tmp) | fmt -1 | \
                sed -e 's/^ *//' -e 's/$$/:/' >> $(@:%=%.d)
	@rm -f $(@:%=%.d.tmp)

mpitest : $(MPITESTBINS)

$(MPITSTDIR)/% : test/mpi/%.cu $(TSTDEP)
	@printf "Building  %-35s > %s\n" $< $@
	mkdir -p $(MPITSTDIR)
	$(NVCC) $(MPIFLAGS) $(TSTINC) $(NVCUFLAGS) --compiler-options "$(CXXFLAGS)" -o $@ $< $(TSTLIB) -lcurand
	@$(NVCC) $(MPIFLAGS) -M $(TSTINC) $(NVCUFLAGS) --compiler-options "$(CXXFLAGS)" $< $(TSTLIB) -lcurand > $(@:%=%.d.tmp)
	@sed "0,/^.*:/s//$(subst /,\/,$@):/" $(@:%=%.d.tmp) > $(@:%=%.d)
	@sed -e 's/.*://' -e 's/\\$$//' < $(@:%=%.d.tmp) | fmt -1 | \
                sed -e 's/^ *//' -e 's/$$/:/' >> $(@:%=%.d)
	@rm -f $(@:%=%.d.tmp)

#### PACKAGING ####

DEBIANDIR  := $(BUILDDIR)/debian

DEBGEN_IN  := $(shell (cd debian ; ls *.in))
DEBGEN     := $(DEBGEN_IN:.in=)
DEBFILES   := compat copyright libnccl-dev.install libnccl-dev.manpages nccl.7 rules $(DEBGEN)
DEBTARGETS := $(patsubst %, $(DEBIANDIR)/%, $(DEBFILES))

DEB_REVISION   ?= 1
DEB_TIMESTAMP  := $(shell date -R)
DEB_ARCH       ?= amd64

debian : $(DEBTARGETS)

deb : lib debian
	@printf "Building Debian package\n"
	(cd $(BUILDDIR); debuild -eLD_LIBRARY_PATH -uc -us -d -b)
	mkdir -p $(BUILDDIR)/deb/
	mv $(BUILDDIR)/../libnccl*.deb $(BUILDDIR)/deb/

debclean :
	rm -Rf $(DEBIANDIR)

$(DEBIANDIR)/% : debian/%.in
	@printf "Generating %-35s > %s\n" $< $@
	sed -e "s/\$${nccl:Major}/$(NCCL_MAJOR)/g" \
	    -e "s/\$${nccl:Minor}/$(NCCL_MINOR)/g" \
	    -e "s/\$${nccl:Patch}/$(NCCL_PATCH)/g" \
	    -e "s/\$${cuda:Major}/$(CUDA_MAJOR)/g" \
	    -e "s/\$${cuda:Minor}/$(CUDA_MINOR)/g" \
	    -e "s/\$${deb:Revision}/$(DEB_REVISION)/g" \
	    -e "s/\$${deb:Timestamp}/$(DEB_TIMESTAMP)/g" \
	    -e "s/\$${deb:Arch}/$(DEB_ARCH)/g" \
	    $< > $@

$(DEBIANDIR)/% : debian/%
	@printf "Grabbing  %-35s > %s\n" $< $@
	mkdir -p $(DEBIANDIR)
	cp -f $< $@

#### FORTRAN BINDINGS ####

export NCCL_MAJOR NCCL_MINOR NCCL_PATCH CUDA_MAJOR CUDA_MINOR LIBLINK CUDA_LIB BUILDDIR

forlib : lib
	$(MAKE) -C fortran lib
fortest : forlib
	$(MAKE) -C fortran test
forclean :
	$(MAKE) -C fortran clean

sjeaugey · 2017-01-26T23:06:20Z

That's another way to do it, indeed. Actually, building for many architectures also increases the build times (for everyone) so you may want to only compile for the architectures you care about.
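
Since the Makefile assigns NVCC_GENCODE with `?=`, the list can also be overridden on the make command line instead of being edited in place. A minimal sketch of building only for the GPUs in the machine, assuming a hypothetical helper `gencode_flags` (not part of NCCL) that turns compute capabilities into `-gencode` flags:

```shell
#!/bin/sh
# Hypothetical helper (not part of NCCL): turn compute capabilities such as
# "3.0 5.2" into the -gencode flags expected by NVCC_GENCODE.
gencode_flags() {
    flags=""
    for cc in "$@"; do
        arch=$(printf '%s' "$cc" | tr -d '.')    # "3.0" -> "30"
        flags="$flags -gencode=arch=compute_${arch},code=sm_${arch}"
    done
    # Strip the leading space before printing.
    printf '%s\n' "${flags# }"
}

# Print the flags for a GTX 760 (3.0) plus a GTX 960 (5.2).
gencode_flags 3.0 5.2

# Possible usage from the NCCL source tree (left commented out here):
# make clean
# make NVCC_GENCODE="$(gencode_flags 3.0 5.2)" test
```

The `make clean` step is there because objects already built with the old flags would otherwise not be recompiled.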

sjeaugey · 2018-09-26T17:50:28Z

Closing.

sjeaugey closed this as completed in d5e507f Jun 7, 2016

sjeaugey reopened this Jun 7, 2016

sjeaugey closed this as completed Sep 26, 2018

himanshucodz55 mentioned this issue Jul 25, 2022

RuntimeError: [1] is setting up NCCL communicator and retreiving ncclUniqueId from [0] via c10d key-value store by key '0', but store->get('0') got error: Timeout waiting for key: default_pg/0/0 after 1800000 ms #708

Open

raninbowlalala mentioned this issue Jul 4, 2023

2 allreduce and a allgather hang in multi-node #899

Open

acphile mentioned this issue Sep 29, 2023

Question about ncclCommAbort stuck issue #1013

Open
