Troubleshooting help? #27

Closed
kmatzen opened this issue Jun 7, 2016 · 10 comments

kmatzen commented Jun 7, 2016

I followed the instructions from the readme, but I can't get the tests to run. Is there any additional advice someone can give me?

# make CUDA_HOME=/usr/local/cuda test
Compiling src/libwrap.cu            > build/obj/libwrap.o
Compiling src/core.cu               > build/obj/core.o
Compiling src/all_gather.cu         > build/obj/all_gather.o
Compiling src/all_reduce.cu         > build/obj/all_reduce.o
Compiling src/broadcast.cu          > build/obj/broadcast.o
Compiling src/reduce.cu             > build/obj/reduce.o
Compiling src/reduce_scatter.cu     > build/obj/reduce_scatter.o
Linking   build/lib/libnccl.so.1.2.2
Grabbing  src/nccl.h                > build/include/nccl.h
Building  test/single/all_gather_test.cu > build/test/single/all_gather_test
Building  test/single/all_reduce_test.cu > build/test/single/all_reduce_test
Building  test/single/broadcast_test.cu > build/test/single/broadcast_test
Building  test/single/reduce_test.cu > build/test/single/reduce_test
Building  test/single/reduce_scatter_test.cu > build/test/single/reduce_scatter_test
# export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:./build/lib
# ./build/test/single/all_reduce_test
Error: must specify at least data size in bytes!

Tests nccl AllReduce with user supplied arguments.
    Usage: all_reduce_test <data size in bytes> [number of GPUs] [GPU 0] [GPU 1] ...

# ./build/test/single/all_reduce_test 10000000
NCCL failure test/single/all_reduce_test.cu:259 'unhandled cuda error'
# nvidia-smi
Tue Jun  7 18:35:23 2016
+------------------------------------------------------+
| NVIDIA-SMI 361.42     Driver Version: 361.42         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TIT...  Off  | 0000:02:00.0     Off |                  N/A |
| 22%   35C    P8    15W / 250W |     23MiB / 12287MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX TIT...  Off  | 0000:04:00.0     Off |                  N/A |
| 22%   34C    P8    14W / 250W |     23MiB / 12287MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX TIT...  Off  | 0000:83:00.0     Off |                  N/A |
| 22%   34C    P8    16W / 250W |     23MiB / 12287MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX TIT...  Off  | 0000:84:00.0     Off |                  N/A |
| 22%   32C    P8    15W / 250W |     23MiB / 12287MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

sjeaugey commented Jun 7, 2016

Hi Kevin,

Could you please re-launch the test with the environment variable NCCL_DEBUG set to "WARN"? NCCL should then display a clear error message before returning the error.

Thanks,
Sylvain


kmatzen commented Jun 7, 2016

Here's the output.

# NCCL_DEBUG=WARN ./build/test/single/all_reduce_test 10000000
WARN src/libwrap.cu:217 cuInternalIpcGetMemHandle() failed: initialization error
WARN src/core.cu:303 rank 0 failed to open CUDA IPC handle
WARN src/core.cu:808 rank 0 failed to obtain rank info
NCCL failure test/single/all_reduce_test.cu:259 'unhandled cuda error'


sjeaugey commented Jun 7, 2016

That's strange. It looks like it cannot find the cuIpcGetMemHandle symbol in libcuda.so. But I think the test is linked against libcuda.so, so the library should be there.


sjeaugey commented Jun 7, 2016

Sorry, I read that wrong. It is not a missing-symbol problem: the cuIpcGetMemHandle call itself is returning an error ("initialization error").


sjeaugey commented Jun 7, 2016

I didn't want to close this right away. Re-opening. @kmatzen, can you check whether that commit fixes your problem?

Thanks.

sjeaugey reopened this Jun 7, 2016

svanschalkwyk commented Jan 26, 2017

NCCL_DEBUG=WARN ./build/test/single/all_reduce_test 10000000
NCCL version 1.3.2 compiled with CUDA 8.0
Using devices
Rank 0 uses device 0 [0x04] GeForce GTX 960
Rank 1 uses device 1 [0x83] GeForce GTX 760
                                         out-of-place                    in-place
       bytes             N    type      op     time  algbw  busbw      res     time  algbw  busbw      res
WARN src/all_reduce.cu:212 Cuda failure 'invalid device function'
NCCL failure test/single/all_reduce_test.cu:51 'unhandled cuda error'
CUDA 8, gcc-5, Linux Mint 18

sjeaugey commented

The message you get ("invalid device function") means that one of the GPUs cannot execute the compiled code.

Indeed, the GTX 760 has a compute capability of 3.0 (https://developer.nvidia.com/cuda-gpus), and the default NVCC_GENCODE in the Makefile only compiles for 3.5 and later.

Can you try recompiling with:

NVCC_GENCODE="-gencode=arch=compute_30,code=sm_30 -gencode=arch=compute_52,code=sm_52"

(compute capability 5.2 is for the GTX960, 3.0 for the GTX760)
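
The suggestion above can be sketched as a small shell helper. This is only an illustration, not part of NCCL: the function name `gencode_flags` is hypothetical, and it assumes compute capabilities are passed without the dot (30 for 3.0, 52 for 5.2). The `make` lines are left as comments because they assume the NCCL source tree and a CUDA toolkit under /usr/local/cuda.

```shell
#!/bin/sh
# gencode_flags: hypothetical helper, not part of NCCL.
# Prints one -gencode flag per compute capability given as an argument
# (capabilities are written without the dot: 30 = 3.0, 52 = 5.2).
gencode_flags() {
    flags=""
    for cc in "$@"; do
        # Append with a separating space only when flags is already non-empty.
        flags="${flags:+$flags }-gencode=arch=compute_${cc},code=sm_${cc}"
    done
    printf '%s\n' "$flags"
}

# Example for a GTX 760 (3.0) plus a GTX 960 (5.2), run from the NCCL source tree.
# A variable given on the make command line overrides the "?=" default in the Makefile.
#   make clean
#   make CUDA_HOME=/usr/local/cuda NVCC_GENCODE="$(gencode_flags 30 52)" test
```

Running `make clean` first matters here, since object files already built for the default architectures would otherwise be reused.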


svanschalkwyk commented Jan 26, 2017

Thank you!
I didn't notice the omission.
Have added it to the Makefile:

#
# Copyright (c) 2015-2016, NVIDIA CORPORATION. All rights reserved.
#
# See LICENCE.txt for license information
#

CUDA_HOME ?= /usr/local/cuda
PREFIX ?= /usr/local
VERBOSE ?= 0
KEEP ?= 0
DEBUG ?= 0
PROFAPI ?= 0
BUILDDIR ?= build
BUILDDIR := $(abspath $(BUILDDIR))

CUDA_LIB ?= $(CUDA_HOME)/lib64
CUDA_INC ?= $(CUDA_HOME)/include
NVCC ?= $(CUDA_HOME)/bin/nvcc

# compute_30/sm_30 added for the GTX 760 (compute capability 3.0)
NVCC_GENCODE ?= -gencode=arch=compute_30,code=sm_30 \
                -gencode=arch=compute_35,code=sm_35 \
                -gencode=arch=compute_50,code=sm_50 \
                -gencode=arch=compute_52,code=sm_52 \
                -gencode=arch=compute_52,code=compute_52

CXXFLAGS   := -I$(CUDA_INC) -fPIC -fvisibility=hidden
NVCUFLAGS  := -ccbin $(CXX) $(NVCC_GENCODE) -lineinfo -std=c++11 -maxrregcount 96
# Use addprefix so that we can specify more than one path
LDFLAGS    := $(addprefix -L,${CUDA_LIB}) -lcudart -lrt

ifeq ($(DEBUG), 0)
NVCUFLAGS += -O3
CXXFLAGS  += -O3
else
NVCUFLAGS += -O0 -G
CXXFLAGS  += -O0 -g -ggdb3
endif

ifneq ($(VERBOSE), 0)
NVCUFLAGS += -Xptxas -v -Xcompiler -Wall,-Wextra
CXXFLAGS  += -Wall -Wextra
else
.SILENT:
endif

ifneq ($(KEEP), 0)
NVCUFLAGS += -keep
endif

ifneq ($(PROFAPI), 0)
CXXFLAGS += -DPROFAPI
endif

NCCL_MAJOR   := 1
NCCL_MINOR   := 3
NCCL_PATCH   := 2
CXXFLAGS  += -DNCCL_MAJOR=$(NCCL_MAJOR) -DNCCL_MINOR=$(NCCL_MINOR) -DNCCL_PATCH=$(NCCL_PATCH)

CUDA_VERSION ?= $(shell ls $(CUDA_LIB)/libcudart.so.* | head -1 | rev | cut -d "." -f -2 | rev)
CUDA_MAJOR = $(shell echo $(CUDA_VERSION) | cut -d "." -f 1)
CUDA_MINOR = $(shell echo $(CUDA_VERSION) | cut -d "." -f 2)
CXXFLAGS  += -DCUDA_MAJOR=$(CUDA_MAJOR) -DCUDA_MINOR=$(CUDA_MINOR)

.PHONY : all lib staticlib clean test mpitest install deb debian debclean forlib fortest forclean
.DEFAULT : all

INCEXPORTS  := nccl.h
LIBSRCFILES := libwrap.cu core.cu all_gather.cu all_reduce.cu broadcast.cu reduce.cu reduce_scatter.cu
LIBNAME     := libnccl.so
STATICLIBNAME := libnccl_static.a

INCDIR := $(BUILDDIR)/include
LIBDIR := $(BUILDDIR)/lib
OBJDIR := $(BUILDDIR)/obj

INCTARGETS := $(patsubst %, $(INCDIR)/%, $(INCEXPORTS))
LIBSONAME  := $(patsubst %,%.$(NCCL_MAJOR),$(LIBNAME))
LIBTARGET  := $(patsubst %,%.$(NCCL_MAJOR).$(NCCL_MINOR).$(NCCL_PATCH),$(LIBNAME))
STATICLIBTARGET := $(STATICLIBNAME)
LIBLINK    := $(patsubst lib%.so, -l%, $(LIBNAME))
LIBOBJ     := $(patsubst %.cu, $(OBJDIR)/%.o, $(filter %.cu, $(LIBSRCFILES)))
DEPFILES   := $(patsubst %.o, %.d, $(LIBOBJ)) $(patsubst %, %.d, $(TESTBINS)) $(patsubst %, %.d, $(MPITESTBINS))

all : lib staticlib

lib : $(INCTARGETS) $(LIBDIR)/$(LIBTARGET)

staticlib : $(INCTARGETS) $(LIBDIR)/$(STATICLIBTARGET)

-include $(DEPFILES)

$(LIBDIR)/$(LIBTARGET) : $(LIBOBJ)
	@printf "Linking   %-35s > %s\n" $(LIBTARGET) $@
	mkdir -p $(LIBDIR)
	$(CXX) $(CXXFLAGS) -shared -Wl,--no-as-needed -Wl,-soname,$(LIBSONAME) -o $@ $(LDFLAGS) $(LIBOBJ)
	ln -sf $(LIBSONAME) $(LIBDIR)/$(LIBNAME)
	ln -sf $(LIBTARGET) $(LIBDIR)/$(LIBSONAME)

$(LIBDIR)/$(STATICLIBTARGET) : $(LIBOBJ)
	@printf "Archiving %-35s > %s\n" $(STATICLIBTARGET) $@
	mkdir -p $(LIBDIR)
	ar cr $@ $(LIBOBJ)

$(INCDIR)/%.h : src/%.h
	@printf "Grabbing  %-35s > %s\n" $< $@
	mkdir -p $(INCDIR)
	cp -f $< $@

$(OBJDIR)/%.o : src/%.cu
	@printf "Compiling %-35s > %s\n" $< $@
	mkdir -p $(OBJDIR)
	$(NVCC) -c $(NVCUFLAGS) --compiler-options "$(CXXFLAGS)" $< -o $@
	@$(NVCC) -M $(NVCUFLAGS) --compiler-options "$(CXXFLAGS)" $< > $(@:%.o=%.d.tmp)
	@sed "0,/^.*:/s//$(subst /,\/,$@):/" $(@:%.o=%.d.tmp) > $(@:%.o=%.d)
	@sed -e 's/.*://' -e 's/\\$$//' < $(@:%.o=%.d.tmp) | fmt -1 | \
                sed -e 's/^ *//' -e 's/$$/:/' >> $(@:%.o=%.d)
	@rm -f $(@:%.o=%.d.tmp)

clean :
	rm -rf $(BUILDDIR)

install : lib
	mkdir -p $(PREFIX)/lib
	mkdir -p $(PREFIX)/include
	cp -P -v $(BUILDDIR)/lib/* $(PREFIX)/lib/
	cp -v $(BUILDDIR)/include/* $(PREFIX)/include/


#### TESTS ####

TEST_ONLY ?= 0

# Tests depend on lib, except in TEST_ONLY mode.
ifeq ($(TEST_ONLY), 0)
TSTDEP = $(INCTARGETS) $(LIBDIR)/$(LIBTARGET)
endif

NCCL_LIB ?= $(LIBDIR)
NCCL_INC ?= $(INCDIR)

MPI_HOME ?= /usr
MPI_INC ?= $(MPI_HOME)/include
MPI_LIB ?= $(MPI_HOME)/lib
MPIFLAGS   := -I$(MPI_INC) -L$(MPI_LIB) -lmpi

TESTS       := all_gather_test     all_gather_scan \
               all_reduce_test     all_reduce_scan \
               broadcast_test      broadcast_scan \
               reduce_test         reduce_scan \
               reduce_scatter_test reduce_scatter_scan
MPITESTS    := mpi_test

TSTINC     := -I$(NCCL_INC) -Itest/include
TSTLIB     := -L$(NCCL_LIB) $(LIBLINK) $(LDFLAGS)
TSTDIR     := $(BUILDDIR)/test/single
MPITSTDIR  := $(BUILDDIR)/test/mpi
TESTBINS   := $(patsubst %, $(TSTDIR)/%, $(TESTS))
MPITESTBINS:= $(patsubst %, $(MPITSTDIR)/%, $(MPITESTS))

test : $(TESTBINS)

$(TSTDIR)/% : test/single/%.cu test/include/*.h $(TSTDEP)
	@printf "Building  %-35s > %s\n" $< $@
	mkdir -p $(TSTDIR)
	$(NVCC) $(TSTINC) $(NVCUFLAGS) --compiler-options "$(CXXFLAGS)" -o $@ $< $(TSTLIB) -lcuda -lcurand -lnvToolsExt
	@$(NVCC) -M $(TSTINC) $(NVCUFLAGS) --compiler-options "$(CXXFLAGS)" $< $(TSTLIB) -lcuda -lcurand -lnvToolsExt > $(@:%=%.d.tmp)
	@sed "0,/^.*:/s//$(subst /,\/,$@):/" $(@:%=%.d.tmp) > $(@:%=%.d)
	@sed -e 's/.*://' -e 's/\\$$//' < $(@:%=%.d.tmp) | fmt -1 | \
                sed -e 's/^ *//' -e 's/$$/:/' >> $(@:%=%.d)
	@rm -f $(@:%=%.d.tmp)

mpitest : $(MPITESTBINS)

$(MPITSTDIR)/% : test/mpi/%.cu $(TSTDEP)
	@printf "Building  %-35s > %s\n" $< $@
	mkdir -p $(MPITSTDIR)
	$(NVCC) $(MPIFLAGS) $(TSTINC) $(NVCUFLAGS) --compiler-options "$(CXXFLAGS)" -o $@ $< $(TSTLIB) -lcurand
	@$(NVCC) $(MPIFLAGS) -M $(TSTINC) $(NVCUFLAGS) --compiler-options "$(CXXFLAGS)" $< $(TSTLIB) -lcurand > $(@:%=%.d.tmp)
	@sed "0,/^.*:/s//$(subst /,\/,$@):/" $(@:%=%.d.tmp) > $(@:%=%.d)
	@sed -e 's/.*://' -e 's/\\$$//' < $(@:%=%.d.tmp) | fmt -1 | \
                sed -e 's/^ *//' -e 's/$$/:/' >> $(@:%=%.d)
	@rm -f $(@:%=%.d.tmp)

#### PACKAGING ####

DEBIANDIR  := $(BUILDDIR)/debian

DEBGEN_IN  := $(shell (cd debian ; ls *.in))
DEBGEN     := $(DEBGEN_IN:.in=)
DEBFILES   := compat copyright libnccl-dev.install libnccl-dev.manpages nccl.7 rules $(DEBGEN)
DEBTARGETS := $(patsubst %, $(DEBIANDIR)/%, $(DEBFILES))

DEB_REVISION   ?= 1
DEB_TIMESTAMP  := $(shell date -R)
DEB_ARCH       ?= amd64

debian : $(DEBTARGETS)

deb : lib debian
	@printf "Building Debian package\n"
	(cd $(BUILDDIR); debuild -eLD_LIBRARY_PATH -uc -us -d -b)
	mkdir -p $(BUILDDIR)/deb/
	mv $(BUILDDIR)/../libnccl*.deb $(BUILDDIR)/deb/

debclean :
	rm -Rf $(DEBIANDIR)

$(DEBIANDIR)/% : debian/%.in
	@printf "Generating %-35s > %s\n" $< $@
	sed -e "s/\$${nccl:Major}/$(NCCL_MAJOR)/g" \
	    -e "s/\$${nccl:Minor}/$(NCCL_MINOR)/g" \
	    -e "s/\$${nccl:Patch}/$(NCCL_PATCH)/g" \
	    -e "s/\$${cuda:Major}/$(CUDA_MAJOR)/g" \
	    -e "s/\$${cuda:Minor}/$(CUDA_MINOR)/g" \
	    -e "s/\$${deb:Revision}/$(DEB_REVISION)/g" \
	    -e "s/\$${deb:Timestamp}/$(DEB_TIMESTAMP)/g" \
	    -e "s/\$${deb:Arch}/$(DEB_ARCH)/g" \
	    $< > $@

$(DEBIANDIR)/% : debian/%
	@printf "Grabbing  %-35s > %s\n" $< $@
	mkdir -p $(DEBIANDIR)
	cp -f $< $@

#### FORTRAN BINDINGS ####

export NCCL_MAJOR NCCL_MINOR NCCL_PATCH CUDA_MAJOR CUDA_MINOR LIBLINK CUDA_LIB BUILDDIR

forlib : lib
	$(MAKE) -C fortran lib
fortest : forlib
	$(MAKE) -C fortran test
forclean :
	$(MAKE) -C fortran clean

sjeaugey commented

That's another way to do it, indeed. Note, though, that building for many architectures also increases build times (for everyone), so you may want to compile only for the architectures you care about.
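
One possible way to restrict the build to the GPUs actually installed is sketched below. It is an assumption-laden illustration, not something from the NCCL repository: `caps_to_gencode` is a hypothetical helper that reads compute capabilities from standard input, one per line (for example "5.2"), and prints deduplicated `-gencode` flags. The commented usage assumes a driver whose `nvidia-smi` supports the `compute_cap` query field, which the 361.42 driver shown earlier in this thread is too old to provide; on such systems the capabilities can be typed in by hand.

```shell
#!/bin/sh
# caps_to_gencode: hypothetical helper, not part of NCCL.
# Reads compute capabilities from stdin, one per line (e.g. "5.2"),
# and prints a deduplicated, numerically sorted list of -gencode flags on one line.
caps_to_gencode() {
    tr -d '. \r' |              # "5.2" -> "52"; also drop spaces and carriage returns
    grep -E '^[0-9]+$' |        # keep only purely numeric lines
    sort -un |                  # numeric sort, duplicates removed
    while read -r cc; do
        printf '%s ' "-gencode=arch=compute_${cc},code=sm_${cc}"
    done |
    sed 's/ $//'                # strip the trailing space
}

# Manual input, matching the two GPUs discussed above:
#   printf '5.2\n3.0\n' | caps_to_gencode
#
# Automatic input (assumes a recent driver with the compute_cap query field):
#   NVCC_GENCODE="$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | caps_to_gencode)"
#   make clean && make NVCC_GENCODE="$NVCC_GENCODE" test
```

The flags produced this way contain only `sm_XX` binaries and no PTX, so the resulting library would not run on GPUs of other architectures; that is the trade-off for the shorter build.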

sjeaugey commented

Closing.
