Build Failed on Ubuntu20.04 #20

Open
alice890308 opened this issue Dec 17, 2022 · 4 comments

@alice890308

Hi, I'm trying to build Altis on my server and in a Docker container, but both hit the same errors. The descriptions below show only part of the error messages; the complete error messages are here.

Environment

Ubuntu: 20.04
CUDA version: 11.8
Docker image: nvidia/cuda:11.8.0-devel-ubuntu20.04
cmake: 3.16
GPU: NVIDIA A100 (SM number: 80)

Error Messages

First, I ran ./setup.sh and saw the following result:

image

image

Then, to understand the build process, I checked here and applied these steps manually.
Running cmake -DCMAKE_CUDA_ARCHITECTURES=80 shows the following message, though I'm not sure whether it is important:

image

The fatal error occurred when running the last make command.

image

Reproduce Steps

Run nvidia docker image

sudo nvidia-docker run -it nvidia/cuda:11.8.0-devel-ubuntu20.04 /bin/bash

Install git and cmake

apt-get update
apt-get install git cmake

Clone this repo

git clone https://github.com/utcs-scea/altis.git

Run setup.sh, or follow the build steps, to build Altis.

Thanks for viewing my issue. Any reply is appreciated.

@rossbach
Member

rossbach commented Dec 18, 2022 via email

@BDHU
Member

BDHU commented Dec 23, 2022

@alice890308 If you set VERBOSE=1 before the cmake command, what does it show? That way we can see the exact build commands and which files make is expecting. Is it possible to get the complete make log? My guess is that some files are not being built because the SM number was not specified.
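One way to collect the complete log requested here is to send both stdout and stderr of the build into a file. The following is only a sketch: `capture_log` and the file name `build.log` are hypothetical names chosen for illustration, not part of Altis.

```shell
# capture_log is a hypothetical helper, not part of Altis.
# Usage: capture_log <logfile> <command> [args...]
capture_log() {
    cl_logfile="$1"
    shift
    # Send stdout and stderr to the log so compiler errors are kept too.
    "$@" > "$cl_logfile" 2>&1
    cl_status=$?
    # Print the log afterwards and hand back the command's own exit status.
    cat "$cl_logfile"
    return "$cl_status"
}

# Example, from inside the build directory:
#   capture_log build.log make VERBOSE=1
```

Writing to the file first and printing afterwards keeps the exit status of the build itself, at the cost of not streaming output while the build runs.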

@alice890308
Author

@BDHU Hi! It shows the following messages.

root@23cf7bea18ba:/altis/config/cuda_device_attr_gen# make VERBOSE=1
/usr/local/cuda/bin/nvcc -ccbin g++ -I../../Common  -m64    -gencode arch=compute_80,code=sm_80 -gencode arch=compute_80,code=compute_80 -o deviceQuery.o -c deviceQuery.cpp
/usr/local/cuda/bin/nvcc -ccbin g++   -m64      -gencode arch=compute_80,code=sm_80 -gencode arch=compute_80,code=compute_80 -o deviceQuery deviceQuery.o
mkdir -p ../../bin/x86_64/linux/release
cp deviceQuery ../../bin/x86_64/linux/release

Is this the complete make log you are looking for? Or would you like me to check anything else?

@BDHU
Member

BDHU commented Jan 11, 2023

@alice890308 Apologies for the late reply. Can you go into the build directory and remove everything inside? Then execute these two commands:

cmake -DCMAKE_CUDA_ARCHITECTURES=$($SCRIPTPATH/config/get_cuda_sm.sh) ..

and

make VERBOSE=1

I've tested your setup with the exact same Docker version and encountered no problems. However, I've only tested on SM61, so I suspect something has changed in the SM80 series. The commands above let us see which specific make command causes the failure.
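The two commands can be wrapped in a small script that also checks the detected SM value before handing it to CMake. Note that if SCRIPTPATH is not set in the current shell, `$SCRIPTPATH/config/get_cuda_sm.sh` expands to `/config/get_cuda_sm.sh`, so the sketch below takes the checkout root as an argument instead. It rests on several assumptions: the checkout contains `config/get_cuda_sm.sh`, that script prints a bare number such as 61 or 80, and the build directory is `build` under the checkout root. `is_sm_number` and `reconfigure_verbose` are hypothetical names.

```shell
# is_sm_number is a hypothetical helper: succeeds only for a bare number
# such as 61 or 80, and fails for empty or non-numeric values.
is_sm_number() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# reconfigure_verbose is a hypothetical wrapper around the two commands.
# $1: root of the Altis checkout (assumed to contain config/get_cuda_sm.sh)
reconfigure_verbose() {
    rv_root="$1"
    rv_sm=$("$rv_root/config/get_cuda_sm.sh")
    if ! is_sm_number "$rv_sm"; then
        echo "unexpected SM value: '$rv_sm'" >&2
        return 1
    fi
    # Start from an empty build directory so no stale CMake cache is reused.
    rm -rf "$rv_root/build"
    mkdir -p "$rv_root/build"
    # The subshell keeps the caller's working directory unchanged.
    ( cd "$rv_root/build" && cmake -DCMAKE_CUDA_ARCHITECTURES="$rv_sm" .. && make VERBOSE=1 )
}

# Example (the path is an assumption based on the prompt shown earlier):
#   reconfigure_verbose /altis
```

If the check fails, the SM detection itself is the first thing to look at, before any build output.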

For example, I noticed that the maxflops object file failed to build. This is the first workload built right after libAltisCommon.a is generated. In my setup, the build command looks like this (make VERBOSE=1 is what makes it visible):

[  5%] Building CUDA object src/cuda/level0/maxflops/CMakeFiles/maxflopsLib.dir/MaxFlops.cu.o
cd /workspace/Desktop/altis/build/src/cuda/level0/maxflops && /usr/local/cuda/bin/nvcc   -I/workspace/Desktop/altis/src/cuda/common -I/workspace/Desktop/altis/src/cuda/../common  -w -gencode arch=compute_61,code=sm_61 -x cu -c /workspace/Desktop/altis/src/cuda/level0/maxflops/MaxFlops.cu -o CMakeFiles/maxflopsLib.dir/MaxFlops.cu.o

This specific line:

cd /workspace/Desktop/altis/build/src/cuda/level0/maxflops && /usr/local/cuda/bin/nvcc -I/workspace/Desktop/altis/src/cuda/common -I/workspace/Desktop/altis/src/cuda/../common -w -gencode arch=compute_61,code=sm_61 -x cu -c /workspace/Desktop/altis/src/cuda/level0/maxflops/MaxFlops.cu -o CMakeFiles/maxflopsLib.dir/MaxFlops.cu.o

is in charge of generating the MaxFlops.cu.o file. You can simply copy and rerun it to reproduce the same error without going through the whole CMake generation process.
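If the verbose output has been saved to a file, the compile line for one object can be pulled out with grep instead of scrolling through the log. This is a minimal sketch: `find_compile_cmd` is a hypothetical helper and `build.log` a hypothetical file name.

```shell
# find_compile_cmd is a hypothetical helper, not part of Altis.
# Usage: find_compile_cmd <verbose-log> <object-file-name>
# Prints every line of the log that invokes nvcc and mentions the object file.
# The "[  5%] Building CUDA object ..." progress lines are skipped because
# they do not contain "nvcc".
find_compile_cmd() {
    grep -F -- "nvcc" "$1" | grep -F -- "$2"
}

# Example:
#   find_compile_cmd build.log MaxFlops.cu.o
```

The printed line already starts with the `cd ... &&` prefix, so it can be pasted into a shell as is.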

So in your setup, it might look like this:

[  5%] Building CUDA object src/cuda/level0/maxflops/CMakeFiles/maxflopsLib.dir/MaxFlops.cu.o
cd /workspace/Desktop/altis/build/src/cuda/level0/maxflops && /usr/local/cuda/bin/nvcc   -I/workspace/Desktop/altis/src/cuda/common -I/workspace/Desktop/altis/src/cuda/../common  -w -gencode arch=compute_80,code=sm_80 -x cu -c /workspace/Desktop/altis/src/cuda/level0/maxflops/MaxFlops.cu -o CMakeFiles/maxflopsLib.dir/MaxFlops.cu.o

I would first look for any missing flags or parameters. It's very likely that CMake failed to generate some commands.
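Spotting a missing flag by eye in a long nvcc line is error-prone, so one option is to compare the two commands token by token. The sketch below is a hypothetical helper (`flag_diff` is not part of Altis). It splits on spaces only, so it assumes no argument contains a space.

```shell
# flag_diff is a hypothetical helper, not part of Altis.
# Usage: flag_diff "<reference command>" "<failing command>"
# Prints tokens found only in the first command prefixed with "< "
# and tokens found only in the second command prefixed with "> ".
flag_diff() {
    fd_a=$(mktemp)
    fd_b=$(mktemp)
    # One token per line, sorted and de-duplicated, so ordering and
    # repeated spaces do not affect the comparison.
    printf '%s\n' "$1" | tr -s ' ' '\n' | sed '/^$/d' | sort -u > "$fd_a"
    printf '%s\n' "$2" | tr -s ' ' '\n' | sed '/^$/d' | sort -u > "$fd_b"
    comm -23 "$fd_a" "$fd_b" | sed 's/^/< /'
    comm -13 "$fd_a" "$fd_b" | sed 's/^/> /'
    rm -f "$fd_a" "$fd_b"
}

# Example:
#   flag_diff "$working_cmd" "$failing_cmd"
```

For the SM61 and SM80 commands quoted above, the only difference reported would be the `arch=compute_61,code=sm_61` and `arch=compute_80,code=sm_80` tokens. Any other line in the output of a real comparison points at a flag that CMake did or did not generate.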

3 participants