
Error using nvidia-docker #11

Closed · danperazzo opened this issue Jul 26, 2019 · 11 comments

danperazzo (Author) commented Jul 26, 2019

Hello. I am trying to reproduce the results as shown, but when I execute sudo nvidia-docker run --rm --volume /:/host --workdir /host$PWD tf_colmap bash demo.sh, Docker returns an error. Below is the error I encountered:
docker: Error response from daemon: OCI runtime create failed: container_linux.go:345: starting container process caused "process_linux.go:430: container init caused \"process_linux.go:413: running prestart hook 1 caused \\\"error running hook: exit status 1, stdout: , stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods configure --ldconfig=@/sbin/ldconfig.real --device=all --compute --utility --require=cuda>=10.0 brand=tesla,driver>=384,driver<385 brand=tesla,driver>=410,driver<411 --pid=7506 /var/lib/docker/overlay2/74b368071c67140593255d9461eb525598dbbca0ab382047da530356351746c6/merged]\\\\nnvidia-container-cli: requirement error: unsatisfied condition: brand = tesla\\\\n\\\"\"": unknown.

bmild (Collaborator) commented Jul 26, 2019

Based on some quick googling, it sounds like it might be a driver error. What GPU are you using, and what version of the NVIDIA drivers (it should be shown in the top left when you run the nvidia-smi command)?

The Docker container is trying to run a CUDA 10.0 image. Based on the driver requirements for CUDA 10.0 in this table, it looks like you need NVIDIA drivers >= 410.48:
https://github.com/NVIDIA/nvidia-docker/wiki/CUDA#requirements
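
For a quick check, here is a hedged Python sketch (not from the original reply) that compares the installed driver version reported by nvidia-smi against the 410.48 minimum; it assumes nvidia-smi is on your PATH and supports the standard --query-gpu flags:

import subprocess

# Hedged sketch: read the installed NVIDIA driver version and compare it against
# the CUDA 10.0 minimum (410.48) from the table linked above.
MIN_DRIVER = (410, 48)
version = subprocess.check_output(
    ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
    text=True,
).splitlines()[0].strip()
installed = tuple(int(p) for p in version.split(".")[:2])
print(version, "is OK for CUDA 10.0" if installed >= MIN_DRIVER else "is too old for CUDA 10.0")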

Your problem looks similar to these issues, if you need more information to debug:
NVIDIA/nvidia-docker#861
NVIDIA/nvidia-docker#931

Hope that helps!

danperazzo (Author) commented Jul 26, 2019

Thanks!! I have just updated the drivers and got rid of this error. However, I now get a segmentation fault; the error message is below. Again, thank you very much :))
demo.sh: line 24: 106 Segmentation fault (core dumped) cuda_renderer/cuda_renderer data/testscene/mpis_360 data/testscene/outputs/test_path.txt data/testscene/outputs/test_vid.mp4 360 .8 18

bmild (Collaborator) commented Jul 28, 2019

Hmm interesting. What GPU are you using?

Were the MPIs correctly generated and saved in the folder data/testscene/mpis_360? Check for the file data/testscene/mpis_360/mpi19/mpi.b. If it does not exist, the renderer will definitely segfault, and the issue was with generating the MPIs.

If the MPIs do exist, maybe there was not enough GPU memory for the renderer; it requires about 800MB. You could try ensuring this memory is available and then running the renderer command by itself in the Docker environment, like this (copy and paste as a single command in the terminal):
sudo nvidia-docker run --rm --volume /:/host --workdir /host$PWD tf_colmap cuda_renderer/cuda_renderer data/testscene/mpis_360 data/testscene/outputs/test_path.txt data/testscene/outputs/test_vid.mp4 360 .8 18
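
For reference, here is a hedged Python sketch (not part of the original reply) of those two checks, using the test-scene paths from this thread and assuming nvidia-smi is available:

import os
import subprocess

# Hedged sketch: confirm the MPI output exists and that roughly 800MB of GPU memory
# is free before rerunning the renderer.
mpi_file = "data/testscene/mpis_360/mpi19/mpi.b"
print("MPI file exists:", os.path.exists(mpi_file))

free_mib = int(subprocess.check_output(
    ["nvidia-smi", "--query-gpu=memory.free", "--format=csv,noheader,nounits"],
    text=True,
).splitlines()[0])
print("free GPU memory:", free_mib, "MiB",
      "(should be enough)" if free_mib >= 800 else "(likely too little for the ~800MB renderer)")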

danperazzo (Author) commented Jul 30, 2019

Hello, I have just checked and it did not render any MPIs. I am using an NVIDIA GeForce GTX 1050.

bmild (Collaborator) commented Jul 31, 2019

I saw in the error you posted:
IOError: File ./checkpoints/papermodel/checkpoint.meta does not exist.
Did you download the trained checkpoint (using bash download_data.sh)?

danperazzo (Author) commented:

I have just checked, and I do have this file (checkpoint.meta) in my project. I checked again and, apparently, there was an error with TensorFlow:

2019-07-31 14:24:09.655268: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at conv_ops_3d.cc:332 : Resource exhausted: OOM when allocating tensor with shape[1,32,360,480,32] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
File "imgs2mpis.py", line 82, in
args.numplanes, args.no_mpis, True, args.psvs)
File "imgs2mpis.py", line 53, in gen_mpis
mpis = run_inference(imgs, poses, mpi_bds, ibr_runner, num_planes, patched, disps=disps, psvs=psvs)
File "/host/home/daniel/LLFF/llff/inference/mpi_utils.py", line 156, in run_inference
mpi.generate(generator, num_planes)
File "/host/home/daniel/LLFF/llff/inference/mpi_utils.py", line 55, in generate

danperazzo (Author) commented:

Well, I have just discovered that there was insufficient GPU memory. My GPU has 4GB. How much memory did you have?

bmild (Collaborator) commented Jul 31, 2019

Ah yeah that's it - I always use GPUs with at least 8GB.

You should be able to make it run with 4GB by changing this line to patched = True (this will make the network compute the output in smaller patches).
And then by changing the argument valid=270 in this line to valid=120 (this controls the width/height of the patch computed by the network).
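
For illustration only, the two edits amount to something like the following; the exact files and lines are linked from the original comment and are not reproduced here, so everything except the names patched and valid and the two values is a hedged sketch:

# Hedged illustration, not the literal LLFF source:
patched = True   # was False: compute the network output patch-by-patch
valid = 120      # was valid=270: width/height of each patch the network computes

Smaller patches lower the peak GPU memory at the cost of extra computation time.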

These two changes make the demo.sh run using only about 2.4GB on my GPU.

danperazzo (Author) commented:

Alright, thanks a lot!!!! It worked :)) Do those changes affect only the processing time, or do they impact the final result?

bmild (Collaborator) commented Jul 31, 2019

Great! It will only slow down the processing time; the results should be the same.

danperazzo (Author) commented:

Alright, thanks!!!
