
Successfully loaded libtensorflow in Node.js, but couldn't load GPU. Make sure CUDA Toolkit and cuDNN are installed and accessible, or turn off GPU mode. #1060

Open
remz1337 opened this issue Dec 22, 2023 · 11 comments
Labels
bug Something isn't working priority: normal

Comments

@remz1337

Which version of recognize are you using?

5.0.3

Enabled Modes

Face recognition

TensorFlow mode

GPU mode

Downstream App

Memories App

Which Nextcloud version do you have installed?

27.1.5

Which Operating system do you have installed?

Ubuntu 22.04

Which database are you running Nextcloud on?

Postgres 14.10

Which Docker container are you using to run Nextcloud? (if applicable)

N/A

How much RAM does your server have?

4 GB

What processor Architecture does your CPU have?

x86_64

Describe the Bug

This is minor, but the Recognize admin panel tells me no GPU was found, even though everything seems to be working fine (I can see the recognize/bin/node process running on my GPU in nvidia-smi). Not sure if this is normal, but although I see the process on my GPU, my CPU usage is also way up.

The exact warning appears in the NodeJS section of the admin panel:
Successfully loaded libtensorflow in Node.js, but couldn't load GPU. Make sure CUDA Toolkit and cuDNN are installed and accessible, or turn off GPU mode.

More info: Proxmox 7.2, Nextcloud in an LXC container with the GPU successfully passed through (this was already set up for ffmpeg processing in the Memories app). I installed the CUDA and cuDNN libraries per the recommended instructions (`pip install tensorflow[and-cuda]`), and Python finds my GPU.
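For anyone debugging the same warning: the Node.js libtensorflow binding dlopen()s the CUDA and cuDNN shared libraries at runtime, so the warning usually means one of them is not resolvable on the loader path, even if Python's TensorFlow (which bundles its own copies via `tensorflow[and-cuda]`) works. A minimal probe; the exact sonames below are an assumption based on the CUDA 11 / cuDNN 8 toolchain that recognize currently targets:

```python
import ctypes

# Shared libraries a CUDA 11 build of libtensorflow typically dlopen()s at
# runtime. The sonames are an assumption (CUDA 11 / cuDNN 8 toolchain); if any
# prints "missing", recognize falls back to CPU and shows the GPU warning.
CUDA_LIBS = ["libcudart.so.11.0", "libcublas.so.11", "libcudnn.so.8"]

def probe(soname):
    """Return 'ok' if the shared library can be dlopen()ed, else 'missing'."""
    try:
        ctypes.CDLL(soname)
        return "ok"
    except OSError:
        return "missing"

if __name__ == "__main__":
    for soname in CUDA_LIBS:
        print(f"{soname}: {probe(soname)}")
```

Note that `pip install tensorflow[and-cuda]` places its CUDA libraries inside the Python site-packages tree, where a separate Node.js process will not find them, which would explain Python seeing the GPU while recognize does not.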

Expected Behavior

If everything is working fine and using my GPU, there shouldn't be a warning about the GPU not being found.

To Reproduce

Not sure; it's probably something to do with my setup. If you can point me to where to look, I can provide more logs that may help.

Debug log

No response

remz1337 added the bug label Dec 22, 2023

Hello 👋

Thank you for taking the time to open this issue with recognize. I know it's frustrating when software
causes problems. You have made the right choice to come here and open an issue to make sure your problem gets looked at
and if possible solved.
I try to answer all issues and if possible fix all bugs here, but it sometimes takes a while until I get to it.
Until then, please be patient.
Note also that GitHub is a place where people meet to make software better together. Nobody here is under any obligation
to help you, solve your problems or deliver on any expectations or demands you may have, but if enough people come together we can
collaborate to make this software better. For everyone.
Thus, if you can, you could also look at other issues to see whether you can help other people with your knowledge
and experience. If you have coding experience it would also be awesome if you could step up to dive into the code and
try to fix the odd bug yourself. Everyone will be thankful for extra helping hands!
One last word: if you feel, at any point, like you need to vent, this is not the place for it; you can go to the forum, to Twitter, or somewhere else. This is a technical issue tracker, so please make sure to
focus on the tech and keep your opinions to yourself. (Also see our Code of Conduct. Really.)

I look forward to working with you on this issue
Cheers 💙

@NikitaKorneev

I have the same issue. I think I followed all the instructions regarding drivers and the CUDA/cuDNN requirements.

@marcelklehr
Member

Are you using CUDA 12 or CUDA 11? I believe we currently only support CUDA 11

@remz1337
Author

Indeed, CUDA 12. The app is still working, though; it's just that warning message that seems to be the issue.

@marcelklehr
Member

I think it falls back to CPU if GPU can't be loaded

@remz1337
Author

But I can see the recognize/bin/node process running on my GPU using nvidia-smi

@marcelklehr
Member

huh

@Mikec78660

Wondering if it is still the case that CUDA 12 is not supported? I have:
Driver Version: 560.28.03, CUDA Version: 12.6

I have the same warning message when enabling GPU mode. I get a process on the GPU of a few hundred megabytes when I start a scan, but no GPU utilization from that process.

@NikitaKorneev

> Wondering if it is still the case that CUDA 12 is not supported? I have: Driver Version: 560.28.03, CUDA Version: 12.6
>
> I have the same warning message when enabling GPU mode. I get a process on the GPU of a few hundred megabytes when I start a scan, but no GPU utilization from that process.

There is something really wrong with this integration, and I don't know whether the maintainers are on it...

@macka849

I had the same issue, and I have sorted it, but with some caveats. Firstly, I am on Ubuntu Server 22.04, as this was the latest server release when the program was written, and it does not appear to have been updated since then. More on that later.

My GPU has CUDA compute capability 5.2, which is not directly supported by the precompiled TensorFlow binaries, so I had to compile my own. Seven hours on a Xeon E3 v2, and that was the successful attempt.

I am on the latest NVIDIA GPU and CUDA drivers. After installing the CUDA driver from NVIDIA's .run file, I had to manually link some libraries, which is detailed by the installer at the end of the CUDA driver install. The nvidia-fs kernel module part always failed, but it doesn't seem necessary; my GPU may not be compatible with it.

Anyway, after all of that, and after confirming that TensorFlow was working with the GPU per the TensorFlow website, recognize still failed.

I found test_gputensorflow.js in /nextcloud/apps/recognize/src and ran it manually from that folder: `sudo node test_gputensorflow.js`

The output indicated it was looking for libcudnn.so.8. Ubuntu has moved on to libcudnn9 in the official repositories, but there is a way to manually install the older version, which I found here:

https://stackoverflow.com/questions/66977227/could-not-load-dynamic-library-libcudnn-so-8-when-running-tensorflow-on-ubun

The guide is for Ubuntu 20.04, but I did some digging around the NVIDIA archive and found a libcudnn8 package for Ubuntu 22.04. Sadly, they did not have libcudnn8 in the Ubuntu 24.04 folder, so it looks like I'm stuck on Ubuntu 22.04 until recognize is updated for libcudnn9.
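For reference, the Stack Overflow guide above essentially boils down to adding NVIDIA's cuda-keyring apt repository for your Ubuntu release and installing a pinned libcudnn8 package. Once installed, you can confirm the dynamic loader sees the soname libtensorflow asks for; a minimal check, assuming libcudnn.so.8 is the soname recognize's build expects:

```shell
# Check whether the dynamic loader can resolve cuDNN 8 (libcudnn.so.8 is the
# soname the error above refers to). If this prints "missing", install the
# libcudnn8 package from NVIDIA's repository and re-run `sudo ldconfig`.
if ldconfig -p | grep -q 'libcudnn\.so\.8'; then
  status="found"
else
  status="missing"
fi
echo "libcudnn.so.8: $status"
```

The check reads the ldconfig cache rather than the filesystem, so it also catches the case where the library file exists but its directory was never added to the loader path.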

In any case, I installed it, the message went away, and the recognize job I had running found tenth gear and took off like a Ferrari in a tank race.
nvidia-smi showed 100% utilization by the process.

I hope this helps others out there who are trying to get this working. I'm going to sleep now.

@bugsyb

bugsyb commented Dec 8, 2024

Question: would you mind checking your logs to see whether movinet recognition is working properly and classifying videos?

I'm running into:
#1122

It would be great if you could check whether, after the messages showing that ffmpeg has finished extracting frames:

Classifier process output: decoded 60/60 images

classification actually happens, or whether you get errors as in my case:

Classifier process output: 2024-12-08 10:15:46.966028: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at xla_ops.cc:296 : NOT_FOUND: could not find registered platform with id: 0x7f6d69c7fae4\

Thanks!

In case you need another approach, Recognize can be Dockerized too.
Given the hassle I went through to get mine working, I have shared a Dockerized version along with the approaches I used, available here:
https://github.com/bugsyb/recognize_docker
The latest commits, and what I use myself, are the nvidia-tensor variant built on the NVIDIA-released TensorFlow Docker container, with PHP, Nextcloud, and Recognize added on top, plus some other custom apps that are easily removable from the build.

In terms of support: some older GPUs might not be ported to or supported under CUDA 12; they are simply being dropped.

Currently, for most people, the main limitation here is the requirement set by Recognize's code, which demands CUDA 11.

The shared Dockerfiles also show other approaches to building the TensorFlow container.

Due to the CUDA 11 requirement, we're stuck on specific versions of the underlying OS-level libraries. However, since it's Docker, that doesn't matter: it can run on any system as long as the NVIDIA Container Toolkit is installed at the Docker host level.
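The Docker route additionally needs the GPU exposed to the container. A minimal docker-compose sketch, assuming the NVIDIA Container Toolkit is installed on the host; the image name is a placeholder, not a published image:

```yaml
# Sketch: expose the host GPU to a Nextcloud/Recognize container.
# Assumes the NVIDIA Container Toolkit is installed on the Docker host.
services:
  nextcloud-recognize:
    image: recognize-gpu:local   # placeholder for whatever the Dockerfiles build
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```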
