NVENC not working on Bare Metal or in Docker with Nvidia Container Runtime #378

Closed
ich777 opened this issue Sep 22, 2022 · 18 comments
Labels: bug (Something isn't working), NV-Triaged (An NVBug has been created for dev to investigate)

Comments

ich777 commented Sep 22, 2022

NVIDIA Open GPU Kernel Modules Version(s):

515.76
520.56.06

Does this happen with the proprietary driver (of the same version) as well?

No

Operating System and Version

Unraid 6.11.0-rc5p2 (Slackware 15.0)

Kernel Release

5.19.9-Unraid

Hardware: GPU

NVIDIA T400

Describe the bug

I'm trying to compile the Nvidia open-source kernel module alongside the proprietary binaries and libraries for Unraid, for use in Docker containers, but ran into an issue: after a reboot, issuing nvidia-smi from the command line produces the following message in my syslog:
NVRM objClInitPcieChipset: *** Chipset Setup Function Error!

Is the Open Source Kernel module in general compatible with the container runtime for use in Docker containers?
When running nvidia-smi I get what I think is the appropriate output:
[Screenshot: nvidia-smi output]

I also have to load the module with NVreg_OpenRmEnableUnsupportedGpus=1, otherwise I get the error below. I think that is expected, or am I wrong?
Open nvidia.ko is only ready for use on Data Center GPUs.
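Since this whole report hinges on whether a behaviour comes from the open or the proprietary kernel module, a quick sanity check is to look at the NVRM version banner. Below is a minimal Python sketch, assuming the open modules identify themselves with the string "Open Kernel Module" on the first line of /proc/driver/nvidia/version and the proprietary module does not (verify against your own banner); the function name is hypothetical.

```python
from pathlib import Path

# Assumption: the open kernel modules put "Open Kernel Module" in the first
# line of this banner, while the proprietary module does not.
VERSION_FILE = Path("/proc/driver/nvidia/version")


def kernel_module_flavour(version_text: str) -> str:
    """Return 'open', 'proprietary' or 'unknown' for an NVRM version banner."""
    lines = version_text.splitlines()
    if not lines or not lines[0].startswith("NVRM version:"):
        return "unknown"
    return "open" if "Open Kernel Module" in lines[0] else "proprietary"


if __name__ == "__main__":
    if VERSION_FILE.exists():
        print(kernel_module_flavour(VERSION_FILE.read_text()))
    else:
        print("nvidia kernel module does not appear to be loaded")
```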

To Reproduce

Compile the drivers for Unraid and install the package on Unraid.

Bug Incidence

Always

nvidia-bug-report.log.gz

nvidia-bug-report.log.gz

More Info

I'm really not sure if this is even a bug or if I'm missing something...

ich777 added the bug label on Sep 22, 2022
ich777 (Author) commented Sep 29, 2022

Can anyone help, or does anyone have a clue why this is not working? Maybe I'm missing something obvious...?

I'm really curious whether the Open GPU Kernel module is compatible with the Nvidia Container Runtime.

mtijanic (Collaborator) commented

Hi @ich777, am I understanding correctly that the driver is working, but you are getting the objClInitPcieChipset error message in the log? If that's the case, I think the reason you don't see it with the proprietary driver is that it simply doesn't print these error messages.

The error happens if your particular chipset is not known to the driver. This generally shouldn't happen (I'll take a look at why it does), but it isn't a problem on its own. The chipset detection is done to apply various chipset-specific bugfixes and workarounds, and if your chipset doesn't need any, then it should all be fine.

ich777 (Author) commented Sep 29, 2022

@mtijanic thank you for the answer!
Yes exactly, at least it seems so.
I boot the machine headless because I'm trying to get NVENC working with the open-source driver so I can bring it to the Unraid community too. I really like the idea of the open-source driver, even if it uses parts of the proprietary package...

I'll compile the driver again later today for Unraid 6.11.0 stable (based on kernel 5.19.9) and also try connecting a display to see if that makes any difference.

The open-source driver should, however, be compatible with the Nvidia Container Runtime for Docker containers, or am I wrong about that?

mtijanic (Collaborator) commented

I've never tested it explicitly, but I believe it should be compatible with the container runtime. It is missing virtualization functionality, but AFAIK there is no virtualization taking place when using the container runtime.

Ignoring the messages in the log, is there any particular thing that is not working as expected?

ich777 (Author) commented Sep 29, 2022

@mtijanic I've now compiled the driver for 5.19.9 and connected a display to the card.
Display output, Xorg and everything else work just fine.
But as soon as I force a transcode in the official Jellyfin container, nvidia-smi shows ffmpeg using resources, and then it disappears almost instantly. Even stranger, ffmpeg reports that it is out of memory, although Xorg uses only 110MB and Jellyfin (or rather ffmpeg) wants to allocate about 80MB.

That's the error that I get from Jellyfin:

[h264_nvenc @ 0x561c61d16a40] OpenEncodeSessionEx failed: out of memory (10): (no details)
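The number in parentheses is the raw NVENCSTATUS code returned by the NVENC API; in NVIDIA's nvEncodeAPI.h, 10 is NV_ENC_ERR_OUT_OF_MEMORY. A minimal Python sketch for pulling the failing call and status out of such a log line, assuming the "<Call> failed: <message> (<code>)" format of the line above; the helper names are hypothetical and only a subset of status codes is mapped.

```python
import re
from typing import NamedTuple, Optional

# Subset of NVENCSTATUS values from NVIDIA's nvEncodeAPI.h.
NVENCSTATUS_NAMES = {
    0: "NV_ENC_SUCCESS",
    1: "NV_ENC_ERR_NO_ENCODE_DEVICE",
    2: "NV_ENC_ERR_UNSUPPORTED_DEVICE",
    8: "NV_ENC_ERR_INVALID_PARAM",
    9: "NV_ENC_ERR_INVALID_CALL",
    10: "NV_ENC_ERR_OUT_OF_MEMORY",
}

# Matches e.g. "OpenEncodeSessionEx failed: out of memory (10): (no details)"
NVENC_ERROR_RE = re.compile(r"(?P<call>\w+) failed: (?P<msg>[^()]+?) \((?P<code>\d+)\)")


class NvencError(NamedTuple):
    call: str     # NVENC API function that failed
    message: str  # human-readable text printed by ffmpeg
    code: int     # raw NVENCSTATUS value
    status: str   # symbolic name, or "unknown" if not in the subset above


def parse_nvenc_error(line: str) -> Optional[NvencError]:
    """Extract the failing NVENC call and its status from an ffmpeg log line."""
    m = NVENC_ERROR_RE.search(line)
    if m is None:
        return None
    code = int(m.group("code"))
    return NvencError(m.group("call"), m.group("msg"), code,
                      NVENCSTATUS_NAMES.get(code, "unknown"))
```

The status alone cannot tell a genuine allocation failure from a session-limit rejection: consumer cards are commonly reported to return this same code when their concurrent-session cap is hit, which is exactly the ambiguity in this thread.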

> Ignoring the messages in the log, is there any particular thing that is not working as expected?

Yes: transcoding in containers. Is there anything else I can provide, or anything you would recommend I try?

ich777 (Author) commented Sep 30, 2022

@mtijanic after some more research: this message usually indicates that the NVENC session limit has been exceeded (if I'm not mistaken, cards like this usually allow three simultaneous transcodes), but nothing is actually using the card and it fails to transcode even a single file.

This also seems related to #104.
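To confirm that nothing else is holding an encoder session, nvidia-smi can report the number of active NVENC sessions per GPU. A minimal Python sketch, assuming your nvidia-smi supports the encoder.stats.sessionCount query field (check `nvidia-smi --help-query-gpu`); the function names are hypothetical.

```python
import subprocess
from typing import Dict

# Assumption: this nvidia-smi build supports the encoder.stats.sessionCount
# field (verify with `nvidia-smi --help-query-gpu`).
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,encoder.stats.sessionCount",
    "--format=csv,noheader,nounits",
]


def parse_session_counts(csv_text: str) -> Dict[int, int]:
    """Map GPU index -> active NVENC session count from nvidia-smi CSV output."""
    counts: Dict[int, int] = {}
    for line in csv_text.strip().splitlines():
        index, sessions = (field.strip() for field in line.split(","))
        counts[int(index)] = int(sessions)
    return counts


def active_encoder_sessions() -> Dict[int, int]:
    """Run nvidia-smi and return the per-GPU session counts."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_session_counts(result.stdout)
```

A count of 0 while OpenEncodeSessionEx still fails would point away from a genuine session limit and toward a driver-side problem.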

ich777 changed the title from "Nvidia T400 - NVRM objClInitPcieChipset: *** Chipset Setup Function Error!" to "Nvidia T400 - NVENC not working with container runtime" on Sep 30, 2022
mtijanic (Collaborator) commented

Hi, yes, it looks like a duplicate of #104, and it is tracked internally as bug 3661377. The issue has been root-caused, but unfortunately I can't give any ETA for the fix.

mtijanic added the NV-Triaged label on Sep 30, 2022
ich777 (Author) commented Sep 30, 2022

@mtijanic I will advise users from #104 to report NVENC issues here, since the user there closed that issue without a real solution...

Thank you anyway for the help, really appreciated!

ich777 changed the title from "Nvidia T400 - NVENC not working with container runtime" to "NVENC not working on Bare Metal or in Docker with Nvidia Container Runtime" on Sep 30, 2022
troykelly commented Oct 2, 2022

For those who need this to work while NV works out what is causing the bug and proposes a timeline for a fix:
#104 (comment)

troykelly commented

This problem remains in 520.61.05 - the issue should be updated.

ich777 (Author) commented Oct 7, 2022

> This problem remains in 520.61.05 - the issue should be updated.

I think this is an Enterprise driver, or am I wrong about that?

troykelly commented

ich777 (Author) commented Oct 7, 2022

> I don't think so - it's the latest bundled driver in the kit

TBH I'll update the issue once it's released and listed here, hope that's good enough for you.

troykelly commented

I'm not sure I follow... it is released; that's their official link...
If there is a way to select the driver in the CUDA download, I'm all for waiting, but given it's the driver that comes with the download, I don't see why there should be any hesitation.

ich777 (Author) commented Jan 10, 2023

It seems to be working since driver version 525.60.13. Have you tried yet whether it's working for you too, @troykelly?
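Since 525.60.13 is the version reported working here, checking a host comes down to comparing the installed driver version numerically; comparing version strings as text can mis-order versions whose components differ in digit count. A minimal Python sketch, assuming `nvidia-smi --query-gpu=driver_version` works on your system; 525.60.13 is the version reported in this thread, not an official "fixed in" statement, and the helper names are hypothetical.

```python
import subprocess
from typing import Tuple

# Version reported as working in this thread (not an official "fixed in" note).
FIXED_IN = "525.60.13"


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '525.60.13' into (525, 60, 13) so versions compare numerically."""
    return tuple(int(part) for part in version.strip().split("."))


def includes_nvenc_fix(installed: str, fixed_in: str = FIXED_IN) -> bool:
    """True if the installed driver is at least the given version."""
    return parse_version(installed) >= parse_version(fixed_in)


def installed_driver_version() -> str:
    """Ask nvidia-smi for the driver version (first GPU's line)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()[0].strip()
```

For example, `includes_nvenc_fix("520.61.05")` returns `False`, consistent with the earlier report that the problem remained in 520.61.05.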

troykelly commented

Looks to be working, @ich777. I've tested on one box and am testing on others now.
I did have some upgrade issues; the process that worked for me is here if anybody else has problems:
https://gist.github.com/troykelly/7445ca387aa069a852a4a96c9a57d6a6

troykelly commented

Just for clarity: 525.60.13 restores access to the encoders/decoders, with the published limitations (e.g. 3 encoders).

ich777 (Author) commented Jan 11, 2023

> Just for clarity - 525.60.13 restores the availability to access encoders/decoders with the published limitations (eg 3 encoders)

Exactly as advertised for Nvidia consumer cards; a Quadro P2000, for example, should have unlimited sessions.

I will close this issue tomorrow if you don't report any further issue with NVENC.

ich777 closed this as completed on Jan 13, 2023
3 participants