failed to test encoder with cuda_device_context && no valid NVENC devices found #3819

Closed
youngerConvergence opened this issue Apr 9, 2023 · 12 comments
Labels
bug Something isn't working invalid This doesn't seem right

Comments

@youngerConvergence

youngerConvergence commented Apr 9, 2023

Describe the bug
failed to test encoder with cuda_device_context && no valid NVENC devices found

To Reproduce
When I use xpra start -d nvenc to test NVENC, I get the following error message:

2023-04-09 15:42:28,484 GStreamer version 1.10.4
2023-04-09 15:42:28,618 debug enabled for importlib._bootstrap / ('encoder', 'nvenc')
2023-04-09 15:42:28,619 c_parseguid(CE788D20-AAA9-4318-92BB-AC7E858C8D36)={'Data1': 3464006944, 'Data2': 43689, 'Data3': 17176, 'Data4': b'\x92\xbb\xac~\x85\x8c\x8d6\x88\xed\xbd\x15\x0e\x7f'}
2023-04-09 15:42:28,620 nvenc.init_module()
2023-04-09 15:42:28,620 NVENC encoder API version 10.0
2023-04-09 15:42:28,621 init_nvencode_library() will try to load libcuda.so
2023-04-09 15:42:28,622 init_nvencode_library() <bound method LibraryLoader.LoadLibrary of <ctypes.LibraryLoader object at 0x7f0e357730b8>>(libcuda.so)=<CDLL 'libcuda.so', handle 7f0e0c35dca0 at 0x7f0e14c02470>
2023-04-09 15:42:28,622 init_nvencode_library() libcuda.cuCtxGetCurrent=<_FuncPtr object at 0x7f0e16ccec00>
2023-04-09 15:42:28,622 init_nvencode_library() will try to load libnvidia-encode.so.1
2023-04-09 15:42:28,622 init_nvencode_library() <bound method LibraryLoader.LoadLibrary of <ctypes.LibraryLoader object at 0x7f0e357730b8>>(libnvidia-encode.so.1)=<CDLL 'libnvidia-encode.so.1', handle 7f0e0c00c0e0 at 0x7f0e1c03be48>
2023-04-09 15:42:28,623 init_nvencode_library() NvEncodeAPICreateInstance=<_FuncPtr object at 0x7f0e16ccecc8>
2023-04-09 15:42:28,624 CUDA initialization (this may take a few seconds)
2023-04-09 15:42:30,085 CUDA 11.6.0 / PyCUDA 2022.1, found 2 devices:
2023-04-09 15:42:30,086   + Quadro RTX 6000 @ 0000:21:00.0 (memory: 99% free, compute: 7.5)
2023-04-09 15:42:30,204   + Quadro RTX 6000 @ 0000:71:00.0 (memory: 99% free, compute: 7.5)
2023-04-09 15:42:30,255 NVidia driver version 510.47.3
2023-04-09 15:42:30,255 init_module() will try keys: [None]
2023-04-09 15:42:30,257 testing encoder with device 0
2023-04-09 15:42:30,319 init_cuda(<pycuda._driver.Context object at 0x7f0e144427c0>) pixel format=None
2023-04-09 15:42:30,319 init_cuda(<pycuda._driver.Context object at 0x7f0e144427c0>)
2023-04-09 15:42:30,320 init_cuda cuda info={'driver': {'version': (11, 6, 0), 'driver_version': 11060}}
2023-04-09 15:42:30,320 failed to test encoder with cuda_device_context(0 - locked)
Traceback (most recent call last):
  File "xpra/codecs/nvidia/nvenc/encoder.pyx", line 3023, in xpra.codecs.nvidia.nvenc.encoder.init_module
  File "xpra/codecs/nvidia/nvenc/encoder.pyx", line 1807, in xpra.codecs.nvidia.nvenc.encoder.Encoder.init_cuda
AssertionError: failed to get current cuda context, cuCtxGetCurrent returned 34
2023-04-09 15:42:30,321  device Quadro RTX 6000 is not supported: failed to get current cuda context, cuCtxGetCurrent returned 34
2023-04-09 15:42:30,321 clean() cuda_context=None, encoder context=0x0
2023-04-09 15:42:30,321 clean() done
2023-04-09 15:42:30,321 testing encoder with device 1
2023-04-09 15:42:30,387 init_cuda(<pycuda._driver.Context object at 0x7f0e14442fa8>) pixel format=None
2023-04-09 15:42:30,387 init_cuda(<pycuda._driver.Context object at 0x7f0e14442fa8>)
2023-04-09 15:42:30,387 init_cuda cuda info={'driver': {'version': (11, 6, 0), 'driver_version': 11060}}
2023-04-09 15:42:30,387 failed to test encoder with cuda_device_context(1 - locked)
Traceback (most recent call last):
  File "xpra/codecs/nvidia/nvenc/encoder.pyx", line 3023, in xpra.codecs.nvidia.nvenc.encoder.init_module
  File "xpra/codecs/nvidia/nvenc/encoder.pyx", line 1807, in xpra.codecs.nvidia.nvenc.encoder.Encoder.init_cuda
AssertionError: failed to get current cuda context, cuCtxGetCurrent returned 34
2023-04-09 15:42:30,387  device Quadro RTX 6000 is not supported: failed to get current cuda context, cuCtxGetCurrent returned 34
2023-04-09 15:42:30,388 clean() cuda_context=None, encoder context=0x0
2023-04-09 15:42:30,388 clean() done
2023-04-09 15:42:30,388 no valid NVENC devices found
2023-04-09 15:42:30,388 nvenc.cleanup_module()

System Information (please complete the following information):

  • Quadro RTX 6000
  • NVidia driver version 510.47.3
  • CUDA version 11.6.0
  • OS CentOS 7.9
  • NVENC SDK version 10.0
  • Xpra Server Version 5.0
  • Xpra Client Version 4.4.4
youngerConvergence added the bug label Apr 9, 2023
@totaam
Collaborator

totaam commented Apr 9, 2023

Please always specify the full version number - 5.0 is not specific enough.
It also doesn't state what OS you're running or what packages you have installed.
It's probably worth trying a newer NVidia driver version than 510, some features (ie: nvjpeg decoding) don't work properly on versions older than 525.

@youngerConvergence
Author

On around March 26th, I pulled the source code from the master branch and compiled and installed it myself. The xpra version displayed is v5.0. The operating system is CentOS 7.9. Currently, xpra can run normally, but the bandwidth usage is too high, so I want to try using nvenc for optimization. Initially, I was using driver version 515, but the error still occurred.

@youngerConvergence
Author

Please always specify the full version number - 5.0 is not specific enough. It also doesn't state what OS you're running or what packages you have installed. It's probably worth trying a newer NVidia driver version than 510, some features (ie: nvjpeg decoding) don't work properly on versions older than 525.

Can you provide any other suggestions, such as the version of the NVENC SDK, Nvidia driver, and CUDA version, that I can try?

@totaam
Collaborator

totaam commented Apr 9, 2023

On around March 26th, I pulled the source code from the master branch ..

That's a critical bit of information.

.. and compiled and installed it myself.
The operating system is CentOS 7.9.

This is not a supported configuration.

The xpra version displayed is v5.0

xpra --version will show what full version is actually installed.

Currently, xpra can run normally, but the bandwidth usage is too high

Specifically? How high is it?
Running what type of session? What type of application? What resolution?

Can you provide any other suggestions..

Yes: please try a newer OS release, or try building a stable version of xpra instead of the development branch.

as the version of the NVENC SDK, Nvidia driver, and CUDA version

The latest ones are what we use - except for CUDA, which is stuck on 11.8 for now: #3808 (comment)

@youngerConvergence
Author

youngerConvergence commented Apr 10, 2023

This is not a supported configuration.

Is CentOS 7.9 not supported?

xpra --version will show what full version is actually installed.
try building a stable version of xpra instead of the development branch

I have already switched to version 4.4.4, but the error still occurs.

Specifically? How high is it?
Running what type of session? What type of application? What resolution?

I am using desktop mode at a resolution of 1920x1080 and have tested the following applications.
For glxgears, the average speed is 3.29MB/s and the highest speed is 4.72MB/s.
For HyperViewer (Abaqus 200W mesh), the average speed is 2.45MB/s and the highest speed is 5.70MB/s.
For Fluent (1000W mesh), the speed is 2.69MB/s on average and the highest speed is 5.15MB/s.

@totaam
Collaborator

totaam commented Apr 10, 2023

Is CentOS 7.9 not supported?

No. You may be able to build newer versions on CentOS 7.x but the problem is that too many of the libraries are going to be out of date.

I have already switched to version 4.4.4, but the error still occurs.

OK, have you tried with newer drivers?

the average speed is ...
3.29MB/s

MB/s is an unusual unit. Don't you mean Mbps instead?
Anyway, using nvenc will improve latency significantly but it will probably not improve the bandwidth consumption much - at least not for 1080p.
At 4K and above, you need nvenc to get a decent latency and framerate.

@youngerConvergence
Author

MB/s is an unusual unit. Don't you mean Mbps instead?

MB/s refers to network speed; multiply by 8 to convert to Mbps.
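As a quick worked example of that conversion, here is a small sketch that applies the factor of 8 to the figures reported earlier in this thread. The function name `mbytes_to_mbits` is made up for illustration; only the input numbers come from the thread.

```python
# Convert rates from megabytes per second (MB/s) to megabits per second (Mbps).
# 1 byte = 8 bits, so the conversion is a plain multiplication by 8.

def mbytes_to_mbits(mb_per_s: float) -> float:
    """Return the rate in Mbps for a rate given in MB/s."""
    return mb_per_s * 8


# (average, peak) in MB/s, as reported for the 1920x1080 desktop session.
measurements = {
    "glxgears": (3.29, 4.72),
    "HyperViewer": (2.45, 5.70),
    "Fluent": (2.69, 5.15),
}

for app, (avg, peak) in measurements.items():
    print(f"{app}: average {mbytes_to_mbits(avg):.2f} Mbps, "
          f"peak {mbytes_to_mbits(peak):.2f} Mbps")
```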

Anyway, using nvenc will improve latency significantly but it will probably not improve the bandwidth consumption much - at least not for 1080p.

If nvenc cannot reduce bandwidth at 1080p, is there any other way to reduce bandwidth while minimizing stuttering?

@totaam
Collaborator

totaam commented Apr 11, 2023

MB/s refers to network speed; multiply by 8 to convert to Mbps.

Obviously. You should use Mbps for network bandwidth.

If nvenc cannot reduce bandwidth at 1080p, is there any other way to reduce bandwidth while minimizing stuttering?

Try lowering the min-quality and min-speed settings.
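A minimal sketch of how those two settings might be passed when starting a desktop-mode session. `--min-quality` and `--min-speed` are the xpra options named above. The helper `build_xpra_command`, the display `:100` and the value 10 are placeholders chosen for illustration, not recommended values. The script only prints the command and does not run it.

```python
import shlex


def build_xpra_command(display: str, min_quality: int, min_speed: int) -> list:
    """Build an `xpra start-desktop` command line with lowered encoding floors."""
    return [
        "xpra", "start-desktop", display,
        f"--min-quality={min_quality}",  # lowest picture quality xpra may drop to
        f"--min-speed={min_speed}",      # lowest encoding speed xpra may drop to
    ]


# Hypothetical values: tune them against your own bandwidth measurements.
command = build_xpra_command(":100", min_quality=10, min_speed=10)
print(shlex.join(command))
```

Lower floors give the automatic tuning more room to trade image quality and encoding effort for bandwidth, so it is worth re-measuring after each change.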

@youngerConvergence
Author

OK, have you tried with newer drivers?

I have tried the latest version of the driver (525.105.17) with CUDA 11.8 and NVENC SDK 12.0, but I still receive the following error:

nvenc.init_module()
NVENC encoder API version 12.0
init_nvencode_library() will try to load libcuda.so
init_nvencode_library() <bound method LibraryLoader.LoadLibrary of <ctypes.LibraryLoader object at 0x7fb1dbc73128>>(libcuda.so)=<CDLL 'libcuda.so', handle 27e7230 at 0x7fb1cc98a128>
init_nvencode_library() libcuda.cuCtxGetCurrent=<_FuncPtr object at 0x7fb1d70d0750>
init_nvencode_library() will try to load libnvidia-encode.so.1
init_nvencode_library() <bound method LibraryLoader.LoadLibrary of <ctypes.LibraryLoader object at 0x7fb1dbc73128>>(libnvidia-encode.so.1)=<CDLL 'libnvidia-encode.so.1', handle 27d0ad0 at 0x7fb1cd9c1da0>
init_nvencode_library() NvEncodeAPICreateInstance=<_FuncPtr object at 0x7fb1d70d0818>
CUDA initialization (this may take a few seconds)
CUDA 11.8.0 / PyCUDA 2022.1, found 2 devices:
  + Quadro RTX 6000 @ 0000:21:00.0 (memory: 99% free, compute: 7.5)
  + Quadro RTX 6000 @ 0000:71:00.0 (memory: 99% free, compute: 7.5)
NVidia driver version 525.105.17
init_module() will try keys: [None]
testing encoder with device 0
init_cuda(<pycuda._driver.Context object at 0x7fafa7e30c38>) pixel format=None
init_cuda(<pycuda._driver.Context object at 0x7fafa7e30c38>)
init_cuda cuda info={'driver': {'version': (11, 8, 0), 'driver_version': 12000}}
failed to test encoder with cuda_device_context(0 - locked)
Traceback (most recent call last):
  File "xpra/codecs/nvidia/nvenc/encoder.pyx", line 3023, in xpra.codecs.nvidia.nvenc.encoder.init_module
    test_encoder.init_cuda(device_context)
  File "xpra/codecs/nvidia/nvenc/encoder.pyx", line 1807, in xpra.codecs.nvidia.nvenc.encoder.Encoder.init_cuda
    assert result==0, "failed to get current cuda context, cuCtxGetCurrent returned %s" % CUDA_ERRORS_INFO.get(result, result)
AssertionError: failed to get current cuda context, cuCtxGetCurrent returned 34
 device Quadro RTX 6000 is not supported: failed to get current cuda context, cuCtxGetCurrent returned 34
clean() cuda_context=None, encoder context=0x0
clean() done
testing encoder with device 1
init_cuda(<pycuda._driver.Context object at 0x7fafa06e9450>) pixel format=None
init_cuda(<pycuda._driver.Context object at 0x7fafa06e9450>)
init_cuda cuda info={'driver': {'version': (11, 8, 0), 'driver_version': 12000}}
failed to test encoder with cuda_device_context(1 - locked)
Traceback (most recent call last):
  File "xpra/codecs/nvidia/nvenc/encoder.pyx", line 3023, in xpra.codecs.nvidia.nvenc.encoder.init_module
    test_encoder.init_cuda(device_context)
  File "xpra/codecs/nvidia/nvenc/encoder.pyx", line 1807, in xpra.codecs.nvidia.nvenc.encoder.Encoder.init_cuda
    assert result==0, "failed to get current cuda context, cuCtxGetCurrent returned %s" % CUDA_ERRORS_INFO.get(result, result)
AssertionError: failed to get current cuda context, cuCtxGetCurrent returned 34
 device Quadro RTX 6000 is not supported: failed to get current cuda context, cuCtxGetCurrent returned 34
clean() cuda_context=None, encoder context=0x0
clean() done
no valid NVENC devices found
nvenc.cleanup_module()

@totaam
Collaborator

totaam commented Apr 11, 2023

CUDA 11.8.0 / PyCUDA 2022.1, found 2 devices:

I really don't think either of these will make any difference, but trying newer versions might help:

  • pycuda 2022.2.2
  • CUDA 12.0.76

What's much more likely to fix things: use a newer OS.
CentOS 7 is just too old.

+ Quadro RTX 6000 @ 0000:21:00.0 (memory: 99% free, compute: 7.5)
+ Quadro RTX 6000 @ 0000:71:00.0 (memory: 99% free, compute: 7.5)

Do you actually have 2 of those cards in the system?

AssertionError: failed to get current cuda context, cuCtxGetCurrent returned 34

34 is CUDA_ERROR_STUB_LIBRARY - maybe your LD_LIBRARY_PATH is wrong and it is loading the stub instead of the real cuda.


If not, please post the output of:

XPRA_NVENC_DEBUG_API=1 XPRA_CUDA_DEBUG=1 XPRA_NVENC_DEBUG=1 ./xpra/codecs/loader.py -v nvenc

Preferably against the latest git master.
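One way to check the stub-library hypothesis above is to list every `libcuda.so*` reachable through `LD_LIBRARY_PATH`, in search order, and flag the ones that sit in a `stubs` directory. This is only a sketch under assumptions: the helper name `find_libcuda_candidates` is made up, and treating a `stubs` path component as "this is the CUDA toolkit stub" is a heuristic based on the toolkit's usual layout. The dynamic linker also consults other locations (for example the `ldconfig` cache), which this does not cover.

```python
import os


def find_libcuda_candidates(ld_library_path: str, listdir=os.listdir) -> list:
    """Return (path, is_stub) for each libcuda.so* found, in LD_LIBRARY_PATH order.

    `listdir` is injectable so the function can be exercised without a real
    filesystem; by default it is os.listdir.
    """
    results = []
    for directory in ld_library_path.split(os.pathsep):
        if not directory:
            continue  # skip empty entries such as a trailing ':'
        try:
            names = sorted(listdir(directory))
        except OSError:
            continue  # missing or unreadable directory
        # Heuristic (assumption): stub libraries live in a directory named "stubs".
        is_stub = "stubs" in directory.split(os.sep)
        for name in names:
            if name.startswith("libcuda.so"):
                results.append((os.path.join(directory, name), is_stub))
    return results


if __name__ == "__main__":
    search_path = os.environ.get("LD_LIBRARY_PATH", "")
    for path, is_stub in find_libcuda_candidates(search_path):
        print(("STUB?  " if is_stub else "       ") + path)
```

If a flagged entry appears before the driver's own library in the output, reordering or trimming `LD_LIBRARY_PATH` is the first thing to try.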

@totaam
Collaborator

totaam commented Apr 14, 2023

I'm guessing it was the stub library on the path.

totaam closed this as completed Apr 14, 2023
@youngerConvergence
Author

I'm guessing it was the stub library on the path.

Thank you, it's already solved. The libcuda.so that was actually being loaded was the wrong one.
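For anyone hitting the same error, a rough way to see which `libcuda.so` a process really ends up with is to load it by the same name the xpra log shows (`libcuda.so`) and then read the process's memory map. This is a Linux-only sketch: `mapped_paths` is a hypothetical helper, and it relies on the `/proc/self/maps` layout, where the file path is the sixth whitespace-separated field.

```python
import ctypes


def mapped_paths(maps_text: str, library_name: str) -> list:
    """Return distinct mapped file paths whose file name starts with library_name."""
    paths = []
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, [pathname]
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no backing file
        path = fields[5].strip()
        file_name = path.rsplit("/", 1)[-1]
        if file_name.startswith(library_name) and path not in paths:
            paths.append(path)
    return paths


if __name__ == "__main__":
    try:
        ctypes.CDLL("libcuda.so")  # same library name that appears in the log
    except OSError as exc:
        print(f"could not load libcuda.so: {exc}")
    else:
        with open("/proc/self/maps") as maps_file:  # Linux only
            for path in mapped_paths(maps_file.read(), "libcuda.so"):
                print(path)
```

A printed path under a CUDA toolkit `stubs` directory would be consistent with error 34 (`CUDA_ERROR_STUB_LIBRARY`) reported earlier in the thread.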

totaam added the invalid label Apr 15, 2023