P3 - Nvidia decoding sometimes returns CUDA_ERROR_UNKNOWN #239
Comments
Update here:
Hi, I'm facing exactly the same issue with livepeer V0.5.35, Nvidia driver 525.78.01, CUDA version 12.0, and power state P0. It happens on some streams, not all, but I hit this error 28 times in the last 24 hours.
Hi @yondonfu, I have a workaround for this. It's not a perfect one, but when the error happens I invoke the transcoder service again. So basically there are two transcoders for a moment, and one is terminated automatically within 3-4 seconds. This prevents me from losing streams because of this CUDA error. Please consider this, or a better implementation, for upcoming releases :)
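The workaround described above could be sketched roughly as follows. This is only an illustration, not the commenter's actual script: the function names `find_cuda_error` and `watch` are hypothetical, and the sketch assumes the transcoder's log lines contain the CUDA error name verbatim. How the transcoder is actually re-invoked is left to a caller-supplied callback, since the issue does not say how the service is managed.

```python
from typing import Callable, Iterable, Optional

# Error names taken from the issue report.
CUDA_ERRORS = (
    "CUDA_ERROR_UNKNOWN",
    "CUDA_ERROR_OUT_OF_MEMORY",
    "CUDA_ERROR_ILLEGAL_ADDRESS",
)


def find_cuda_error(line: str) -> Optional[str]:
    """Return the first known CUDA error name found in a log line, else None."""
    for name in CUDA_ERRORS:
        if name in line:
            return name
    return None


def watch(lines: Iterable[str], restart: Callable[[str], None]) -> int:
    """Call restart(error_name) for every log line that reports a CUDA error.

    `restart` is a hypothetical hook; it is where the transcoder service
    would be invoked again. Returns the number of errors seen.
    """
    hits = 0
    for line in lines:
        err = find_cuda_error(line)
        if err is not None:
            hits += 1
            restart(err)
    return hits
```

In practice `lines` could be the transcoder's log stream (for example `sys.stdin` when the logs are piped in), and `restart` would wrap whatever command starts the transcoder in a given deployment.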
Debug CUDA_ERROR_UNKNOWN errors. Why? We should follow up, but this is hard to debug until the P2s are addressed, and the errors seem to have stopped.
Describe the bug
The GPU video decoding fails with CUDA_ERROR_UNKNOWN, needing the user to restart the node for future segments. Sometimes it's paired with CUDA_ERROR_OUT_OF_MEMORY or CUDA_ERROR_ILLEGAL_ADDRESS.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Decrease the blast radius of these errors if possible, and figure out the root cause.
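One possible reading of "decrease the blast radius" is to discard a decoder session as soon as it reports a CUDA error, so that later segments get a fresh session instead of inheriting the broken one. The sketch below illustrates that idea only. `CudaError`, `IsolatedSession`, and the `factory` callable are hypothetical names, not part of LPMS or FFmpeg, and the issue does not establish whether recreating a session inside the same process actually recovers from CUDA_ERROR_UNKNOWN.

```python
from typing import Callable, Optional

Session = Callable[[bytes], bytes]


class CudaError(RuntimeError):
    """Hypothetical stand-in for a CUDA failure surfaced by the decoder."""


class IsolatedSession:
    """Wrap a decoder session and drop it after a CUDA failure.

    The failing segment still raises, but the next call builds a new
    session through `factory` rather than reusing the broken one.
    """

    def __init__(self, factory: Callable[[], Session]):
        self._factory = factory          # assumed: creates a fresh GPU session
        self._session: Optional[Session] = None
        self.sessions_created = 0

    def transcode(self, segment: bytes) -> bytes:
        if self._session is None:
            self._session = self._factory()
            self.sessions_created += 1
        try:
            return self._session(segment)
        except CudaError:
            # Forget the session so the failure stays confined to this segment.
            self._session = None
            raise
```

The caller still sees the error for the segment that failed and can decide whether to retry it or hand it to another transcoder.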
Screenshots
ERROR_UNKNOWN
ERROR_ILLEGAL_ADDRESS
ERROR_OUT_OF_MEMORY
Additional context
Stack-trace for future reference:
LPMS - https://github.com/livepeer/lpms/blob/master/ffmpeg/decoder.c#L250
FFmpeg - entry-point https://github.com/FFmpeg/FFmpeg/blob/870bfe16a12bf09dca3a4ae27ef6f81a2de80c40/libavutil/hwcontext.c#L610
most-probable line causing the error https://github.com/FFmpeg/FFmpeg/blob/870bfe16a12bf09dca3a4ae27ef6f81a2de80c40/libavutil/hwcontext.c#L629
cuda-specific ctx creation routine https://github.com/FFmpeg/FFmpeg/blob/870bfe16a12bf09dca3a4ae27ef6f81a2de80c40/libavutil/hwcontext_cuda.c#L379
cuCtxCreate call https://github.com/FFmpeg/FFmpeg/blob/870bfe16a12bf09dca3a4ae27ef6f81a2de80c40/libavutil/hwcontext_cuda.c#L363