
libmpv: Severe screen corruption when rendering video via mpv_render_context_render to virtual x-server #14577

Open
deeptho opened this issue Jul 18, 2024 · 15 comments

@deeptho

deeptho commented Jul 18, 2024

mpv Information

This is the first version in which the problem can be reproduced.
Found by bisecting. The problem also happens in master (3ab989e554)
mpv --version
mpv bad1-dirty Copyright © 2000-2023 mpv/MPlayer/mplayer2 projects
 built on Jul 18 2024 13:05:23
libplacebo version: v6.338.2
FFmpeg version: 6.1.1
FFmpeg library versions:
   libavutil       58.29.100
   libavcodec      60.31.102
   libavformat     60.16.100
   libswscale      7.5.100
   libavfilter     9.12.100
   libswresample   4.12.100

Other Information

  • Linux version:
    Fedora Linux

  • Kernel Version:
    6.9.8-200.fc40.x86_64

  • GPU Model:
    Intel Corporation AlderLake-S GT1

  • Mesa/GPU Driver Version:
    mesa-libGLU-9.0.3-4.fc40.x86_64
    mesa-libGLU-devel-9.0.3-4.fc40.x86_64
    mesa-filesystem-24.1.2-8.fc40.x86_64
    mesa-va-drivers-24.1.2-8.fc40.x86_64
    mesa-libglapi-24.1.2-8.fc40.x86_64
    mesa-dri-drivers-24.1.2-8.fc40.x86_64
    mesa-libgbm-24.1.2-8.fc40.x86_64
    mesa-libEGL-24.1.2-8.fc40.x86_64
    mesa-libgbm-devel-24.1.2-8.fc40.x86_64
    mesa-libGL-24.1.2-8.fc40.x86_64
    mesa-libGL-devel-24.1.2-8.fc40.x86_64
    mesa-libEGL-devel-24.1.2-8.fc40.x86_64
    mesa-libOpenCL-24.1.2-8.fc40.x86_64
    mesa-vulkan-drivers-24.1.2-8.fc40.x86_64
    mesa-libxatracker-24.1.2-8.fc40.x86_64
    mesa-libOSMesa-24.1.2-8.fc40.x86_64

  • Window Manager and Version:
    mate marco

  • Source mpv:
    from git

  • Introduced in version:
    c172a65
    not possible to reproduce in 3e612c0

Reproduction Steps

The problem can be reproduced using xpra
- Start xpra in seamless mode, e.g., by starting a terminal that runs
on a remote computer. Under the hood this starts an X server
with Xdummy or Xvfb. The problem occurs with both.
- In this terminal, start a program that uses libmpv,
for instance https://github.com/v0idv0id/MPVideoCube.git
or (more difficult to compile)
https://github.com/deeptho/neumodvb

Sometimes the video displayed in these programs
looks OK, but sometimes it is heavily corrupted.
Investigation shows:

  1. If the programs are run under virtualgl on the remote computer, all is fine
  2. If the programs are run directly, they sometimes show the expected output: for MPVideoCube this means (mostly) artefact-free video displayed on a cube; for neumodvb, it means a live TV channel showing artefact-free video. However, sometimes the video is completely black or heavily corrupted: it contains vertical/horizontal lines, only small parts of it appear on screen, or it looks heavily pixellated. See also "screen corruption with fedora40 display client" (Xpra-org/xpra#4300) for examples.
  3. If screen corruption occurs, it continues to occur until a new video is displayed. If no corruption occurs at the start, then the video remains good forever.

Expected Behavior

Non-corrupted video

Actual Behavior

Corrupted video.

Additional info:

  1. If an overlay is drawn on top of the video (neumodvb) after mpv renders it, that overlay looks fine. Both programs also run fine natively, i.e. not under xpra.
  2. If in neumodvb I save the video rendered by libmpv, that video is also corrupted, but the overlay is not, strongly suggesting that mpv is causing the corruption.
  3. The mpv command-line client does not show corruption when playing videos.
  4. The ONLY difference between the last good and the first bad mpv version seems to be
    a difference in default interpolation code, but that may just "trigger" the problem rather than be the cause.
  5. Once the displayed video is corrupted, the corruption stays of the same type, although resizing the window has some effect on its details. So to reproduce the problem, multiple trials may be needed.

Please see the screenshots here: Xpra-org/xpra#4300

I cannot attach log files, as there are none in this use case. Or is it possible to start one
in libmpv?

Log File

xxx.log

Sample Files

withosd
aaa

I carefully read all instructions and confirm that I did the following:

  • I tested with the latest mpv version to validate that the issue is not already fixed.
  • I provided all required information including system and mpv version.
  • I produced the log file with the exact same set of files, parameters, and conditions used in "Reproduction Steps", with the addition of --log-file=output.txt.
  • I produced the log file while the behaviors described in "Actual Behavior" were actively observed.
  • I attached the full, untruncated log file.
  • I attached the backtrace in the case of a crash.
@sfan5
Member

sfan5 commented Jul 18, 2024

I cannot attach log files, as there are none in this use case. Or is it possible to start one
in libmpv?

There absolutely is. Set the "log-file" option via libmpv.

@deeptho
Author

deeptho commented Jul 24, 2024

I have added
mpv-log-file=/tmp/mpv/log
to the mpv.conf that is being loaded by libmpv in neumodvb, but it has no effect,
whereas other options in that file, e.g.
screenshot-directory=/tmp/screenshots,
work as expected.

@Akemi
Member

Akemi commented Jul 25, 2024

I have added mpv-log-file=/tmp/mpv/log to the mpv.conf that is being loaded by libmpv in neumodvb, but it has no effect, whereas other options in that file, e.g. screenshot-directory=/tmp/screenshots, work as expected.

it's not mpv-log-file=, it's log-file=.

@deeptho
Author

deeptho commented Jul 25, 2024

Here is an mpv log file recorded while the problem occurs.

  1. I start neumodvb.
  2. I start displaying channel 4. There is audio, but nothing is displayed.
  3. I stop playback.
  4. I start it again. This time there is video, but it is corrupted by black horizontal and vertical lines.
    This is with git version c172a65, which is the first version in which I can reproduce the corruption.
    libplacebo is at version 64c19545.

mpv.log

This screenshot shows the corruption
problem

@kasper93
Contributor

Duplicate of #13998

@kasper93 kasper93 marked this as a duplicate of #13998 Jul 26, 2024
@sfan5
Member

sfan5 commented Jul 26, 2024

Duplicate of #13998

Are you sure? I don't see gpu-next being used here.

@deeptho
Author

deeptho commented Jul 26, 2024

I have tried adding correct-downscaling=no to the mpv configuration.
Based on only 5 trials, I notice that:

  1. The vertical/horizontal lines on 16:9 content do not seem to appear.
  2. The problem that the screen remains black (no video) at the first trial is still there.
  3. Other forms of corruption are also still there. See picture below. The strange thing is that these corruptions
    do not occur at each trial, so it must have something to do with initialisation. Note that I did not resize the window manually, so the scaling is always the same.

Adding profile=fast produced no pictures at all.

bad1

@kasper93
Contributor

kasper93 commented Jul 26, 2024

Duplicate of #13998

Are you sure? I don't see gpu-next being used here.

I'm not sure what's going on here. I think we are looking at multiple different issues. For example, the screenshot from #14577 (comment) shows corruption that happens with Intel when using gather. But indeed, the previous report was about Windows and gpu-next, though the symptoms are the same. The first broken commit, c172a65, makes it clear that we have some issue when downscaling, which is the same case as in the other issue.

The vertical/horizontal lines on 16:9 content do not seem to appear

Ok, so it seems to confirm that at least part of the problem is the same as the other one.

Adding profile=fast produced no pictures at all

That's worrying, because in this mode, we really don't do much work.

[   0.014][v][libmpv_render] GL_VERSION='4.5 (Compatibility Profile) Mesa 24.1.2'
[   0.014][v][libmpv_render] Detected desktop OpenGL 4.5.
[   0.014][v][libmpv_render] GL_VENDOR='Mesa'
[   0.014][v][libmpv_render] GL_RENDERER='llvmpipe (LLVM 18.1.6, 256 bits)'
[   0.014][v][libmpv_render] GL_SHADING_LANGUAGE_VERSION='4.50'

Are you able to test with older mesa build? I'm curious if those issues are new or were there before.

@sfan5
Member

sfan5 commented Jul 26, 2024

It would also be helpful to link the code in the application where mpv is integrated.
The GL rendering has some constraints and there's a lot that can go wrong.

@deeptho
Author

deeptho commented Aug 9, 2024

In neumoDVB, this is the source file that handles libmpv callbacks
https://github.com/deeptho/neumodvb/blob/master/src/viewer/neumompv.cc
Note that depending on the choices of the user, this code also draws an overlay on top
of mpv, but the issue of this ticket happens also without that overlay drawing.

Re the constraints: I am aware of those, although it is not always easy to understand
them correctly. A long time ago, I also had to make some changes to prevent
the whole program from crashing when more than 2 mpv playbacks were used simultaneously. This happened after some silent change in GL (though I did find a
comment about it in a GL source file).

The culprit then turned out to be illegal access from multiple
threads to the same GL context. This used to work fine (of course user code
has to guard with locks to prevent concurrent access), but I think now the context
can only be used by the thread that created it.

If you are wondering about the convoluted construct with the thread_local variable
to store the context: it is needed to solve this problem.
One of the problems was that libmpv uses different threads for the callbacks made by different video playbacks and the user code has to detect when it is called from
two different threads.

Regarding the issue of this ticket, this is not relevant, as only one playback is running
in the tests.

I found the libmpv documentation you link to a bit misleading:
"This assumes the OpenGL context lives on a certain thread controlled by the API user."
=> it is libmpv that creates and controls the thread, not the API user.
The API user indeed controls the context but not the thread and should be
prepared for suddenly being called from a different thread.

@sfan5
Member

sfan5 commented Aug 9, 2024

it is libmpv that creates and controls the thread, not the API user.
The API user indeed controls the context but not the thread and should be
prepared for suddenly being called from a different thread.

This is incorrect.
mpv will call the update callback on any thread it wants, but you must consistently use mpv_render_context_render on the thread that has the OpenGL context.
You can see in this example how it's done with an event and on_mpv_render_update does not itself call any render functions.

Looking at neumompv.cc you seem to be doing this correctly.

@sfan5
Member

sfan5 commented Aug 10, 2024

In any case it should be easy to reproduce this bug with one of the mpv examples.

@deeptho
Author

deeptho commented Aug 11, 2024

it is libmpv that creates and controls the thread, not the API user.
The API user indeed controls the context but not the thread and should be
prepared for suddenly being called from a different thread.

This is incorrect. mpv will call the update callback on any thread it wants, but you must

Yes, that is what I wrote: "it is libmpv that creates and controls the thread, not the API user."

consistently use mpv_render_context_render on the thread that has the OpenGL context.

That is the thread calling the user callback, so an mpv thread and not controlled by the user.
Is there any guarantee that for the same video playback, mpv calls the callback always on the same thread to render successive frames?

Otherwise it will get really complicated, as the user code callback cannot
draw but instead would have to delegate this task to some other thread, which would
create needless context switches.

You can see in this example how it's done with an event and on_mpv_render_update does not itself call any render functions.

No, I did not claim that mpv draws. The user callback draws, but it does that in a thread
created by mpv. The surprising bit was that mpv calls from multiple threads for multiple
simultaneous video playbacks and that it is then not possible to use the same
GL context even when locking to prevent simultaneous access.

I can understand why libmpv would call from a separate thread for each video playback, but it would be helpful to mention this in the documentation, along with a warning that OpenGL then requires using a separate GL context per thread (it did not require that in older versions).

Looking at neumompv.cc you seem to be doing this correctly.

Thanks for that confirmation.

@sfan5
Member

sfan5 commented Aug 14, 2024

That is the thread calling the user callback, so an mpv thread and not controlled by the user.
Is there any guarantee that for the same video playback, mpv calls the callback always on the same thread to render successive frames?

No. Why would you need that?

Otherwise it will get really complicated, as the user code callback cannot
draw but instead would have to delegate this task to some other thread

Yes. This is what you have to do, and it is how the SDL example I linked works.

The user callback draws, but it does that in a thread created by mpv.

No, this is the exact opposite of what I said.
You create the OpenGL context and control the draw thread.
mpv calls the callback to tell you that you should draw. Do not draw inside the mpv callback, that's broken.
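A minimal sketch of the pattern described here, assuming POSIX threads: the update callback, which a real program would register with mpv_render_context_set_update_callback(), only records that a redraw is pending and wakes a render thread owned by the application; that thread is the only one that would make the GL context current and draw. The names render_state, render_state_init, on_render_update, render_frame, render_thread and request_quit are hypothetical, and render_frame() merely counts frames in place of calling mpv_render_context_render(), so the sketch runs without libmpv or OpenGL.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical state shared between the mpv update callback and the
 * application's own render thread. */
struct render_state {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool redraw_pending;  /* set by the callback, cleared by the render thread */
    bool quit;            /* set once to stop the render thread */
    int frames_rendered;  /* only touched by the render thread */
};

static void render_state_init(struct render_state *st) {
    pthread_mutex_init(&st->lock, NULL);
    pthread_cond_init(&st->cond, NULL);
    st->redraw_pending = false;
    st->quit = false;
    st->frames_rendered = 0;
}

/* Same shape as libmpv's mpv_render_update_fn, void (*)(void *).
 * libmpv may invoke it on any thread, so it only signals; it never draws. */
static void on_render_update(void *ctx) {
    struct render_state *st = ctx;
    pthread_mutex_lock(&st->lock);
    st->redraw_pending = true;
    pthread_cond_signal(&st->cond);
    pthread_mutex_unlock(&st->lock);
}

/* Placeholder (hypothetical): a real program would have its GL context
 * current on this thread and call mpv_render_context_render() here. */
static void render_frame(struct render_state *st) {
    st->frames_rendered++;
}

/* Body of the render thread that the application creates and owns. */
static void *render_thread(void *arg) {
    struct render_state *st = arg;
    pthread_mutex_lock(&st->lock);
    for (;;) {
        while (!st->redraw_pending && !st->quit)
            pthread_cond_wait(&st->cond, &st->lock);
        if (st->redraw_pending) {
            st->redraw_pending = false;
            pthread_mutex_unlock(&st->lock);
            render_frame(st);  /* draw without holding the lock */
            pthread_mutex_lock(&st->lock);
        } else {
            break;  /* quit requested and nothing left to draw */
        }
    }
    assert(st->quit);  /* the loop only exits after request_quit() */
    pthread_mutex_unlock(&st->lock);
    return NULL;
}

/* Ask the render thread to finish any pending redraw and exit. */
static void request_quit(struct render_state *st) {
    pthread_mutex_lock(&st->lock);
    st->quit = true;
    pthread_cond_signal(&st->cond);
    pthread_mutex_unlock(&st->lock);
}
```

Several callbacks arriving before the render thread wakes up collapse into a single redraw, which is harmless because the callback only signals that something changed. A real render loop would also call mpv_render_context_update() and check for MPV_RENDER_UPDATE_FRAME before rendering, as described in libmpv's render.h.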

but it would be helpful to mention this in the documentation, along with a warning that openGL then requires using a separate GL context per thread

From mpv/libmpv/render_gl.h, lines 31 to 40 in acc69e0:

* OpenGL interop
* --------------
*
* The OpenGL backend has some special rules, because OpenGL itself uses
* implicit per-thread contexts, which causes additional API problems.
*
* This assumes the OpenGL context lives on a certain thread controlled by the
* API user. All mpv_render_* APIs have to be assumed to implicitly use the
* OpenGL context if you pass a mpv_render_context using the OpenGL backend,
* unless specified otherwise.

@deeptho
Author

deeptho commented Aug 15, 2024

It seems I was confused by some older, dead code in neumodvb. The rendering indeed
takes place on a thread created by neumodvb, not by libmpv.
