screen corruption with fedora40 display client #4300

Closed
deeptho opened this issue Jul 13, 2024 · 34 comments
Labels
bug Something isn't working invalid This doesn't seem right

Comments

@deeptho

deeptho commented Jul 13, 2024

Describe the bug

To Reproduce
Steps to reproduce the behavior:

  1. server command
    neumodvb.py
    This is a program I created myself. It scans satellite transmissions and displays video broadcast channels.
    The program works as expected when run natively on the server
    see https://github.com/deeptho/neumodvb

  2. client command
    ssh://tuf.mynet/99 --dpi=157 --start='terminator -p tuf'
    or variants without the --dpi

  3. specific action to trigger the bug
    The above client commands start a terminal. Start neumodvb.py from that terminal:
    neumodvb.py and start playing a broadcast channel.
    The results depend on the channel and on window size (the window can be resized).

First, the screenshot "good" shows what the correct output would be. On the left and bottom there is some
text. On the right a video is playing (in the example in a very old broadcast format). This screenshot was captured
using older versions of xpra (see below for details)

good

Some channels display nothing at all, although there is audio.

Other channels display almost correct video but with some vertical lines of black pixels.
See "lines"

lines

Other channels show severe distortion, with some pixels present but most of them black. See "pixels"

pixels

The weirdest case shows pixels in a triangular region, with most other pixels black. See "triangle"

triangle

Note that the precise details of the corruption vary: the same channel sometimes works better
than other times. Also, at the start of the video I notice some colour flashing of the background of
the text regions.

System Information (please complete the following information):

  • Server OS: fedora 40
  • Client OS: fedora 38, 39 or 40
  • Xpra Server Version xpra-6.0.2-10.r0.fc40.x86_64
  • Xpra Client Version xpra-6.0.2-10.r0.fc40.x86_64 or lower (version 5)

Additional context

The "neumodvb.py" program works as follows. The video is rendered by libmpv and then possibly
overlaid with some graphics (not used in the examples) using the opengl interface of libmpv.

After a lot of experimentation with different xpra versions on different clients, I came to the conclusion
that only the server xpra version matters: the corruption occurs on older and newer client xpra versions
on older and newer fedora versions (38, 39, 40).

Also, I tried two different computers for the server, and the problem occurs on both.
Both computers use intel embedded GPUs. One is quite old and another one is brand new.

It only occurs when the server is on fedora40, but it seems to depend on libraries used by xpra
and not on xpra itself: the stated xpra version can be made to work by downgrading libraries while
keeping the xpra version as stated above. By carefully downgrading various packages (not in a very orthodox way,
as I had to override some dependencies to other packages, to narrow the list of suspect packages),
I found that one or more of the following libraries causes the problem when upgrading
to fedora40, whereas I get a good result if I use the fedora39 versions:

 compat-ffmpeg4                   x86_64 4.4.4-2.fc39        rpmfusion-free         7.6 M
 ffmpeg                           x86_64 6.1.1-5.fc39        rpmfusion-free-updates 1.8 M
 ffmpeg-devel                     x86_64 6.1.1-5.fc39        rpmfusion-free-updates 842 k
 ffmpeg-libs                      x86_64 6.1.1-5.fc39        rpmfusion-free-updates 8.2 M
 javascriptcoregtk4.1             x86_64 2.44.2-2.fc39       updates                8.4 M
 libplacebo                       x86_64 6.292.1-5.fc39      updates                391 k
 libplacebo-devel                 x86_64 6.292.1-5.fc39      updates                120 k
 mpv                              x86_64 0.36.0-3.fc39       fedora                 1.6 M
 mpv-devel                        x86_64 0.36.0-3.fc39       fedora                  46 k
 mpv-libs                         x86_64 0.36.0-3.fc39       fedora                 1.1 M
 tesseract                        x86_64 5.3.2-3.fc39        fedora                 1.3 M
 webkit2gtk4.1                    x86_64 2.44.2-2.fc39       updates                 25 M
 x265-libs                        x86_64 3.5-6.fc39          rpmfusion-free         1.3 M

I also observed the xpra server logs with the "compress" debug option and I noticed that when the
problem was at its worst, almost no lines were added to the log file. On the other hand, the lines that were
there seemed normal (similar to those on a working system): using vp8, x265 etc... I did not keep the server logs...

I suspect it may be either a bug in the encoder code, or in the detection of the content type.

With the above downgrades I have a working system, but of course this is not a permanent solution.
Also, with this working system there are still a few minor problems, which are probably unrelated and started
appearing in the more recent fedora39 versions:
-at the start of video playback the whole window flashes in colour (black becomes some dark, almost black colour two or three times, and then things work fine)
-sometimes, when moving the cursor over a window, the content briefly turns slightly green and then goes back to
normal.

I can do some more testing if needed.

@deeptho deeptho added the bug Something isn't working label Jul 13, 2024
@deeptho
Author

deeptho commented Jul 13, 2024

For completeness: these are the versions of the packages in which the problem is already
present:

 compat-ffmpeg4                          x86_64 4.4.4-5.fc40      rpmfusion-free
 ffmpeg                                  x86_64 6.1.1-11.fc40     rpmfusion-free-updates
 ffmpeg-devel                            x86_64 6.1.1-11.fc40     rpmfusion-free-updates
 ffmpeg-libs                             x86_64 6.1.1-11.fc40     rpmfusion-free-updates
 libplacebo                              x86_64 6.338.2-1.fc40    fedora  415 k
 libplacebo-devel                        x86_64 6.338.2-1.fc40    fedora  127 k
 mpv                                     x86_64 0.37.0-4.fc40     fedora  1.5 M
 mpv-devel                               x86_64 0.37.0-4.fc40     fedora   47 k
 mpv-libs                                x86_64 0.37.0-4.fc40     fedora  1.0 M
 tesseract                               x86_64 5.3.4-4.fc40      fedora  1.3 M
 tiwilink-firmware                       noarch 20240709-1.fc40   updates 4.6 M
 x265-libs                               x86_64 3.6-2.fc40        rpmfusion-free-updates

@totaam
Collaborator

totaam commented Jul 14, 2024

For completeness: these are the versions of the packages in which the problem is already

None of these libraries are used by xpra.
Can you try a version from the beta channel?

This type of visual corruption makes me think that perhaps there is some race condition between the application painting the screen and xpra capturing the screen updates from the X11 server.
How does the application paint the video on screen?

@deeptho
Author

deeptho commented Jul 14, 2024

None of these libraries are used by xpra.

Good. That narrows it down.

Can you try a version from the beta channel?
I will try soon and let you know.

This type of visual corruption makes me think that perhaps there is some race condition between the application painting the screen and xpra capturing the screen updates from the X11 server.

How does the application paint the video on screen?

It is actually libmpv that does the drawing, through a callback in user code.
It is explained (somewhat) in the comments in
https://github.com/mpv-player/mpv/blob/master/libmpv/

Basically, when mpv has a new frame ready, it calls a callback function.
That callback function wakes up a thread in my program (as it is illegal to draw from the callback).
That thread then:

  1. calls wxMutexGuiEnter() from wxWidgets
  2. calls OpenGL's SetCurrent()
  3. calls mpv_render_context_render, an mpv function that does the actual rendering
  4. calls SetCurrent() again
  5. renders an SVG overlay (only when there is something to display)
  6. calls OpenGL's SwapBuffers()
  7. calls wxMutexGuiLeave()

Also worth noting: the opengl context is created per thread.
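The render sequence described above can be sketched in Python as follows. This is a hypothetical illustration of the pattern only, not neumodvb's code: the class and method names (RenderLoop, on_mpv_update, render_frame) are invented, and the wxWidgets/OpenGL/libmpv calls are replaced by stand-ins that merely record their names, so the ordering and the "wake a thread instead of drawing in the callback" structure can be exercised without a display.

```python
import threading


class RenderLoop:
    """Hypothetical sketch of the 'mpv callback wakes a render thread' pattern.

    Every drawing call is a stand-in that only records its name; a real
    program would call wxWidgets, OpenGL and libmpv here instead.
    """

    def __init__(self, overlay_pending=False):
        self.calls = []                      # order in which stand-in calls ran
        self.overlay_pending = overlay_pending
        self.frame_done = threading.Event()  # set after each rendered frame
        self._wake = threading.Event()
        self._stop = False
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def on_mpv_update(self):
        # Invoked from libmpv's update callback: drawing is not allowed
        # there, so the callback only wakes the render thread.
        self._wake.set()

    def stop(self):
        self._stop = True
        self._wake.set()
        self._thread.join()

    def _call(self, name):
        self.calls.append(name)

    def _run(self):
        while True:
            self._wake.wait()
            self._wake.clear()
            if self._stop:
                return
            self.render_frame()
            self.frame_done.set()

    def render_frame(self):
        self._call("wxMutexGuiEnter")          # take the GUI lock
        try:
            self._call("SetCurrent")           # bind this thread's GL context
            self._call("mpv_render_context_render")
            self._call("SetCurrent")
            if self.overlay_pending:
                self._call("render_svg_overlay")
            self._call("SwapBuffers")
        finally:
            self._call("wxMutexGuiLeave")      # always release the GUI lock


loop = RenderLoop()
loop.start()
loop.on_mpv_update()          # simulate mpv announcing a new frame
loop.frame_done.wait(timeout=5)
loop.stop()
print(loop.calls)
```

Keeping the callback trivial (it only sets an event) means the thread that libmpv uses to invoke it never has to wait for the GUI lock; all drawing happens on the render thread.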

@totaam
Collaborator

totaam commented Jul 14, 2024

OK, so you're drawing with OpenGL.
Do you use virtualgl at all?
If not then the vfb will be using the software renderer (usually llvmpipe) - this can be seen in the server log.

At this point, I think it's likely that something is going awry there, not in xpra, especially since this type of visual artifact has never been reported.

To verify this, you can dump the virtual screen contents at two points, before they are sent to the client:

  • --sync-xvfb=100 will cause all the screen contents to be painted on the vfb (normally they are only composited), so you can use any X11 screengrab tool (e.g. scrot, which I use) to see the contents of xpra's framebuffer
  • XPRA_SAVE_TO_FILE=1 xpra start ...:
    SAVE_TO_FILE = envbool("XPRA_SAVE_TO_FILE")
    def may_save_image(coding: str, data: memoryview | bytes | bytearray, now: float = 0):
        if SAVE_TO_FILE:  # pragma: no cover
            now = now or monotonic()
            ext = coding.lower().replace("/", "-")
            filename = f"./{now}.{ext}"
            bdata = memoryview_to_bytes(data)
            with open(filename, "wb") as f:
                f.write(bdata)
            log.info("saved %7i bytes to %s", len(data), filename)

    this will save almost all screen updates to files in the current directory.
    It is helpful to specify an encoding when using this: --encodings=webp,png

If these images are also corrupted, then xpra is only compressing what it is receiving.
(and be aware that each one of these tweaks may cause the bug to go away - such is the nature of race conditions - it doesn't mean much in that case)

@deeptho
Author

deeptho commented Jul 15, 2024

I am not using VirtualGL.
Thanks for the other suggestions.

I have tried a few things

  1. with the official non-beta version:
    I added sync-xvfb=100 to ~/.config/xpra/xpra.conf on the server and verified that the argument was indeed present
    after the server was started. This made no difference.

  2. XPRA_SAVE_TO_FILE=1 xpra start ssh://tuf.mynet/99 --dpi=157 --start='terminator -p tuf'
    on the client, then selecting "webp" from the menu. The screenshots have the same corruption. This was with
    sync-xvfb=100 still on.

  3. Upgrade to the beta version on the server. xpra-6.1-10.r36230.fc40.x86_64
    This made no difference. I repeated test 2.

  4. neumodvb also has the option to create a screenshot (using wxWidgets functionality) on the server side.
    Those screenshots are always fine, even when xpra shows a black picture.

  5. I also added some debug code to see if the framebuffer drawing code was actually called. This seemed normal.
    server.log

Additional information: server.log shows one error:
Error: unmanaged X11 context

Also, when I run neumodvb on the native X server and then use xpra shadow, everything works fine.

The type of distortion (vertical lines, pixels, triangles) seems to be specific to each channel: the same
types are always present on a given channel, although the details vary depending on window size, or just from trial
to trial.

So, following your analysis, I think it is safe to assume that the corruption happens before xpra starts processing
the data? I am not familiar enough with the internals to make the best analysis myself, but my
guess is the problem should be somehow related to some incorrect synchronisation between
writing data to the display buffer (mpv and my code) and reading from it (xpra).

From the point of view of wxWidgets, mpv, and my code everything seems to work fine, which of course does
not mean that it really is fine:
-displaying natively works
-screenshots work
-displaying natively while shadowing the native display with xpra also works (but I understood that uses a different mechanism)

From the point of view of xpra, also everything seems fine, except that it reads corrupted data.

How does xpra read the data from the screen when some OpenGL content is displayed on it?

Any ideas for further debugging? It would already be nice to be able to pinpoint where exactly the problem is.

Although probably unrelated: I also noticed some background colour flashes. These go away when I force
the encoding to a video stream, but in that case I see some screen corruption when repainting text areas on the left.

@totaam
Collaborator

totaam commented Jul 15, 2024

This made no difference

It is not meant to.
It is meant to allow you to look at the vfb using a screenshot tool.

... on the client, then selecting "webp" from the menu. The screenshots have the same corruption.

This was meant to be used on the server, not the client,
so that you can look at the screenshots before they are sent to the client.

Error: unmanaged X11 context

Please include the full details of the error

be somehow related to some incorrect synchronisation between writing data to the display buffer

Looks like it.
There is no synchronization. Xpra can capture the pixels at any time.
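To make "no synchronization" concrete, here is a deliberately simplified, deterministic toy model in Python. The helper names paint_rows and capture are hypothetical, and this is not how Xvfb, XShm or xpra work internally; it only shows that a snapshot taken while an application is part-way through painting a frame mixes rows of the old and the new frame.

```python
def paint_rows(buffer, frame, rows):
    """Writer: copy the given rows of `frame` into the shared buffer."""
    for y in rows:
        buffer[y] = list(frame[y])


def capture(buffer):
    """Reader: snapshot the shared buffer without any locking."""
    return [list(row) for row in buffer]


HEIGHT, WIDTH = 4, 3
old_frame = [[0] * WIDTH for _ in range(HEIGHT)]  # e.g. an all-black frame
new_frame = [[9] * WIDTH for _ in range(HEIGHT)]  # the next video frame

shared = capture(old_frame)                 # what is currently on screen
paint_rows(shared, new_frame, range(0, 2))  # the app has painted the top half...
snapshot = capture(shared)                  # ...when the capture happens
paint_rows(shared, new_frame, range(2, HEIGHT))

print(snapshot)
```

In a real system the interleaving depends on timing, so the same code can yield a clean or a torn snapshot from one run to the next.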

How does xpra read the data from the screen when some OpenGL content is displayed on it?

Same as non-opengl: using XShm:

if not XShmGetImage(self.display, drawable, self.image, 0, 0, 0xFFFFFFFF):

Although probably unrelated: I also noticed some background colour flashes.

These would likely be OpenGL glClear calls.

@deeptho
Author

deeptho commented Jul 15, 2024

This was meant to be used on the server, not the client. So that you can look at the screenshots before they are sent to the client.

So how does one actually start the server command manually in this case?
Usually it is started when the client connects, but then no env variable can be set.

About the "unmanaged X11 context" error: the server log was attached to my message above. You can check
the detailed report there.

@totaam
Collaborator

totaam commented Jul 15, 2024

So how does one actually start the server command manually in this case?

Log in to your server via ssh, then run XPRA_SAVE_TO_FILE=1 xpra start --sync-xvfb=100 --start=xterm

@deeptho
Author

deeptho commented Jul 15, 2024

I don't think this properly works:

  1. on the system with the older libraries, if I run
    XPRA_SAVE_TO_FILE=1 xpra start :99 --sync-xvfb=100 --start='terminator -p tuf'
    then I can connect from the client. I get a properly running program, but nothing is saved on the server.

  2. on the system with the new libraries, when I start the neumodvb program, it starts with a completely black
    screen, but some paint operations appear when I press the arrow buttons (which cause screen paints).
    Again the save to file does not seem to work.

The server runs the xpra debug version; the client runs the non-debug version.
I assume that the saved updates should be stored in the current working directory.

xpra info :99 |grep SAVE
env.XPRA_SAVE_TO_FILE=1

shows that the variable is set.

@totaam
Collaborator

totaam commented Jul 16, 2024

but nothing is saved on the server.

It is, just not where you're looking: add --no-daemon and the server won't chdir, so the files will appear in your cwd.
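Why the files "disappear" in daemon mode can be shown with a small Python sketch: may_save_image writes to a relative path (./{now}.{ext}), which resolves against the process's current directory at write time. The save_update helper below is a hypothetical stand-in for that function, and the second temporary directory merely stands in for wherever a daemonized server changes directory to (the actual directory is not stated in this thread).

```python
import os
import tempfile


def save_update(coding, data, now):
    """Hypothetical stand-in for may_save_image: write to a *relative* path."""
    ext = coding.lower().replace("/", "-")
    filename = f"./{now}.{ext}"
    with open(filename, "wb") as f:
        f.write(bytes(data))
    return os.path.abspath(filename)  # where the file actually ended up


start_dir = os.getcwd()
with tempfile.TemporaryDirectory() as shell_dir, tempfile.TemporaryDirectory() as daemon_dir:
    try:
        os.chdir(shell_dir)                   # where "xpra start" was typed
        foreground_path = save_update("webp", b"pixels", 1.0)
        os.chdir(daemon_dir)                  # a daemon that changes directory
        daemon_path = save_update("webp", b"pixels", 2.0)
        in_shell_dir = os.path.dirname(foreground_path) == os.path.realpath(shell_dir)
        in_daemon_dir = os.path.dirname(daemon_path) == os.path.realpath(daemon_dir)
    finally:
        os.chdir(start_dir)                   # restore before the dirs are removed

print(in_shell_dir, in_daemon_dir)
```

With --no-daemon the server skips the directory change, so the relative path resolves to the shell's working directory.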

it starts with a completely black screen

Any errors in the server log / output or client output?

The server runs the xpra debug version; the client runs the non-debug version.

There are no debug versions, do you mean beta?

@deeptho
Author

deeptho commented Jul 16, 2024

but nothing is saved on the server.

It is, just not where you're looking: add --no-daemon and the server won't chdir, so the files will appear in your cwd.

OK. With --no-daemon the saved pictures appear. They show exactly what appears on the client's
screen: corruption when the client side shows corruption, good data when the client side shows good data.
This time there was no "full black screen", but the earlier types of corruption, with the black lines.
On the 3rd trial, one of the channels played without artefacts. The screenshots for that portion
of the session also looked good.

it starts with a completely black screen

Any errors in the server log / output or client output?

For today's experiment, it seems the log is not written to server.log but to the terminal.
I have copied the output to the attached file (after removing the many "save picture" lines).
There are a few errors, but some are probably unrelated.

The most noteworthy is the "unmanaged X11 context", which has a python stack trace.
I see things like

  self.update_root_overlay(window, x, y, image)
  File "/usr/lib64/python3.12/site-packages/xpra/x11/gtk/composite.py", line 129, in do_x11_damage_event
    self.emit("contents-changed", event)

which could well be related to the problem.

Here is the log:
server.log

The server runs the xpra debug version; the client runs the non-debug version.

There are no debug versions, do you mean beta?
Sorry, yes. I meant beta.

@totaam
Collaborator

totaam commented Jul 16, 2024

They show exactly what is appearing client side

So it looks like the X11 server is giving xpra these same pictures and the visual corruption is happening before xpra.

For today's experiment, it seems the log is not written to server.log but to the terminal.

Yes, that's how it works.
As per the man page, only daemon mode uses the log file.

The most noteworthy is the "unmanaged X11 context", which has a python stack trace.

This is unrelated and fixed in eb50717
It only triggers when the --sync-xvfb switch is enabled.

@deeptho
Author

deeptho commented Jul 16, 2024

So it looks like the X11 server is giving xpra these same pictures and the visual corruption is happening before xpra.
So why does the virtual X server being used (Xdummy) create problems while the regular one works fine?
Especially since the versions of the X software have not even changed between the working and non-working installations.

In summary
-the problem is triggered in libraries which paint the actual video
-as far as wxwidgets is concerned, all is fine as screenshots are fine
-as far as a regular Xserver is concerned all is fine as well
-but when xpra reads from Xdummy things are not fine

I will see if I can somehow downgrade mpv_lib to earlier versions by recompiling it. It seems
the most likely place to find what has changed.

Another strange thing is that I can get one of the TV channels to work perfectly after I try a few times to
resize the main window of neumodvb. After this succeeds, I can resize the window any way I want and the video
remains perfect. So once it starts well, it is robust against resizing.

@totaam
Collaborator

totaam commented Jul 17, 2024

the problem is triggered in libraries which paint the actual video

Perhaps.

as far as wxwidgets is concerned, all is fine as screenshots are fine

What screenshots? Did you use a screenshot tool with sync-xvfb?
(note that the vfb screen is then painted by xpra, so issues in xpra would also show up there)

If it was me I would:

  • try to run the app with virtualgl so that it uses a real GPU for opengl rendering
  • try to run on a real display (with xpra as window manager) so that the app can paint with a real GPU, without the virtualgl interposer

Both options are documented here:
https://github.com/Xpra-org/xpra/blob/master/docs/Usage/OpenGL.md

@deeptho
Author

deeptho commented Jul 17, 2024

What screenshots?
The ones made by neumodvb, through the "screenshot" functionality of mpv. So this is what is rendered
after decoding the video.

What screenshots? Did you use a screenshot tool with sync-xvfb? (note that the vfb screen is then painted by xpra, so issues in xpra would also show up there)

Yes. They just show the same corruption as what is displayed on screen.

If it was me I would:

* try to run the app with virtualgl so that it uses a real GPU for opengl rendering

I have never used virtualgl, and whatever I do needs to be compatible with mpv-libs. I would not even
know how to get started.

A quick look at VirtualGL's documentation did not really explain how to use it. I also fail to see the benefits. All that is rendered is
video, i.e., images, and this is supposed to be what VirtualGL also does.

* try to run on a real display (with xpra as window manager) so that the app can paint with a real GPU, without the virtualgl interposer

Well yes, I have done that many times before without any problems.
neumodvb runs perfectly well on a real display with intel GPU. It also runs perfectly well
when using "xpra shadow" on that real display. I understood that uses another mechanism for capturing
the display.

Both options are documented here: https://github.com/Xpra-org/xpra/blob/master/docs/Usage/OpenGL.md

@totaam
Collaborator

totaam commented Jul 17, 2024

I would not even know how to get started.

From within the xpra session: vglrun yourapp.
(assuming that you have a real X11 server bound to your GPU running at :0)

Well yes, I have done that many times before without any problems.

With xpra as window manager? That's surprising as it is not easy to setup.

It also runs perfectly well when using "xpra shadow" on that real display

shadow is screenscraping, not compositing, so it may not have the same race conditions.

One more thing worth trying is to run your server with:

XPRA_XSHM=0 xpra start ..

This will make the server's pixel capture code much slower, it may have an effect on your visual corruption, one way or the other.

@deeptho
Author

deeptho commented Jul 17, 2024

I would not even know how to get started.

From within the xpra session: vglrun yourapp. (assuming that you have a real X11 server bound to your GPU running at :0)

I wish they would explain that better on their main page.
In any case, there is no RPM, so I will need to compile it. Maybe later.

Well yes, I have done that many times before without any problems.

With xpra as window manager? That's surprising as it is not easy to setup.

No, a shadow display:
on the server (on the native display)
xpra
select shadow

on the client
xpra
select connect

It also runs perfectly well when using "xpra shadow" on that real display

shadow is screenscraping, not compositing, so it may not have the same race conditions.

Yes, that is what I thought.

One more thing worth trying is to run your server with:

XPRA_XSHM=0 xpra start ..

This will make the server's pixel capture code much slower, it may have an effect on your visual corruption, one way or the other.

I tested. Screen corruption is also there.

@deeptho
Author

deeptho commented Jul 17, 2024

I made some further experiments, which involve ONLY selecting a different version of libmpv (everything else
is the same, including libplacebo and neumodvb; not even a recompilation).

I have now found two versions that differ in exactly one line of code. The older one works without
artefacts, but xpra often fails to detect the video when it is played the first time in a window (afterwards it always works)
mpv-player/mpv@c172a65

All this commit does is change the default downscaling filter (dscale) to another one:

vo_gpu: default to dscale=hermite
This new filter is slightly sharper, and significantly faster, than
mitchell. It also tends to preserve detail better. All in all, there is
no reason not to use it by default, especially from a performance PoV.
(In vo_gpu_next, hermite is implemented efficiently using hardware
accelerated bilinear interpolation)

    - {{"mitchell", .params={NAN, NAN}}, {.params = {NAN, NAN}},
    + {{"hermite", .params={NAN, NAN}}, {.params = {NAN, NAN}},
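To illustrate why the commit message calls hermite "significantly faster", here is a Python sketch of the textbook Mitchell-Netravali BC-spline kernel family. It assumes that mpv's mitchell and hermite filters correspond to the usual parameters B = C = 1/3 and B = C = 0 respectively; it is not mpv's or libplacebo's code. With B = C = 0 the kernel is zero beyond a distance of 1, while mitchell extends to 2, so hermite needs roughly half as many source samples per axis.

```python
def bc_spline(x, b, c):
    """Mitchell-Netravali cubic (BC-spline) kernel evaluated at distance x."""
    x = abs(x)
    if x < 1:
        return ((12 - 9 * b - 6 * c) * x**3
                + (-18 + 12 * b + 6 * c) * x**2
                + (6 - 2 * b)) / 6
    if x < 2:
        return ((-b - 6 * c) * x**3
                + (6 * b + 30 * c) * x**2
                + (-12 * b - 48 * c) * x
                + (8 * b + 24 * c)) / 6
    return 0.0


def mitchell(x):
    return bc_spline(x, 1 / 3, 1 / 3)


def hermite(x):
    # With B = C = 0 this reduces to 2|x|^3 - 3|x|^2 + 1 for |x| < 1, else 0.
    return bc_spline(x, 0.0, 0.0)


def support_radius(kernel, limit=2.0, step=1 / 64):
    """Largest sampled distance below `limit` at which the kernel is non-zero."""
    radius = 0.0
    x = 0.0
    while x < limit:
        if abs(kernel(x)) > 1e-12:
            radius = x
        x += step
    return radius


print(support_radius(hermite), support_radius(mitchell))
```

A smaller support means fewer texture reads per output pixel, which fits the commit's performance argument; how that timing change interacts with xpra's capture is exactly what this thread is trying to find out.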

I have tried reverting this patch on master, but then screen corruption still occurs.

In general I have found that the number of occurrences of screen corruption drops to quite low levels
for some versions of libmpv, which makes bisecting difficult. Sometimes it took many attempts to get corruption.
In other versions it was easy to get the problem.

Of course it is possible that the one liner triggers some kind of memory corruption, but that seems unlikely.
More likely, everything functions correctly (see everything in this thread), but there is some inter-process synchronisation issue (neumodvb versus Xdummy, or Xdummy versus xpra) that does or does not reveal itself depending on how long some computations take. But this is not entirely random: a corrupt stream remains corrupt forever
and the location of the corrupted areas does not change if the window is not resized.

Also, I have noticed with some versions of mpv (especially the newer ones) screen corruption in textual areas,
which are not even touched by libmpv.

That too points to "each program separately is working as it should but because of differences in when things are
painted, xpra reacts differently resulting in good/bad results depending on the details"

The fact that the corruption remains constant over time also points to a decision that is made somewhere and
then never changed until the video is stopped for some amount of time. If the decision is wrong, there
is no video or a corrupted video. If it is right, all is fine.

One last hypothesis is that there is something specific about framebuffer formats in XDummy versus
a real Xserver, or perhaps with the alignment of buffers.

@deeptho
Author

deeptho commented Jul 17, 2024

I also found an example of another program that shows screen corruption when run under xpra in seamless mode:
https://github.com/v0idv0id/MPVideoCube?tab=readme-ov-file

The code requires some minor adjustments to compile (remove the last nullptr in
replace mpv_terminate_destroy(mpv);
with mpv_terminate_destroy(mpv);
and remove the last nullptr in opengl_init_params).

When running this demo multiple times
sh rundemo-video-1080p-5fps.sh
the result for the bad version of libmpv (with the one line change in interpolation) varies from run to run between

  1. no video shown on the cube: rare
  2. distorted video (moire patterns): frequent
  3. correct video: frequent

If I try the older mpv version, it always seems to work without distortion.

totaam added a commit that referenced this issue Jul 18, 2024
@totaam
Collaborator

totaam commented Jul 18, 2024

The older one works without artefacts, but xpra often fails to detect the video when it is played the first time in a window

How odd, perhaps this changes how the application paints the screen and reports it to the X11 server - you can see this with the server's -d damage switch.
Since this only changes a video filter, I think it could be one of two things:

  • this triggers a race condition (as the commit states that this new filter is "significantly faster") - like you said: depending on the amount of time some computations take
  • perhaps the GPU is involved in the rendering somehow - though it's not clear to me how GPU access would work from an Xvfb context (if libva then that's definitely a candidate for bugs and weird behaviour - try to turn it off to see if that helps)

One last hypothesis is that there is something specific about framebuffer formats in XDummy versus
a real Xserver, or perhaps with the alignment of buffers.

It certainly looks that way, but we get the buffers with the XShm calls I had linked to previously, these have been unchanged for ~12 years or so and have never caused problems with any libraries.
The rowstride is fixed and rounded up.
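As a rough illustration of what a fixed, rounded-up rowstride means, here is a Python sketch. The 4 bytes per pixel and the 64-byte alignment are assumptions chosen for the example; the thread does not state the actual values xpra uses. It also shows why the stride matters: a reader that assumed rows were tightly packed (width * bytes per pixel) would drift further from the true row start on every row.

```python
def rowstride(width, bytes_per_pixel=4, align=64):
    """Bytes per row, rounded up to a multiple of `align` (assumed values)."""
    raw = width * bytes_per_pixel
    return (raw + align - 1) // align * align


def pixel_offset(x, y, stride, bytes_per_pixel=4):
    """Byte offset of pixel (x, y) in a buffer with the given rowstride."""
    return y * stride + x * bytes_per_pixel


width = 1001
stride = rowstride(width)        # padded row length
packed = width * 4               # what a naive reader might assume
drift = [pixel_offset(0, y, stride) - pixel_offset(0, y, packed) for y in range(4)]
print(stride, packed, drift)
```

A drift that grows by a constant number of bytes per row typically shows up as a diagonally sheared image rather than as scattered black pixels.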

Another idea: do you have speaker forwarding enabled?
You can try with c4cc39b (just apply it by hand if needed) and setting XPRA_IMAGE_ALWAYS_FREEZE=1 on the server.
Maybe it will help.

@deeptho
Author

deeptho commented Jul 18, 2024

How odd, perhaps this changes how the application paints the screen and reports it to the X11 server - you can see this with the server's -d damage switch. Since this only changes a video filter, I think it could be one of two things:

I have not tried that yet, but I did try -d encoding and -d regiondetect.
The main differences between corruption/no-corruption runs (always with the corruption-causing version)
were in the scores (somewhat lower) and the size of the regions.

* perhaps the GPU is involved in the rendering somehow - though it's not clear to me how GPU access would work from an Xvfb context (if `libva` then that's definitely a candidate for bugs and weird behaviour - try to turn it off to see if that helps)

It is almost certainly used by mpv-libs

It certainly looks that way, but we get the buffers with the XShm calls I had linked to previously, these have been unchanged for ~12 years or so and have never caused problems with any libraries. The rowstride is fixed and rounded up.

If the (dummy) server renders things differently now than 10 years ago (because of newer libraries) then XShm
also returns different things.

Another idea: do you have speaker forwarding enabled? You can try with c4cc39b (just apply it by hand if needed) and setting XPRA_IMAGE_ALWAYS_FREEZE=1 on the server. Maybe it will help.

Do you mean audio? Yes, using pipewire. It is worth a try.

@totaam
Collaborator

totaam commented Jul 18, 2024

It is almost certainly used by mpv-libs

Again, I'm not sure if / how that accesses the GPU from an Xvfb context.

If the (dummy) server renders things differently now than 10 years ago

That's why I had suggested using VirtualGL - just in case the software gl stack is the problem.

Do you mean audio? Yes, using pipewire. It is worth a try.

No, xpra has its own audio forwarding switch; when it is enabled, xpra will try to sync screen updates with the audio.
This makes xpra freeze (memory copy) the pixels when captured, so that it can accumulate frames before sending.
The new switch can force the freeze.
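The difference between a zero-copy capture and a "frozen" one can be sketched in Python as below. The function names are hypothetical, and this only illustrates the memory-copy idea described above, not xpra's implementation: a memoryview keeps pointing at the shared pixel memory, so later painting changes what will eventually be sent, whereas a copy taken at capture time stays fixed.

```python
def capture_view(framebuffer):
    """Zero-copy capture: a view that still aliases the shared pixel memory."""
    return memoryview(framebuffer)


def capture_frozen(framebuffer):
    """'Frozen' capture: copy the pixels out at capture time."""
    return bytes(framebuffer)


framebuffer = bytearray(8)          # stand-in for shared pixel memory (all zero)
view = capture_view(framebuffer)
frozen = capture_frozen(framebuffer)

framebuffer[0:4] = b"\xff" * 4      # the application paints again afterwards

print(view.tobytes(), frozen)
```

The copy costs an extra pass over the pixels for every captured frame, which is the trade-off a forced freeze makes in exchange for a stable snapshot.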

@deeptho
Author

deeptho commented Jul 18, 2024

vglrun does not work
[VGL] /usr/lib64/VirtualGL/libvglfaker.so: undefined symbol: glXGetProcAddres

Here is a log file obtained during corruption. The video showed a huge number of black regions:
serverbad.log

and for comparison also a log file without distortion (exact same code).
servergood.log

So far I have not compiled xpra myself, so testing the patch will have to wait.

@deeptho
Author

deeptho commented Jul 18, 2024

And here are some results with the videocube program.
First two runs with a bad result and some representative screenshots

bad
bad1
bad2
bad3

serverbad.log
serverbad2.log

Now another run shows a good result (although it is not entirely good: black rectangles appear
in the background, but the video displayed on the cube is good)
good2
good
good1
servergood.log

You may wish to try this videocube example yourself to see if you can reproduce. It does require a recent
mpv-libs (Ubuntu is usually very outdated, Fedora is not).

@totaam
Collaborator

totaam commented Jul 18, 2024

libvglfaker.so: undefined symbol: glXGetProcAddres

This could well be a broken vgl downstream package: VirtualGL/virtualgl#139

and for comparison also a log file without distortion

Nothing stands out.

So far I have not compiled xpra myself, so testing the patch will have to wait.

You don't need to compile from source, you can just patch your existing installation.

@deeptho
Author

deeptho commented Jul 18, 2024

I managed to run it like this
VGL_GLLIB=/usr/lib64/libGL.so.1 vglrun ./neumodvb.py
same with videocube

So far I cannot reproduce screen corruption with this. Also, the "channel remains black when
first playing video" problem is gone or much less frequent.

Attached are two (debug damage) log files:

  1. without vglrun
    (tail -f /run/user/760/xpra/99/server.log > /tmp/server.log&) ;VGL_GLLIB=/usr/lib64/libGL.so.1 ./rundemo-video-1080p-5fps.sh
    serverbad.log

  2. with vglrun
    (tail -f /run/user/760/xpra/99/server.log > /tmp/server.log&) ;VGL_GLLIB=/usr/lib64/libGL.so.1 vglrun ./rundemo-video-1080p-5fps.sh
    servergood.log

@totaam
Collaborator

totaam commented Jul 18, 2024

Right, so this is not an xpra bug but a problem with the software opengl renderer used with Xvfb / Xdummy.
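A quick way to tell whether an application ended up on Mesa's software rasterizer is to look at the renderer string, e.g. from `glxinfo -B` run on the same display. The sketch below only parses and classifies that string; the helper names are hypothetical, and the list of substrings is an assumption based on Mesa's usual software driver names (llvmpipe, softpipe, swrast).

```python
from typing import Optional

# Assumed substrings identifying Mesa's CPU-based GL drivers.
SOFTWARE_RENDERER_HINTS = ("llvmpipe", "softpipe", "swrast", "software rasterizer")


def renderer_from_glxinfo(output: str) -> Optional[str]:
    """Extract the 'OpenGL renderer string' value from `glxinfo -B` output."""
    prefix = "OpenGL renderer string:"
    for line in output.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            return line[len(prefix):].strip()
    return None


def is_software_renderer(renderer: str) -> bool:
    """Heuristic: does the renderer name look like a software rasterizer?"""
    name = renderer.lower()
    return any(hint in name for hint in SOFTWARE_RENDERER_HINTS)
```

For example, feed it `subprocess.run(["glxinfo", "-B"], capture_output=True, text=True).stdout` once on the xpra display and once with `vglrun glxinfo -B`, and compare the two results.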

You may want to switch to Xvfb instead of Xdummy; it may behave differently:

# Virtual display command:
# - Xvfb option (limited DPI support)
# xvfb = Xvfb -nolisten tcp -noreset \
# +extension GLX +extension Composite \
# +extension RANDR +extension RENDER \
# -auth $XAUTHORITY \
# -screen 0 8192x4096x24+32 \
# -dpi 96x96
# - Xephyr (requires a running X11 server):
# xvfb = Xephyr -nolisten tcp -noreset \
# +extension GLX +extension Composite \
# -auth $XAUTHORITY \
# -screen 8192x4096x24+32
# - Xdummy (better with version 0.4.0 or later):
#xvfb = %(xdummy_command)s
#
# Selecting virtual X server:
xvfb = %(xvfb_command)s

@deeptho
Author

deeptho commented Jul 18, 2024

I just tried Xvfb. It is running as:
Xvfb-for-Xpra-99 -nolisten tcp -noreset +extension GLX +extension Composite -auth /home/user/.Xauthority -screen 0 8192x4096x24+32 :99

It also has the screen corruption and "video not playing at start" problems
(but not with vglrun).

xxx

	linux-vdso.so.1 (0x00007efd685af000)
	libselinux.so.1 => /lib64/libselinux.so.1 (0x00007efd6833a000)
	libcrypto.so.3 => /lib64/libcrypto.so.3 (0x00007efd67e8e000)
	libunwind.so.8 => /lib64/libunwind.so.8 (0x00007efd67e74000)
	libGL.so.1 => /lib64/libGL.so.1 (0x00007efd67e00000)
	libpixman-1.so.0 => /lib64/libpixman-1.so.0 (0x00007efd67d51000)
	libXfont2.so.2 => /lib64/libXfont2.so.2 (0x00007efd67d24000)
	libXau.so.6 => /lib64/libXau.so.6 (0x00007efd67d1c000)
	libsystemd.so.0 => /lib64/libsystemd.so.0 (0x00007efd67c2e000)
	libXdmcp.so.6 => /lib64/libXdmcp.so.6 (0x00007efd67c26000)
	libaudit.so.1 => /lib64/libaudit.so.1 (0x00007efd67bf4000)
	libm.so.6 => /lib64/libm.so.6 (0x00007efd67b10000)
	libc.so.6 => /lib64/libc.so.6 (0x00007efd6791f000)
	libpcre2-8.so.0 => /lib64/libpcre2-8.so.0 (0x00007efd6787d000)
	/lib64/ld-linux-x86-64.so.2 (0x00007efd685b1000)
	libz.so.1 => /lib64/libz.so.1 (0x00007efd6785c000)
	libGLX.so.0 => /lib64/libGLX.so.0 (0x00007efd6782b000)
	libX11.so.6 => /lib64/libX11.so.6 (0x00007efd676e8000)
	libXext.so.6 => /lib64/libXext.so.6 (0x00007efd676d4000)
	libGLdispatch.so.0 => /lib64/libGLdispatch.so.0 (0x00007efd6765b000)
	libfontenc.so.1 => /lib64/libfontenc.so.1 (0x00007efd67650000)
	libfreetype.so.6 => /lib64/libfreetype.so.6 (0x00007efd67587000)
	libcap.so.2 => /lib64/libcap.so.2 (0x00007efd6757a000)
	liblz4.so.1 => /lib64/liblz4.so.1 (0x00007efd67559000)
	liblzma.so.5 => /lib64/liblzma.so.5 (0x00007efd67526000)
	libzstd.so.1 => /lib64/libzstd.so.1 (0x00007efd67465000)
	libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007efd67437000)
	libcap-ng.so.0 => /lib64/libcap-ng.so.0 (0x00007efd6742e000)
	libxcb.so.1 => /lib64/libxcb.so.1 (0x00007efd67403000)
	libbz2.so.1 => /lib64/libbz2.so.1 (0x00007efd673ef000)
	libpng16.so.16 => /lib64/libpng16.so.16 (0x00007efd673b3000)
	libharfbuzz.so.0 => /lib64/libharfbuzz.so.0 (0x00007efd6729d000)
	libbrotlidec.so.1 => /lib64/libbrotlidec.so.1 (0x00007efd6728f000)
	libglib-2.0.so.0 => /lib64/libglib-2.0.so.0 (0x00007efd67141000)
	libgraphite2.so.3 => /lib64/libgraphite2.so.3 (0x00007efd67121000)
	libbrotlicommon.so.1 => /lib64/libbrotlicommon.so.1 (0x00007efd670fc000)
ldd /usr/bin/Xdummy 
	not a dynamic executable

My guess is that xpra makes some wrong decision at an early stage of
the video starting and sticks with it. The common aspect between the "black video
at start" and the corruption is that parts of the screen are not updated or incorrectly
updated with black. This would explain why I also see temporary but weird effects
in the text region, where no video is displayed.

Of course, the wrong decision may be triggered by something external.

@totaam
Collaborator

totaam commented Jul 18, 2024

Those are some really weird visual artifacts!

(but not with vglrun)

Again, the problem is very likely caused by the OpenGL software rendering.
Can you modify the Xdummy command to remove +extension GLX,
then see how mpv renders the video?
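For repeated experiments, the virtual display command line can also be edited programmatically. A minimal sketch, assuming the command is available as a single string like the Xvfb command line quoted earlier in this thread; `without_extension` is a hypothetical helper, and the edited string could then be supplied through xpra's `xvfb` setting.

```python
import shlex


def without_extension(command: str, extension: str) -> str:
    """Return `command` with every '+extension <extension>' pair removed."""
    args = shlex.split(command)
    kept = []
    i = 0
    while i < len(args):
        if args[i] == "+extension" and i + 1 < len(args) and args[i + 1] == extension:
            i += 2  # skip the flag and its argument
            continue
        kept.append(args[i])
        i += 1
    return shlex.join(kept)  # re-quotes any argument that needs it


cmd = "Xvfb -nolisten tcp -noreset +extension GLX +extension Composite :99"
print(without_extension(cmd, "GLX"))
```

This prints `Xvfb -nolisten tcp -noreset +extension Composite :99`; other `+extension` entries are left untouched.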

My guess is that xpra makes some wrong decision at an early stage of the video starting and sticks with it.

Every frame is grabbed in exactly the same way, and each frame is independent of the ones before it.

BTW, I've just released xpra 6.1.0; it includes some of the fixes discussed here.

@deeptho
Author

deeptho commented Jul 18, 2024

Same corruption:

1

@deeptho
Author

deeptho commented Jul 18, 2024

I have made some progress. I added some extra code to neumodvb.py which saves
the current OpenGL buffer every 100 frames, to see what is written there.
It turns out that when corruption happens, it is already present there.
That means the mpv code is to blame.

First, I should mention that I tested all of this with vglrun, and I get good pictures that way.
Without vglrun, sometimes there is no corruption and I also get good screenshots.
But when corruption is shown on the screen, I get corrupted images, which, however, do
not always show the same corruption as on screen. I think this must be related to the
simplicity of the debug code saving the images AND to the current buffer being in some
unexpected format (because of mpv).
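The debug code added to neumodvb.py is not shown in the thread, so the following is only a hypothetical Python sketch of the same idea. It assumes an injected `read_rgb(width, height)` function returning tightly packed 8-bit RGB bytes, for instance `bytes(glReadPixels(0, 0, w, h, GL_RGB, GL_UNSIGNED_BYTE))` from PyOpenGL after `glPixelStorei(GL_PACK_ALIGNMENT, 1)`. Two format details that a simple dump can get wrong are made explicit: OpenGL returns rows bottom-up, and a buffer whose size does not match width × height × 3 (for example because of the default row padding to 4 bytes) is rejected.

```python
from typing import Callable, Optional


def flip_rows(pixels: bytes, width: int, height: int, bytes_per_pixel: int = 3) -> bytes:
    """Reorder bottom-up rows (OpenGL convention) into top-down rows."""
    stride = width * bytes_per_pixel
    if len(pixels) != stride * height:
        raise ValueError("buffer size does not match width*height*bytes_per_pixel")
    rows = [pixels[y * stride:(y + 1) * stride] for y in range(height)]
    return b"".join(reversed(rows))


def to_ppm(rgb_top_down: bytes, width: int, height: int) -> bytes:
    """Wrap tightly packed 8-bit RGB data in a binary PPM (P6) header."""
    header = f"P6\n{width} {height}\n255\n".encode("ascii")
    return header + rgb_top_down


class FrameDumper:
    """Hypothetical helper: return a PPM image every `every`-th frame.

    `read_rgb(width, height)` is assumed to return packed RGB bytes, bottom-up.
    """

    def __init__(self, read_rgb: Callable[[int, int], bytes], every: int = 100):
        self.read_rgb = read_rgb
        self.every = every
        self.frame = 0

    def on_frame(self, width: int, height: int) -> Optional[bytes]:
        """Call once per rendered frame; returns PPM bytes when a dump is due."""
        self.frame += 1
        if self.frame % self.every:
            return None
        raw = self.read_rgb(width, height)
        return to_ppm(flip_rows(raw, width, height), width, height)
```

The returned bytes can be written to a `.ppm` file, which most image viewers open directly.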

Here is an example:
bad

And here is another example, after instructing neumodvb to paint an overlay. The overlay
is correct, but the video is not:
bad_with_osd

Now I have to figure out how to get some properties of the current OpenGL buffer to
better understand what is going on.

So my latest guess is that it must be some bug in mpv, triggered by something specific
in the virtual X servers, e.g., a specific visual.

@totaam
Collaborator

totaam commented Jul 19, 2024

Quite pleased to see that the bug is not in xpra and that I had the correct suspicions 5 days ago: #4300 (comment)

I am closing this issue as "invalid" because there is no bug in xpra, but do keep us posted on the OpenGL issues with the software renderer: many users rely on it, and it would be great to know what is going wrong with it!

@totaam totaam closed this as completed Jul 19, 2024
@deeptho
Author

deeptho commented Jul 19, 2024

VirtualGL was a good recommendation after all. Hopefully there will be some progress on the
mpv side.

@totaam
Collaborator

totaam commented Jul 27, 2024

The ticket tracking scroll corruption is #4201
