Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xor error with differing pixel format #861

Closed
totaam opened this issue May 15, 2015 · 21 comments
Closed

xor error with differing pixel format #861

totaam opened this issue May 15, 2015 · 21 comments

Comments

@totaam
Copy link
Collaborator

totaam commented May 15, 2015

Issue migrated from trac ticket # 861

component: encodings | priority: blocker | resolution: fixed

2015-05-15 04:46:04: antoine created the issue


Testing 0.15 through the proxy server, got this error:

error processing draw packet
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/xpra/client/ui_client_base.py", line 1975, in _draw_thread_loop
    self._do_draw(packet)
  File "/usr/lib64/python2.7/site-packages/xpra/client/ui_client_base.py", line 2021, in _do_draw
    window.draw_region(x, y, width, height, coding, data, rowstride, packet_sequence, options, [record_decode_time])
  File "/usr/lib64/python2.7/site-packages/xpra/client/client_window_base.py", line 423, in draw_region
    self._backing.draw_region(x, y, width, height, coding, img_data, rowstride, options, callbacks)
  File "/usr/lib64/python2.7/site-packages/xpra/client/window_backing_base.py", line 473, in draw_region
    self.paint_rgb24(img_data, x, y, width, height, rowstride, options, callbacks)
  File "/usr/lib64/python2.7/site-packages/xpra/client/window_backing_base.py", line 264, in paint_rgb24
    rgb24_data = self.process_delta(raw_data, width, height, rowstride, options)
  File "/usr/lib64/python2.7/site-packages/xpra/client/window_backing_base.py", line 186, in process_delta
    rgb_data = xor_str(img_data, ldata)
  File "xpra/codecs/xor/cyxor.pyx", line 19, in xpra.codecs.xor.cyxor.xor_str (xpra/codecs/xor/cyxor.c:736)
    assert len(buf)==len(xor), "cyxor cannot xor strings of different lengths (%s:%s vs %s:%s)" % (type(buf), len(buf), type(xor), len(xor))
AssertionError: cyxor cannot xor strings of different lengths (<type 'str'>:38192 vs <type 'str'>:28644)

Which is what you would expect if you were xoring RGBX with RGB (25% bigger).
I don't see how that is possible because we verify the pixel format AND the size of the buffer before we xor on the server side...

@totaam
Copy link
Collaborator Author

totaam commented May 18, 2015

2015-05-18 06:41:49: antoine changed status from new to assigned

@totaam
Copy link
Collaborator Author

totaam commented May 18, 2015

2015-05-18 06:41:49: antoine commented


Although I cannot reproduce this right now, I suspect that I hit this issue whilst testing through the proxy server, which does things with "draw" packets.
If I cannot find the bug, we can always disable delta buckets (set it to 0) when going through the proxy.

@totaam
Copy link
Collaborator Author

totaam commented May 18, 2015

2015-05-18 12:20:23: antoine changed priority from critical to minor

@totaam
Copy link
Collaborator Author

totaam commented May 18, 2015

2015-05-18 12:20:23: antoine commented


Managed to reproduce, at least something similar...
with:

  • server started with:
xpra start :10 --bind-tcp=0.0.0.0:10000 --start-child=xterm
  • proxy server started with:
XPRA_PROXY_PASSTHROUGH=1 xpra proxy :20 --bind-tcp=0.0.0.0:10001 --auth=allow -d encoding
  • client started with:
xpra attach tcp:192.168.42.100:10001 --no-mmap --password-file=./password.txt

Notes:

  • r9434 makes it easier to debug encoding related issues with the proxy server (adds -d encoding support)
  • r9399 adds XPRA_PROXY_PASSTHROUGH to make it easier to trigger and debug the passthrough code. Which normally only fires when the video context creation fails and PIL is missing... (I believe I was hitting the first condition because of r9413, not sure about the second!?)

The fix for this bug is in r9435, which also makes the code more generic: we no longer assume the pixels format is BGRX or RGBX format (with X aka alpha at the end).
Backported to v0.14.x and v0.15.x in 9438.

I am keeping this ticket open because I am not convinced that the "25% bigger" in the ticket description is necessarily the same bug.
In any case, we should also not assume that "rgb32" is available in all clients.
But since I can no longer trigger any proxy encoding bugs, I am also lowering the priority.

@totaam
Copy link
Collaborator Author

totaam commented May 19, 2015

2015-05-19 07:18:07: antoine changed priority from minor to blocker

@totaam
Copy link
Collaborator Author

totaam commented May 19, 2015

2015-05-19 07:18:07: antoine commented


I have hit this again, without the proxy... raising.

@totaam
Copy link
Collaborator Author

totaam commented May 19, 2015

2015-05-19 09:21:52: antoine uploaded file strict-delta-coding.patch (2.1 KiB)

ensures we never mix encodings with delta - which could make us mix pixel formats on the receiving end

@totaam
Copy link
Collaborator Author

totaam commented May 19, 2015

2015-05-19 09:43:04: antoine changed status from assigned to closed

@totaam
Copy link
Collaborator Author

totaam commented May 19, 2015

2015-05-19 09:43:04: antoine set resolution to fixed

@totaam
Copy link
Collaborator Author

totaam commented May 19, 2015

2015-05-19 09:43:04: antoine commented


Maybe r9238 is wrong because maybe we do care what encoding is used. Different encodings use different decoders, and some might give us RGB, others RGBX... for the same input pixel format. Which could explain the differing sizes.
So r9448 makes the checks more strict. (I couldn't get it to trigger during testing, but we should not be making assumptions like these)

But that's not all, setting XPRA_MAX_DELTA_HITS low enough does trigger the problem.
The fix is obvious and embarrassing: see r9449.

Backports in 9450.

@totaam
Copy link
Collaborator Author

totaam commented May 20, 2015

2015-05-20 10:32:46: antoine changed status from closed to reopened

@totaam
Copy link
Collaborator Author

totaam commented May 20, 2015

2015-05-20 10:32:46: antoine removed resolution (was fixed)

@totaam
Copy link
Collaborator Author

totaam commented May 20, 2015

2015-05-20 10:32:46: antoine commented


Seen the bug again.

Stacktraces always seem to come in pairs. Could be related to the window edges?

@totaam
Copy link
Collaborator Author

totaam commented May 22, 2015

2015-05-22 16:15:43: antoine commented


More fixes for multi delta and the expiry code, see #866.

Those should be different from the "25% bigger" bug reported here though...
If it does occur again, we can also enable XPRA_INTEGRITY_HASH=1 as per #866#comment:7 (maybe the debug output will help)

@totaam
Copy link
Collaborator Author

totaam commented Jun 17, 2015

2015-06-17 17:26:06: antoine commented


Maybe only occurring with opengl=on and rgb24 as per #891#comment:10. If that's true, I have no idea why.

@totaam
Copy link
Collaborator Author

totaam commented Jul 17, 2015

2015-07-17 14:17:40: totaam commented


Found the problem, we sometimes call rgb_reformat to save space (from say BGRX to RGB - which is 25% smaller!), and the client may not need to convert it back: with opengl we can just upload RGB to the gpu.
But for xoring, we need the same format as the original...
This is not new to 0.15, but it just makes it easier to hit.

Fix on its way, but if it's too big the backport may more brutal: maybe just disable delta altogether!

@totaam
Copy link
Collaborator Author

totaam commented Jul 18, 2015

2015-07-18 08:16:18: totaam commented


Now I know:

  • why this only affects OR windows like menus: we send the pixels for those windows before the client maps the window, and therefore before we know the full list of pixel modes which are going to be available - and so we use conservative values.. like RGB - by the time the second window update comes along, we have the full list, and we may send the pixels in a different format
  • this only affects opengl=on, because the non opengl backends don't support BGRX directly

@totaam
Copy link
Collaborator Author

totaam commented Jul 18, 2015

2015-07-18 14:47:07: totaam commented


Fixed in r9970: we just don't bother with delta if the pixel format is going to require reformatting as it's too difficult to guarantee that the client will be able to reconstruct the same buffer format at the other end in that case. (we often drop the unused 'X' part for example).
Backports to v0.14.x and v0.15.x in 9971.

I still want to investigate why those errors didn't trigger a paint refresh. All paint errors should at worst cause a log message and slower repaint, not persistent visual corruption.

@totaam
Copy link
Collaborator Author

totaam commented Jul 18, 2015

2015-07-18 17:53:24: totaam changed status from reopened to closed

@totaam
Copy link
Collaborator Author

totaam commented Jul 18, 2015

2015-07-18 17:53:24: totaam set resolution to fixed

@totaam
Copy link
Collaborator Author

totaam commented Jul 18, 2015

2015-07-18 17:53:24: totaam commented


r9978 ensures we refresh the whole window if we ever get client-side decoding errors (previously this only worked for video modes) - this may be worth applying to 0.15 at some point.

I think this ticket is done, feel free to re-open if you ever hit it again.
Note: as of r9976, trunk is likely to fail with a more precise error message which will look like this: delta region uses XXXX format, was expecting XXXX (where the XXXXs are usually BGRX and RGB)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant