Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

decompression error in rgb delta #866

Closed
totaam opened this issue May 21, 2015 · 19 comments
Closed

decompression error in rgb delta #866

totaam opened this issue May 21, 2015 · 19 comments

Comments

@totaam
Copy link
Collaborator

totaam commented May 21, 2015

Issue migrated from trac ticket # 866

component: core | priority: blocker | resolution: fixed

2015-05-21 06:26:00: antoine created the issue


Triggered using the command lines from [/ticket/863#comment:8]:

  • server:
XPRA_MAX_DELTA_HITS=2 xpra start :10 --start-child=xterm --encodings=rgb
  • client:
xpra attach --no-mmap -z 0

Then just press enter in the xterm until the screen floods with:

error processing draw packet
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/xpra/client/ui_client_base.py", line 2002, in _draw_thread_loop
    self._do_draw(packet)
  File "/usr/lib64/python2.7/site-packages/xpra/client/ui_client_base.py", line 2048, in _do_draw
    window.draw_region(x, y, width, height, coding, data, rowstride, packet_sequence, options, [record_decode_time])
  File "/usr/lib64/python2.7/site-packages/xpra/client/client_window_base.py", line 423, in draw_region
    self._backing.draw_region(x, y, width, height, coding, img_data, rowstride, options, callbacks)
  File "/usr/lib64/python2.7/site-packages/xpra/client/window_backing_base.py", line 473, in draw_region
    self.paint_rgb24(img_data, x, y, width, height, rowstride, options, callbacks)
  File "/usr/lib64/python2.7/site-packages/xpra/client/window_backing_base.py", line 264, in paint_rgb24
    rgb24_data = self.process_delta(raw_data, width, height, rowstride, options)
  File "/usr/lib64/python2.7/site-packages/xpra/client/window_backing_base.py", line 170, in process_delta
    img_data = compression.decompress_by_name(raw_data, algo=comp[0])
  File "/usr/lib64/python2.7/site-packages/xpra/net/compression.py", line 248, in decompress_by_name
    return decompress(data, flag)
  File "/usr/lib64/python2.7/site-packages/xpra/net/compression.py", line 227, in decompress
    return LZ4_uncompress(data)
ValueError: corrupt input at byte 532

I am also seeing some visual corruption, very similar to [/ticket/465#comment:17] (which is also about memoryview):
[[Image(xterm-xor-rgb-lz4-corruption.png)]]
Maybe we need to freeze() or copy the pixel buffer?

The errors only occur when the scrolling triggers a full window refresh with rgb + lz4, smaller screen updates with rgb + lz4 go through without problem:

Maybe we should honour the -z 0 flag here and not compress the rgb pixels at all? (the z flag was for packet compression originally)

Also, the error is sent back to the server which then does a refresh, but this all happens to quickly and we end up in a tight loop. Probably worth adding a delay when refreshing after an error.

@totaam
Copy link
Collaborator Author

totaam commented May 21, 2015

2015-05-21 06:47:49: antoine changed status from new to assigned

@totaam
Copy link
Collaborator Author

totaam commented May 21, 2015

2015-05-21 06:47:49: antoine edited the issue description

@totaam
Copy link
Collaborator Author

totaam commented May 21, 2015

2015-05-21 06:51:43: antoine edited the issue description

@totaam
Copy link
Collaborator Author

totaam commented May 21, 2015

2015-05-21 06:52:09: antoine uploaded file xterm-xor-rgb-lz4-corruption.png (5.0 KiB)

shows the visual corruption
xterm-xor-rgb-lz4-corruption.png

@totaam
Copy link
Collaborator Author

totaam commented May 21, 2015

2015-05-21 06:52:51: antoine edited the issue description

@totaam
Copy link
Collaborator Author

totaam commented May 21, 2015

2015-05-21 14:52:29: antoine uploaded file integrity-hash-v2.patch (4.9 KiB)

adds the ability to verify that the data we get before and after decompression matches what the server meant to send

@totaam
Copy link
Collaborator Author

totaam commented May 21, 2015

2015-05-21 17:58:37: antoine commented


The visual corruption is fixed in r9469, backport to 0.15 in 9470.
(the previous fix r9449 was incomplete)

The "corrupt input" remains..

@totaam
Copy link
Collaborator Author

totaam commented May 21, 2015

2015-05-21 18:08:42: antoine commented


WTH?

  • using compressors=zlib, no problem
  • using compressors=lzo, no problem.
  • using compressors=lz4 gives the ValueError: corrupt input at byte 532

Very very odd.

@totaam
Copy link
Collaborator Author

totaam commented May 21, 2015

2015-05-21 18:17:59: antoine uploaded file integrity-hash-v3.patch (3.4 KiB)

also verifies that we can decompress the lz4 data without errors

@totaam
Copy link
Collaborator Author

totaam commented May 22, 2015

2015-05-22 06:31:40: antoine commented


Also interesting: in [/attachment/ticket/869/ticket869_osx-client-logs.txt] (duplicate ticket), we can see that the corrupt byte is at a different place (also always the same):

ValueError: corrupt input at byte 14

And apparently the server segfaults.
All points towards a memory corruption issue. #465 and zero copy memoryview me thinks.

@totaam
Copy link
Collaborator Author

totaam commented May 22, 2015

2015-05-22 12:33:14: antoine changed status from assigned to new

@totaam
Copy link
Collaborator Author

totaam commented May 22, 2015

2015-05-22 12:33:14: antoine changed owner from antoine to afarr

@totaam
Copy link
Collaborator Author

totaam commented May 22, 2015

2015-05-22 12:33:14: antoine commented


Fixed in r9480, see commit message.

I wished I had time to investigate why this only affected lz4. Which thread does the decompression should not matter! (network thread or decode thread)

All this time was not all wasted though, we now have:

  • r9476: optional data integrity checks via XPRA_INTEGRITY_HASH=1
  • r9477: logging improvements
  • r9478 + r9479: less unnecessary copying of pixel data with lzo and lz4 (which I dare not backport at this point)

@afarr: please confirm and close, using the instructions in the ticket description, and also the same steps as #869 (that max reported - he can help you with testing).
Please also confirm that none of this affected the 0.15 branch... which is the one branch we should be focusing on.

@totaam
Copy link
Collaborator Author

totaam commented May 22, 2015

2015-05-22 16:28:16: antoine commented


Changed my mind: found too many issues related to buffers vs memoryview (#465), so 9490 backports a number of important fixes to v0.15.x (including r9478 + r9479 and more).

It is worth double-checking:

  • the proxy server, with nvenc
  • nvenc standalone
    etc..

@totaam
Copy link
Collaborator Author

totaam commented May 28, 2015

2015-05-28 02:08:07: afarr commented


Well, testing with 0.15.0 9470 windows client against 0.15.0 9470 fedora 20 server, with a firefox video playing on youtube, I was seeing a similar error that didn't seem to be lz4 specific:

2015-05-27 16:43:49,871 error processing draw packet
Traceback (most recent call last):
  File "xpra\client\ui_client_base.pyc", line 1975, in _draw_thread_loop
  File "xpra\client\ui_client_base.pyc", line 2021, in _do_draw
  File "xpra\client\client_window_base.pyc", line 423, in draw_region
  File "xpra\client\window_backing_base.pyc", line 473, in draw_region
  File "xpra\client\window_backing_base.pyc", line 264, in paint_rgb24
  File "xpra\client\window_backing_base.pyc", line 171, in process_delta
  File "xpra\net\compression.pyc", line 242, in decompress_by_name
  File "xpra\net\compression.pyc", line 221, in decompress
ValueError: invalid size in header: 0xff000000

Testing the same site & video with win32 client 0.15.0 r9533 against fedora 20 0.15.0 r9533 server, however... there is no sign of this error.

Oddly though, and I'm not sure if this is related to these backports, when I tested on osx (also 0.15.0 r9533) - XPRA_OPENGL_PAINT_BOX boxes were displaying without having used any flags to enable them. (Will post this and the relatively unhelpful server-side logs in #760.)

@totaam
Copy link
Collaborator Author

totaam commented May 28, 2015

2015-05-28 04:54:20: antoine commented


ValueError: invalid size in header: 0xff000000
I am pretty sure that this error was fixed 6 days ago in r9480 (9490 for older branches).

Unless you can reproduce this with an up to date build, I think we can close this.

[[BR]]

XPRA_OPENGL_PAINT_BOX boxes were displaying without having used any flags to enable them
[[BR]]
Let's follow that up in #760#comment:23

@totaam
Copy link
Collaborator Author

totaam commented Jun 1, 2015

2015-06-01 23:36:47: afarr changed status from new to closed

@totaam
Copy link
Collaborator Author

totaam commented Jun 1, 2015

2015-06-01 23:36:47: afarr set resolution to fixed

@totaam
Copy link
Collaborator Author

totaam commented Jun 1, 2015

2015-06-01 23:36:47: afarr commented


Testing with the win32 0.15.0 r9533 against a fedora 20 0.15.0 r9533 server... no sign of that draw_thread_loop error - niehter from playing video nor from a lot of returns in an xterm with --encodings=rgb.

Closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant