Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

setup_pipeline fail produced a traceback (0.14.19) #806

Closed
totaam opened this issue Feb 12, 2015 · 18 comments
Closed

setup_pipeline fail produced a traceback (0.14.19) #806

totaam opened this issue Feb 12, 2015 · 18 comments

Comments

@totaam
Copy link
Collaborator

totaam commented Feb 12, 2015

Issue migrated from trac ticket # 806

component: server | priority: critical | resolution: invalid

2015-02-12 00:58:20: afarr created the issue


Using an osx 0.14.19 client (no idea what revision) against a fedora 21 0.14.19 (again, no idea what revision)... our builds.

I was streaming video from some site or other (cnn? huffingtonpost?) in one tab of a browser with other stuff in others, only half paying attention... when the video stream died.

I found the following traceback on the server side:

setup_pipeline failed for (44, (1, 1), (1, 1), 1118, 952, codec_spec(swscale), ‘YUV444P’, (1, 1), 1118, 952, codec_spec(x264))
Traceback (most recent call last):
File “/usr/lib64/python2.7/site-packages/xpra/server/window_video_source.py”, line 1100 in setup_pipeline
self._video_encoder.init_context(enc_width, enc_height, enc_in_format, dst_formats, encoder_spec.encoding, quality, speed, encoder_scaling, self.encoding_options)
File “xpra/codecs/enc_x264/encoder.pyx”, line 304, in xpra.codecs.enc_x264.encoder.Encoder.init_context (xpra/codecs/enc_x264/encoder.c:2712)
File “xpra/codecs/enc_x264/encoder.pyx”, line 327, in xpra.codecs.enc_x264.encoder.Encoder.init_encoder (xpra/codecs/enc_x264/encoder.c:2942)
AssertionError: context initialization failed for format YUV444P

Not sure if this is something obvious or something that I need to investigate further... hints and tactics are always welcome.

@totaam
Copy link
Collaborator Author

totaam commented Feb 12, 2015

2015-02-12 01:44:12: totaam changed owner from antoine to afarr

@totaam
Copy link
Collaborator Author

totaam commented Feb 12, 2015

2015-02-12 01:44:12: totaam commented


I have never, ever seen this one.
Can you reproduce it?
Get xpra info at the time of the problem?
Was the machine low on memory or something?

@totaam
Copy link
Collaborator Author

totaam commented Feb 21, 2015

2015-02-21 00:01:04: afarr commented


Ok... took a while to reproduce, and even longer to reproduce again with some logs running on the server(-d encoding seemed like a good guess - the control channel ability to enable logging on the fly was absolutely essential, thanks for that).

Tried to get an xpra info at the time, but the server was no longer responsive by the time I realized I'd succeeded in my repro (but I did get info from beginning of trial, which I'll attach).

Was using a windows client equivalent of 0.14.20, and fedora 21 server equivalent of 0.14.20 (exact revision uncertain).

I reproduced it on pandora.com, but was only able to do so with our custom webkit browser, not firefox.

I'll attach a relatively sane excerpt from the -d encoding server logs, but here's an interesting portion, note that there are webkit logging lines interspersed amidst the xpra server log bits:

Feb 20 14:58:56 2015-02-20 14:58:56,364 send_delayed for wid 3, batch delay is 11.2, elapsed time is 24.0 ms
Feb 20 14:58:56 2015-02-20 14:58:56,372 process_damage_regions: wid=3, adding h264 pixel data to queue, elapsed time: 33.2 ms, request rgb time: 3.1 ms
Feb 20 14:58:56 2015-02-20 14:58:56,373 make_data_packet: image=XShmImageWrapper(BGRX: 0, 0, 1096, 1036), damage data: (3, 0, 0, 1096, 1036, 'h264')
Feb 20 14:58:56 2015-02-20 14:58:56,374 video_encode('h264', XShmImageWrapper(BGRX: 0, 0, 1096, 1036), {})
Feb 20 14:58:56 2015-02-20 14:58:56,375 setup_pipeline([(44, (1, 1), (1, 1), 1096, 1036, codec_spec(swscale), 'YUV444P', (1, 1), 1096, 1036, codec_spec(x264))       , (43, (1, 1), (1, 1), 1096, 1036, codec_spec(swscale), 'YUV422P', (1, 1), 1096, 1036, codec_spec(x264)), (42, (1, 1), None, 0, 0, None, 'BGRX', (1, 1), 1096, 1036, codec_spec(x264)), (42,        (1, 1), (1, 1), 1096, 1036, codec_spec(swscale), 'YUV420P', (1, 1), 1096, 1036, codec_spec(x264)), (38, (1, 1), (1, 1), 1096, 1036, codec_spec(swscale), 'RGB', (1, 1), 1096, 1036, codec_s       pec(x264)), (26, (1, 1), (1, 1), 1096, 1036, codec_spec(cython), 'YUV420P', (1, 1), 1096, 1036, codec_spec(x264))], 1096, 1036, 'BGRX')
Feb 20 14:58:56 2015-02-20 14:58:56,378 setup_pipeline: trying (44, (1, 1), (1, 1), 1096, 1036, codec_spec(swscale), 'YUV444P', (1, 1), 1096, 1036, codec_spec       (x264))
Feb 20 14:58:56 2015-02-20 14:58:56,380 setup_pipeline: csc=swscale(BGRX 1096x1036 - YUV444P 1096x1036), info={'src_height': 1036, 'src_width': 1096, 'dst_hei       ght': 1036, 'dst_width': 1096, 'version': (3, 0, 100), 'flags': ['SWS_FAST_BILINEAR'], 'src_format': 'BGRX', 'formats': ['RGB', 'BGR', 'XRGB', 'BGRX', 'YUV420P', 'YUV422P', 'YUV444P', 'GBR       P'], 'frames': 0L, 'dst_format': 'YUV444P'}, setup took 1.24ms
Feb 20 14:58:56 2015-02-20 14:58:56,389 setup_pipeline: video encoder=x264_encoder(YUV444P - 1096x1036), info: {'profile': 'high444', 'speed': 64, 'height': 1       036, 'width': 1096, 'version': 142, 'src_format': 'YUV444P', 'formats': ['YUV422P', 'RGB', 'BGRX', 'BGR', 'YUV420P', 'BGRA', 'YUV444P'], 'frames': 0L, 'quality': 94, 'preset': 'faster'}, setup took 8.11ms
Feb 20 14:58:56 2015-02-20 14:58:56,392 video_encode('h264', XShmImageWrapper(BGRX: 0, 0, 1096, 1036), {}) image size: 1096x1036, encoder/csc size: 1096x1036
Feb 20 14:58:56 2015-02-20 14:58:56,391 send_delayed_regions: bytes_cost=185514, bytes_threshold=799999, pixel_count=184490
Feb 20 14:58:56 2015-02-20 14:58:56,394 process_damage_regions: wid=3, adding webp pixel data to queue, elapsed time: 55.7 ms, request rgb time: 0.7 ms
Feb 20 14:58:56 2015-02-20 14:58:56,430 csc_image(XShmImageWrapper(BGRX: 0, 0, 1096, 1036), 1096, 1036) converted to <class 'xpra.codecs.csc_swscale.colorspace_converter.CSCImageWrapper'>(YUV444P:(0, 0, 1096, 1036, 24):3_PLANES) in 36.3ms (29.9 MPixels/s)
Feb 20 14:58:56 2015-02-20 14:58:56,501 video_encode encoder: h264 1096x1036 result is 16632 bytes (15.8 MPixels/s), client options={'quality': 94, 'frame': 0L, 'pts': 0, 'csc': 'YUV444P', 'speed': 64}
Feb 20 14:58:56 2015-02-20 14:58:56,503 make_data_packet: image=XShmImageWrapper(BGRX: 1096, 64, 190, 971), damage data: (3, 1096, 64, 190, 971, 'webp')
Feb 20 14:58:56 2015-02-20 14:58:56,583 packet decoding sequence 5843 for window 3 1096x1036 took 32.0ms
Feb 20 14:58:56 2015-02-20 14:58:56,599 packet decoding sequence 5844 for window 3 190x971 took 15.0ms
Feb 20 14:58:56 2015-02-20 14:58:56,660 send_delayed_regions: bytes_cost=1136480, bytes_threshold=799999, pixel_count=1135456
Feb 20 14:58:56 2015-02-20 14:58:56,665 send_delayed_regions: using full window update 1287x1036 with png
Feb 20 14:58:56 2015-02-20 14:58:56,668 process_damage_regions: wid=3, adding png pixel data to queue, elapsed time: 8.3 ms, request rgb time: 2.8 ms
Feb 20 14:58:56 2015-02-20 14:58:56,669 make_data_packet: image=XShmImageWrapper(BGRX: 0, 0, 1287, 1036), damage data: (3, 0, 0, 1287, 1036, 'png')
Feb 20 14:58:56 2015-02-20 14:58:56,781 send_delayed_regions: bytes_cost=185514, bytes_threshold=799999, pixel_count=184490
Feb 20 14:58:56 2015-02-20 14:58:56,787 process_damage_regions: wid=3, adding webp pixel data to queue, elapsed time: 5.8 ms, request rgb time: 3.7 ms
Feb 20 14:58:56 2015-02-20 14:58:56,817 update_quality() info={'backlog_factor': 100, 'latency_factor': 435, 'min_quality': 30, 'batch_factor': 87}, quality=91
Feb 20 14:58:56 2015-02-20 14:58:56,821 update_speed() info={'damage_latency.abs_factor': 7, 'damage_latency.avg': 52, 'decoding_latency.target': 8000000, 'damage_latency.target': 41, 'min_speed': 0, 'frame_delay': 17, 'decoding_latency.factor': 25, 'damage_latency.ref': 41, 'damage_latency.rel_factor': 83, 'low_limit': 1333332}, speed=73
Feb 20 14:58:56 2015-02-20 14:58:56,822 update_encoding_options(False) want_alpha=False, lossless threshold: 89 / 20, small_as_rgb=85729, get_best_encoding=<bound method WindowVideoSource.get_best_encoding_video of WindowVideoSource(3 : (1287, 1036))>
Feb 20 14:58:56 2015-02-20 14:58:56,822 update_encoding_options(False) csc_encoder=swscale(BGRX 1096x1036 - YUV444P 1096x1036), video_encoder=x264_encoder(YUV       444P - 1096x1036)
Feb 20 14:58:56 2015-02-20 14:58:56,828 check_pipeline_score(False) best=(45, (1, 1), (1, 1), 1286, 1036, codec_spec(swscale), 'YUV444P', (1, 1), 1286, 1036,        codec_spec(x264))
Feb 20 14:58:56 2015-02-20 14:58:56,829 check_pipeline_score(False) found better csc encoder: (45, (1, 1), (1, 1), 1286, 1036, codec_spec(swscale), 'YUV444P',        (1, 1), 1286, 1036, codec_spec(x264))
Feb 20 14:58:56 2015-02-20 14:58:56,868 check_pipeline_score(False) found better video encoder: (45, (1, 1), (1, 1), 1286, 1036, codec_spec(swscale), 'YUV444P       ', (1, 1), 1286, 1036, codec_spec(x264))
Feb 20 14:58:56 2015-02-20 14:58:56,873 setup_pipeline([(45, (1, 1), (1, 1), 1286, 1036, codec_spec(swscale), 'YUV444P', (1, 1), 1286, 1036, codec_spec(x264)), (43, (1, 1), None, 0, 0, None, 'BGRX', (1, 1), 1286, 1036, codec_spec(x264)), (43, (1, 1), (1, 1), 1286, 1036, codec_spec(swscale), 'YUV422P', (1, 1), 1286, 1036, codec_spec(x264)), (42, (1, 1), (1, 1), 1286, 1036, codec_spec(swscale), 'YUV420P', (1, 1), 1286, 1036, codec_spec(x264)), (38, (1, 1), (1, 1), 1286, 1036, codec_spec(swscale), 'RGB', (1, 1), 1286, 1036, codec_s       pec(x264)), (26, (1, 1), (1, 1), 1286, 1036, codec_spec(cython), 'YUV420P', (1, 1), 1286, 1036, codec_spec(x264))], 1287, 1036, 'BGRX')
Feb 20 14:58:56 2015-02-20 14:58:56,874 setup_pipeline: trying (45, (1, 1), (1, 1), 1286, 1036, codec_spec(swscale), 'YUV444P', (1, 1), 1286, 1036, codec_spec       (x264))
Feb 20 14:58:56 2015-02-20 14:58:56,891 setup_pipeline: csc=swscale(BGRX 1286x1036 - YUV444P 1286x1036), info={'src_height': 1036, 'src_width': 1286, 'dst_hei       ght': 1036, 'dst_width': 1286, 'version': (3, 0, 100), 'flags': ['SWS_FAST_BILINEAR'], 'src_format': 'BGRX', 'formats': ['RGB', 'BGR', 'XRGB', 'BGRX', 'YUV420P', 'YUV422P', 'YUV444P', 'GBR       P'], 'frames': 0L, 'dst_format': 'YUV444P'}, setup took 15.34ms
Feb 20 14:58:56 2015-02-20 14:58:56,949 setup_pipeline failed for (45, (1, 1), (1, 1), 1286, 1036, codec_spec(swscale), 'YUV444P', (1, 1), 1286, 1036, codec_spec(x264))
Feb 20 14:58:56 Traceback (most recent call last):
Feb 20 14:58:57 File "/usr/lib64/python2.7/site-packages/xpra/server/window_video_source.py", line 1100, in setup_pipeline
Feb 20 14:58:57 self._video_encoder.init_context(enc_width, enc_height, enc_in_format, dst_formats, encoder_spec.encoding, quality, speed, encoder_scaling, self.encoding_options)
Feb 20 14:58:57 File "xpra/codecs/enc_x264/encoder.pyx", line 304, in xpra.codecs.enc_x264.encoder.Encoder.init_context (xpra/codecs/enc_x264/encoder.c:2717)
Feb 20 14:58:57 File "xpra/codecs/enc_x264/encoder.pyx", line 327, in xpra.codecs.enc_x264.encoder.Encoder.init_encoder (xpra/codecs/enc_x264/encoder.c:2947)
Feb 20 14:58:57 AssertionError: context initialization failed for format YUV444P
Feb 20 14:58:57 2015-02-20 14:58:56,976 sending 1287x1036 BGRX as png, mode=RGB, options={'compress_level': 3}
Feb 20 14:58:57 2015-02-20 14:58:56,993 make_data_packet: image=XShmImageWrapper(BGRX: 1096, 64, 190, 971), damage data: (3, 1096, 64, 190, 971, 'webp')
Feb 20 14:58:57 2015-02-20 14:58:57,062 packet decoding sequence 5845 for window 3 1287x1036 took 62.0ms
Feb 20 14:58:57 2015-02-20 14:58:57,067 packet decoding sequence 5846 for window 3 190x971 took 0.0ms
Feb 20 14:58:57 2015-02-20 14:58:57,221 damage(WindowModel(0xe0000a - "Adobe Flash Player Download | AirGap"), 1, 64, 1285, 971, {})
Feb 20 14:58:57 2015-02-20 14:58:57,225 damage(1, 64, 1285, 971, {}) wid=3, scheduling batching expiry for sequence 5847 in 11.0 ms
Feb 20 14:58:57 2015-02-20 14:58:57,238 send_delayed for wid 3, batch delay is 11.7, elapsed time is 16.0 ms
Feb 20 14:58:57 2015-02-20 14:58:57,252 process_damage_regions: wid=3, adding h264 pixel data to queue, elapsed time: 31.1 ms, request rgb time: 2.6 ms
Feb 20 14:58:57 2015-02-20 14:58:57,254 send_delayed_regions: bytes_cost=185514, bytes_threshold=799999, pixel_count=184490
Feb 20 14:58:57 2015-02-20 14:58:57,254 make_data_packet: image=XShmImageWrapper(BGRX: 0, 0, 1096, 1036), damage data: (3, 0, 0, 1096, 1036, 'h264')
Feb 20 14:58:57 2015-02-20 14:58:57,255 video_encode('h264', XShmImageWrapper(BGRX: 0, 0, 1096, 1036), {})
Feb 20 14:58:57 2015-02-20 14:58:57,255 process_damage_regions: wid=3, adding webp pixel data to queue, elapsed time: 34.3 ms, request rgb time: 0.1 ms
Feb 20 14:58:58 2015-02-20 14:58:58,263 damage(WindowModel(0xe0000a - "Adobe Flash Player Download | AirGap"), 1, 64, 1285, 971, {})
Feb 20 14:58:58 2015-02-20 14:58:58,267 damage(1, 64, 1285, 971, {}) wid=3, scheduling batching expiry for sequence 5849 in 11.0 ms
Feb 20 14:58:58 2015-02-20 14:58:58,280 send_delayed for wid 3, batch delay is 11.7, elapsed time is 15.0 ms
Feb 20 14:58:58 2015-02-20 14:58:58,288 process_damage_regions: wid=3, adding h264 pixel data to queue, elapsed time: 24.3 ms, request rgb time: 2.7 ms
Feb 20 14:58:58 2015-02-20 14:58:58,290 send_delayed_regions: bytes_cost=185514, bytes_threshold=799999, pixel_count=184490
Feb 20 14:58:58 2015-02-20 14:58:58,292 process_damage_regions: wid=3, adding webp pixel data to queue, elapsed time: 28.5 ms, request rgb time: 1.6 ms
Feb 20 14:59:00 2015-02-20 14:59:00,100 damage(WindowModel(0xe0000a - "Adobe Flash Player Download | AirGap"), 1, 35, 1285, 1000, {})
Feb 20 14:59:00 2015-02-20 14:59:00,123 damage(1, 35, 1285, 1000, {}) wid=3, scheduling batching expiry for sequence 5851 in 11.0 ms
Feb 20 14:59:00 2015-02-20 14:59:00,152 send_delayed for wid 3, delaying again because too many pixels are waiting to be encoded: 1333332
Feb 20 14:59:00 2015-02-20 14:59:00,164 send_delayed for wid 3, delaying again because too many pixels are waiting to be encoded: 1333332
Feb 20 14:59:00 2015-02-20 14:59:00,173 process_damage_regions: wid=3, adding h264 pixel data to queue, elapsed time: 73.2 ms, request rgb time: 3.6 ms
Feb 20 14:59:00 2015-02-20 14:59:00,177 send_delayed_regions: bytes_cost=191024, bytes_threshold=799999, pixel_count=190000
Feb 20 14:59:00 2015-02-20 14:59:00,189 process_damage_regions: wid=3, adding webp pixel data to queue, elapsed time: 89.2 ms, request rgb time: 8.6 ms
Feb 20 14:59:00 2015-02-20 14:59:00,198 window 3 delayed region already sent
Feb 20 14:59:00 2015-02-20 14:59:00,565 damage(WindowModel(0xe0000a - "Adobe Flash Player Download | AirGap"), 1, 64, 1285, 971, {})
Feb 20 14:59:00 2015-02-20 14:59:00,576 damage(1, 64, 1285, 971, {}) wid=3, scheduling batching expiry for sequence 5853 in 12.0 ms
Feb 20 14:59:00 2015-02-20 14:59:00,593 send_delayed for wid 3, delaying again because too many pixels are waiting to be encoded: 1333332
Feb 20 14:59:00 2015-02-20 14:59:00,605 send_delayed for wid 3, delaying again because too many pixels are waiting to be encoded: 1333332

{lots more repetitions of the delay}

Feb 20 14:59:28 2015-02-20 14:59:28,137 delayed_region_timeout: region is 15038ms old, bad connection?
Feb 20 14:59:28 2015-02-20 14:59:28,138 cancel_damage() wid=3, dropping delayed region (1424473153.097817, WindowModel(0xe0000a - "Pandora Internet Radio - Listen to Free Music You'll Love | AirGap"), [R(0, 0, 1287, 1036)], 'h264', {}) and all sequences up to 5859
Feb 20 15:00:42 (WebKitPluginProcess2:3461): Gdk-WARNING **: XID collision, trouble ahead

After the error, the entire browser application became completely non-responsive and, as I said, it was impossible to get any response for server when looking for info.

@totaam
Copy link
Collaborator Author

totaam commented Feb 21, 2015

2015-02-21 00:01:48: afarr uploaded file ticket806_info-before.txt (96.1 KiB)

ticket 806 xpra info, beginning of session

@totaam
Copy link
Collaborator Author

totaam commented Feb 21, 2015

2015-02-21 00:02:20: afarr uploaded file ticket806_-d_encoding_log-excerpt-cleaned.txt (104.0 KiB)

ticket 806 server logs -d encoding

@totaam
Copy link
Collaborator Author

totaam commented Feb 21, 2015

2015-02-21 05:19:47: antoine commented


OK, TIL:

File "xpra/codecs/enc_x264/encoder.pyx", line 327, in xpra.codecs.enc_x264.encoder.Encoder.init_encoder (xpra/codecs/enc_x264/encoder.c:2947)
AssertionError: context initialization failed for format YUV444P

So I am still thinking that this is a resource leak somewhere (threads? memory?) which is causing x264 to fail.
r8688 adds the ability to show x264's own logging in with our logging.
Use it like so:

XPRA_X264_LOGGING=info xpra start ...

(other possible values are warning and error - debug is disabled as it causes server crashes...)

With this enabled I can see more info whenever we enter the x264 code, ie:

2015-02-21 12:10:52,256 X264: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX XOP FMA4 LZCNT
2015-02-21 12:10:52,260 X264: profile High, level 1.0
2015-02-21 12:10:52,260 X264: frame I:1     Avg QP:20.00  size:   666
2015-02-21 12:10:52,260 X264: mb I  I16..4: 100.0%  0.0%  0.0%
2015-02-21 12:10:52,260 X264: 8x8 transform intra:0.0%
2015-02-21 12:10:52,260 X264: coded y,uvDC,uvAC intra: 0.0% 50.0% 0.0%
2015-02-21 12:10:52,260 X264: i16 v,h,dc,p:  0% 50% 50%  0%
2015-02-21 12:10:52,261 X264: i8c dc,h,v,p: 100%  0%  0%  0%
2015-02-21 12:10:52,261 X264: kb/s:133.20

Hopefully, it will spit out the reason of the failure so we can investigate futher.


send_delayed for wid 3, delaying again because too many pixels are waiting to be encoded: 1333332

{lots more repetitions of the delay}
[[BR]]
That's because the picture encoder switched to png, which is slow...
And / or because the png encoder is stuck, just like x264.


[[BR]]

delayed_region_timeout: region is 15038ms old, bad connection?

After the error, the entire browser application became completely non-responsive and, as I said, it was impossible to get any response for server when looking for info.
[[BR]]
Looks to me like it's not just x264 that fails, I strongly suspect that the server is out of resources at that point.

Were there many speaker restarts before that? Can you reproduce the bug with sound turned off? (could be that the sound code is leaking) With the clipboard turned off? etc..

@totaam
Copy link
Collaborator Author

totaam commented Feb 21, 2015

2015-02-21 05:20:05: antoine changed priority from major to critical

@totaam
Copy link
Collaborator Author

totaam commented Feb 21, 2015

2015-02-21 05:20:05: antoine commented


(raising priority - crashes are bad!)

@totaam
Copy link
Collaborator Author

totaam commented Mar 4, 2015

2015-03-04 21:24:53: afarr changed owner from afarr to antoine

@totaam
Copy link
Collaborator Author

totaam commented Mar 4, 2015

2015-03-04 21:24:53: afarr commented


Ok, took a while to reproduce while closely watching server resources... but yeah, it looks like the memory was totally maxxed on the server before it happened. about 65K of 2 gigs RAM left available, and the swap allocation had been run up to 100% for 20+ minutes.

I had seen plenty of speaker restarts over the life of the session, but none in the 10 minutes prior to the crash. Wasn't using the clipboard much either. It looks like it is mostly just an issue with the memory handling with a generic webkit browser.

Not sure if it is worth pursuing this any further (perhaps to allow some sort of warning that resources are used up?)... suspect that we should just close this ticket and assume that some traceback has to precede the crashing of a server for lack of resources, and this just happens to be the one I'm hitting lately.

@totaam
Copy link
Collaborator Author

totaam commented Mar 5, 2015

2015-03-05 05:09:47: totaam changed owner from antoine to afarr

@totaam
Copy link
Collaborator Author

totaam commented Mar 5, 2015

2015-03-05 05:09:47: totaam commented


but yeah, it looks like the memory was totally maxxed on the server before it happened
[[BR]]
Was it xpra using the memory? Or the webkit browser? Just looking at top will give you a good indication.
You can also see how xpra is using the memory in more detail using:

xpra info | grep memory

Capturing this at regular intervals might help.

To make it easier to reproduce the problem you can lower the total memory so that you hit the limit sooner.
You could also try killing the browser and see if the xpra server recovers (assuming it is not using the exit-with-children option)

If it is xpra that is leaking the memory, please try to see if turning the sound off resolves the problem, if so then hopefully we can eliminate this issue as part of #669 / #400.

@totaam
Copy link
Collaborator Author

totaam commented Mar 6, 2015

2015-03-06 23:13:19: afarr commented


It doesn't look like it's the xpra server using the memory.

Repro'd twice more (I'll attach screenshots of the top output, in case there's something of use to be gleaned). In one case the xpra server was using 17.8% of the 2 gigs RAM and 200K swap - while the browser was using, among all its tabs, 69.5%.

In the second case the server was using 6.3% while the browser was using 75%. In the second case I was able to get xpra info on memory as server was becoming non-responsive:

threads.memory.children.idrss=0
threads.memory.children.inblock=12368
threads.memory.children.isrss=0
threads.memory.children.ixrss=0
threads.memory.children.majflt=83
threads.memory.children.maxrss=210304
threads.memory.children.minflt=12168
threads.memory.children.msgrcv=0
threads.memory.children.msgsnd=0
threads.memory.children.nivcsw=435
threads.memory.children.nsignals=0
threads.memory.children.nswap=0
threads.memory.children.nvcsw=231
threads.memory.children.oublock=16
threads.memory.children.stime=0
threads.memory.children.utime=0
threads.memory.server.idrss=0
threads.memory.server.inblock=827904
threads.memory.server.isrss=0
threads.memory.server.ixrss=0
threads.memory.server.majflt=4948
threads.memory.server.maxrss=331236
threads.memory.server.minflt=2366513
threads.memory.server.msgrcv=0
threads.memory.server.msgsnd=0
threads.memory.server.nivcsw=2958911
threads.memory.server.nsignals=0
threads.memory.server.nswap=0
threads.memory.server.nvcsw=6503584
threads.memory.server.oublock=96
threads.memory.server.stime=254
threads.memory.server.utime=965

In trying to run the second repro, however, when I tried to run a xpra control :DISPLAY sound-output stop I got an error saying I had too few arguments.

xpra control :11 sound-output stop --password-file=key --encryption=AES
receiving data using AES encryption
sending data using AES encryption
server returned error code 127
error processing command: sound_control() takes at least 2 arguments (1 given)

Testing with a 0.15.0 windows client against a 0.15.0 fedora 20 server I get this same error when trying to stop sound.

I would note though that in the first test my pulseaudio crashed about 8 minutes prior to the server going down due to memory issues:

Mar 06 11:44:30: 2015-03-06 11:44:30,630 failed to execute ['/usr/bin/pactl', 'stat']: [Errno 12] Cannot allocate memory
Mar 06 11:44:30: 2015-03-06 11:44:30,633 could not detect any pulseaudio monitor devices - sound forwarding is disabled

Followed 8 minutes later, when the client became non-responsive, with this:

Mar 06 11:52:37: 2015-03-06 11:52:36,944 XShmWrapper.setup() shmget(PRIVATE, 7654784 bytes, 1023) failed, bytes_per_line=6056, width=1514, height=1263
Mar 06 11:52:37: 2015-03-06 11:52:36,997 disabling XShm support following irrecoverable error
Mar 06 11:52:37: 2015-03-06 11:52:37,488 setup_pipeline failed for (52, (1, 1), (1, 1), 1514, 1262, codec_spec(swscale), 'YUV444P', (1, 1), 1514, 1262, codec_spec(x264))
Mar 06 11:52:37: Traceback (most recent call last):
Mar 06 11:52:37: File "/usr/lib64/python2.7/site-packages/xpra/server/window_video_source.py", line 1100, in setup_pipeline
Mar 06 11:52:37: self._video_encoder.init_context(enc_width, enc_height, enc_in_format, dst_formats, encoder_spec.encoding, quality, speed, encoder_scaling, self.encoding_options)
Mar 06 11:52:37: File "xpra/codecs/enc_x264/encoder.pyx", line 304, in xpra.codecs.enc_x264.encoder.Encoder.init_context (xpra/codecs/enc_x264/encoder.c:2717)
Mar 06 11:52:37: File "xpra/codecs/enc_x264/encoder.pyx", line 327, in xpra.codecs.enc_x264.encoder.Encoder.init_encoder (xpra/codecs/enc_x264/encoder.c:2947)
Mar 06 11:52:37: AssertionError: context initialization failed for format YUV444P
Mar 06 11:53:09: 2015-03-06 11:53:09,831 delayed_region_timeout: region is 15077ms old, bad connection?
Mar 06 11:57:30: (WebKitWebProcess:1646): Gdk-WARNING **: WebKitWebProcess: Fatal IO error 11 (Resource temporarily unavailable) on X server :11.
Mar 06 11:57:30: (WebKitWebProcess:3610): Gdk-WARNING **: WebKitWebProcess: Fatal IO error 11 (Resource temporarily unavailable) on X server :11.
Mar 06 11:57:30: (WebKitWebProcess:2216): Gdk-WARNING **: WebKitWebProcess: Fatal IO error 11 (Resource temporarily unavailable) on X server :11.
Mar 06 11:57:30: (WebKitWebProcess:3618): Gdk-WARNING **: WebKitWebProcess: Fatal IO error 11 (Resource temporarily unavailable) on X server :11.
Mar 06 11:57:30: (WebKitWebProcess:2655): Gdk-WARNING **: WebKitWebProcess: Fatal IO error 11 (Resource temporarily unavailable) on X server :11.
Mar 06 11:57:30: (browser:742): Gdk-WARNING **: browser: Fatal IO error 11 (Resource temporarily unavailable) on X server :11.
Mar 06 11:57:30: (WebKitWebProcess:3422): Gdk-WARNING **: WebKitWebProcess: Fatal IO error 11 (Resource temporarily unavailable) on X server :11.
Mar 06 11:57:30: (WebKitWebProcess:3139): Gdk-WARNING **: WebKitWebProcess: Fatal IO error 11 (Resource temporarily unavailable) on X server :11.
Mar 06 11:57:30: xterm: fatal IO error 11 (Resource temporarily unavailable) or KillClient on X server ":11"
Mar 06 11:57:30: (WebKitWebProcess:778): Gdk-WARNING **: WebKitWebProcess: Fatal IO error 11 (Resource temporarily unavailable) on X server :11.

Just for fun, I tried to also close the browser to see what would happen, but alas, by the time I spotted the traceback... the client-side browser was completely non-responsive.

@totaam
Copy link
Collaborator Author

totaam commented Mar 6, 2015

2015-03-06 23:14:45: afarr uploaded file ticket806_top-shot-no-info.PNG (28.9 KiB)

first repro top info - no xpra memory info to go with
ticket806_top-shot-no-info.PNG

@totaam
Copy link
Collaborator Author

totaam commented Mar 6, 2015

2015-03-06 23:15:49: afarr uploaded file ticket806_top-shot-yes-info.PNG (145.4 KiB)

second repro top screenshot - to go with xpra memory info
ticket806_top-shot-yes-info.PNG

@totaam
Copy link
Collaborator Author

totaam commented Mar 8, 2015

2015-03-08 06:32:15: antoine changed status from new to closed

@totaam
Copy link
Collaborator Author

totaam commented Mar 8, 2015

2015-03-08 06:32:15: antoine set resolution to invalid

@totaam
Copy link
Collaborator Author

totaam commented Mar 8, 2015

2015-03-08 06:32:15: antoine commented


It doesn't look like it's the xpra server using the memory.
[[BR]]
I concur.

[[BR]]

In trying to run the second repro, however, when I tried to run a xpra control :DISPLAY sound-output stop I got an error saying I had too few arguments.
[[BR]]
This belongs in a separate ticket.
The details of the real error are found in the server log:

2015-03-08 13:16:55,285 error processing command sound-output
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/xpra/server/server_core.py", line 628, in handle_command_request
    error, response = self.do_handle_command_request(command, args[1:])
  File "/usr/lib64/python2.7/site-packages/xpra/server/server_base.py", line 1106, in do_handle_command_request
    msg.append("%s : %s" % (csource, csource.sound_control(*args[1:])))
TypeError: sound_control() takes at least 2 arguments (1 given)

This is fixed in r8748 (backport in 8749).

[[BR]]

I would note though that in the first test my pulseaudio crashed about 8 minutes prior to the server going down due to memory issues:
[[BR]]
FYI: That's not the pulseaudio process itself (though it may well crash too when put under memory pressure), that's just our process trying to enumerate the pulseaudio devices available using pactl stat (which you can run from the command line at any time).

[[BR]]

Followed 8 minutes later, when the client became non-responsive, with this:
(..)
[[BR]]
It looks to me like we're trying to continue as best we can, but when memory runs out... things will go down in flames sooner or later.

So I am closing this ticket as 'invalid' because I don't think we are leaking any memory. If you do find that the xpra server process is leaking memory (which top shows as airgap-server I believe) then we can re-open this ticket. (note: on a busy session with lots of large windows open, we could conceivably use up to 500MB of resident memory)

@totaam totaam closed this as completed Mar 8, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant