suspending a local client with opengl windows can show corrupted pixels #492

Closed
totaam opened this issue Jan 12, 2014 · 27 comments

@totaam
Collaborator

totaam commented Jan 12, 2014

Issue migrated from trac ticket # 492

component: client | priority: critical | resolution: fixed | keywords: opengl

2014-01-12 02:09:41: totaam created the issue


This is only relevant to local servers: resuming a client connected to a remote server should break the connection (eventually - we may want to break it quicker then) which is fine.

On Linux, we should be able to get the event from the UPower Resuming dbus signal. Found some example code:

On win32, we could detect WM_POWERBROADCAST events.

Then we can just ask for a server lossless refresh to make sure the windows display clean contents.

This looks like a driver bug to me: the GPU buffers should be preserved, maybe it is the OpenGL paint state that is inconsistent?

@totaam
Collaborator Author

totaam commented Jan 20, 2014

2014-01-20 11:16:11: totaam changed status from new to assigned

@totaam
Collaborator Author

totaam commented Jan 20, 2014

2014-01-20 11:16:11: totaam changed owner from antoine to totaam

@totaam
Collaborator Author

totaam commented Jan 20, 2014

2014-01-20 11:16:11: totaam commented


Easy to reproduce, and should be easy to fix too.

@totaam
Collaborator Author

totaam commented Feb 4, 2014

2014-02-04 15:02:21: antoine edited the issue description

@totaam
Collaborator Author

totaam commented Feb 4, 2014

2014-02-04 16:29:39: antoine commented


The dbus approach sounds nice, except it doesn't work... I can't get any of the code examples to fire. This is also meant to fire the same Resuming signal (found in /usr/lib/systemd/system/upower.service), but does nothing:

dbus-send --system --type=signal --dest=org.freedesktop.UPower \
    /org/freedesktop/UPower org.freedesktop.UPower.Resuming

Posted a question here: system suspend - dbus upower signals are not seen

I have now also created a Fedora ticket for this: bugzilla 1064906

@totaam
Collaborator Author

totaam commented Feb 5, 2014

2014-02-05 06:49:46: totaam uploaded file dbus-suspendresume-notifications.patch (2.2 KiB)

dbus hooks for trying to get suspend/resume notifications

@totaam
Collaborator Author

totaam commented Mar 16, 2014

2014-03-16 15:21:00: totaam commented


According to this answer: Newer upower versions no longer emit that signal since this is handled by systemd. Now I have to ask systemd what we're supposed to do... this is not going to make the code any nicer!

@totaam
Collaborator Author

totaam commented Mar 16, 2014

2014-03-16 15:42:10: totaam commented


The systemd / logind equivalent is PrepareForSleep:
The PrepareForShutdown() resp. PrepareForSleep() signals are sent right before (with the argument True) and after (with the argument False) the system goes down for reboot/poweroff, resp. suspend/hibernate.

So it looks like we need to look for logind and listen for this new signal, and fall back to upower otherwise.
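
The fallback order described above could be sketched roughly as follows, assuming the dbus-python bindings and a GLib main loop are available. The logind `PrepareForSleep` signal and the UPower `Resuming` signal are the ones named in this thread; the UPower `Sleeping` signal, and the helper names `choose_backend`, `watch`, `on_suspend` and `on_resume`, are assumptions made for illustration and are not taken from the xpra code.

```python
LOGIND = "org.freedesktop.login1"
UPOWER = "org.freedesktop.UPower"


def choose_backend(bus_names):
    """Prefer logind when present, fall back to UPower, else None."""
    if LOGIND in bus_names:
        return "logind"
    if UPOWER in bus_names:
        return "upower"
    return None


def watch(on_suspend, on_resume):
    """Subscribe to suspend/resume signals and run a main loop (hypothetical helper)."""
    # Imported lazily so this module still loads without dbus-python installed.
    import dbus
    from dbus.mainloop.glib import DBusGMainLoop
    from gi.repository import GLib

    DBusGMainLoop(set_as_default=True)
    bus = dbus.SystemBus()
    # Services may be running already or only be bus-activatable: check both.
    names = {str(n) for n in bus.list_names()}
    names |= {str(n) for n in bus.list_activatable_names()}
    backend = choose_backend(names)

    if backend == "logind":
        def prepare_for_sleep(going_down):
            # True right before suspend, False after resume.
            if going_down:
                on_suspend()
            else:
                on_resume()
        bus.add_signal_receiver(prepare_for_sleep,
                                signal_name="PrepareForSleep",
                                dbus_interface="org.freedesktop.login1.Manager",
                                bus_name=LOGIND,
                                path="/org/freedesktop/login1")
    elif backend == "upower":
        # Older UPower versions only; "Sleeping" is assumed to pair with "Resuming".
        for signal, callback in (("Sleeping", on_suspend), ("Resuming", on_resume)):
            bus.add_signal_receiver(callback,
                                    signal_name=signal,
                                    dbus_interface=UPOWER,
                                    bus_name=UPOWER,
                                    path="/org/freedesktop/UPower")
    else:
        raise RuntimeError("no suspend/resume signal source found on the system bus")
    GLib.MainLoop().run()
```

Keeping the backend choice in a pure function makes the preference order easy to check without a running bus.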

@totaam
Collaborator Author

totaam commented Mar 17, 2014

2014-03-17 04:47:08: totaam uploaded file watch_PrepareForSleep.py (0.7 KiB)

dbus script using the new login1 interface

@totaam
Collaborator Author

totaam commented Mar 17, 2014

2014-03-17 08:01:45: totaam changed status from assigned to new

@totaam
Collaborator Author

totaam commented Mar 17, 2014

2014-03-17 08:01:45: totaam changed owner from totaam to afarr

@totaam
Collaborator Author

totaam commented Mar 17, 2014

2014-03-17 08:01:45: totaam commented


r5821 simply logs the suspend and resume events, like so:

2014-03-17 18:55:55,400 system is suspending
2014-03-17 20:09:40,209 system resumed, was suspended for 1:13:44
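
For illustration, log lines of this shape could be produced by something like the sketch below. It is not the r5821 code: `SuspendTracker` and its `clock` / `log` parameters are hypothetical names, and truncating the elapsed time to whole seconds is an assumption based on the timestamps above.

```python
import time
from datetime import timedelta


class SuspendTracker:
    """Logs suspend and resume events and how long the system was suspended."""

    def __init__(self, clock=time.time, log=print):
        # Wall-clock time is used on purpose: monotonic clocks may not
        # advance while the machine is suspended.
        self.clock = clock
        self.log = log
        self.suspended_at = None

    def suspend(self):
        self.suspended_at = self.clock()
        self.log("system is suspending")

    def resume(self):
        if self.suspended_at is None:
            # Resume seen without a matching suspend event.
            self.log("system resumed")
            return
        elapsed = int(self.clock() - self.suspended_at)
        self.suspended_at = None
        self.log("system resumed, was suspended for %s" % timedelta(seconds=elapsed))
```

The `suspend` and `resume` methods are meant to be wired to whichever platform callback delivers the events (dbus, win32 window messages, and so on).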

afarr: please test that the message does show up on the platforms that are meant to be supported already and which I am unable to test, as virtualbox does not support OS-level suspend and resume:

  • win32
  • linux without systemd (ie: debian or ubuntu)
  • linux with systemd is tested already (Fedora 20)

Eventually, we may also fire other actions from those callbacks to notify the server or re-connect if necessary.

r5824 contains some critical fixes, and r5826 fires the window refresh.
It works here. Bug fixed.

As for OSX... it's never simple, and again I won't be able to test with virtualbox, here are some pointers:

@totaam
Collaborator Author

totaam commented Mar 17, 2014

2014-03-17 15:48:06: totaam commented


OSX is done in r5828, it's not pretty but it works!

So please test this too. On virtualbox, suspending via the apple menu shows "suspending", followed by "resuming" just 2 seconds later, so it seems to be working.

@totaam
Collaborator Author

totaam commented Mar 17, 2014

2014-03-17 19:45:45: afarr commented


I don't currently have access to a debian or ubuntu system for testing.

  • win32 - I think I'm missing something about how I can test this.

You explicitly mention the following:

This is only relevant to local servers: resuming a client connected to a remote server should break the connection (eventually - we may want to break it quicker then) which is fine.

... which I will confirm is the case. With a win32 (0.12.0 r5828) client attached to a fedora 19 server, when I suspend (sleep) the windows 7 machine the connection is nearly instantly severed. The server session carries on happily, but the client disconnects.

However, when I try to run a "local server" - I am informed that "(This xpra installation does not support starting local servers.)"

C:\Program Files (x86)\Xpra>xpra_cmd.exe --no-daemon --bind-tcp=0.0.0.0:1201 --start-child=xterm --start-child=xterm start :17
Usage:
        xpra_cmd.exe attach [DISPLAY]
        xpra_cmd.exe detach [DISPLAY]
        xpra_cmd.exe screenshot filename [DISPLAY]
        xpra_cmd.exe info [DISPLAY]
        xpra_cmd.exe control DISPLAY command [arg1] [arg2]..
        xpra_cmd.exe version [DISPLAY]
        xpra_cmd.exe shadow [DISPLAY]
(This xpra installation does not support starting local servers.)
  • Is there a windows package/option that does support starting a local server?

@totaam
Collaborator Author

totaam commented Mar 18, 2014

2014-03-18 01:11:21: totaam commented


Sorry, I should have made this clearer: the visual corruption is only relevant to local servers, as only local servers will usually still be connected when the client resumes (this also applies to virtual machines on the same host). What I am interested in here is the suspend & resume state detection code, that is, whether you see the lines:

system is suspending
system resumed, was suspended for XX:XX:XX

And whether the state detection is timely and accurate.

"I don't currently have access to a debian or ubuntu system for testing"

I believe smo does, you can re-assign to him once you have tested the platforms you do have.

FYI: it may be used in the future (ie: #493), and will probably be used in this release to warn the server that a disconnection event is likely, and stop wasting bandwidth sending data that will never arrive at its destination - as per #543. I can only add this code once I am confident that the suspend and resume events are received reliably.

@totaam
Collaborator Author

totaam commented Mar 20, 2014

2014-03-20 14:22:44: totaam changed priority from minor to critical

@totaam
Collaborator Author

totaam commented Mar 20, 2014

2014-03-20 14:22:44: totaam commented


Raising as this is blocking #543

@totaam
Collaborator Author

totaam commented Mar 20, 2014

2014-03-20 19:31:12: afarr commented


Trying to test with windows 7, I'm not seeing any suspend messages.

  • Setting the sleep time to 1 minute and waiting, when the machine goes to sleep the connection is maintained, but there are no messages (neither on the client-side nor server-side). The closest thing I see, client-side, is: unexpected message: 50006 / 0 / 0.

  • Trying to use the start menu to force sleep and then restart it promptly, I get nothing server-side, and client-side I see something along the following lines:

2014-03-20 12:13:11,664 re-starting speaker because of overrun
2014-03-20 12:13:12,351 using audio codec: MPEG 1 Audio, Layer 3 (MP3)
2014-03-20 12:13:38,335 unexpected message: WM_POWERBROADCAST / 4 / 0
2014-03-20 12:13:40,029 re-starting speaker because of overrun
2014-03-20 12:13:40,717 using audio codec: MPEG 1 Audio, Layer 3 (MP3)
2014-03-20 12:13:49,749 unexpected message: WM_POWERBROADCAST / 18 / 0
2014-03-20 12:13:49,815 unexpected message: WM_POWERBROADCAST / 7 / 0
2014-03-20 12:13:49,826 unexpected message: WM_TIMECHANGE / 0 / 0
2014-03-20 12:13:49,977 server is not responding, drawing spinners over the windows
2014-03-20 12:13:57,947 server is OK again
2014-03-20 12:13:57,960 re-starting speaker because of overrun
2014-03-20 12:13:59,098 using audio codec: MPEG 1 Audio, Layer 3 (MP3)
  • If I use start menu to induce sleep, and leave it asleep for a few minutes, I lose the connection, again with no system suspension messages client or server side. Client message is as follows:
2014-03-20 12:18:29,937 unexpected message: WM_POWERBROADCAST / 4 / 0
2014-03-20 12:19:50,506 server ping timeout - waited 60 seconds without a response
2014-03-20 12:19:52,766 server is not responding, drawing spinners over the windows
2014-03-20 12:19:52,811 Connection lost
2014-03-20 12:19:52,953 server is not responding, drawing spinners over the windows

Is there a different suspend mode that you have in mind for the windows client while connected, other than sleep?

  • Trying the Hibernate option, again I just lose the connection:
2014-03-20 12:24:54,490 unexpected message: WM_POWERBROADCAST / 4 / 0
2014-03-20 12:24:58,250 unexpected message: WM_NCCALCSIZE / 1 / 1635532
2014-03-20 12:24:58,286 unexpected message: WM_WINDOWPOSCHANGED / 0 / 1635572
2014-03-20 12:24:58,349 unexpected message: WM_NCCALCSIZE / 1 / 1634520
2014-03-20 12:24:58,414 unexpected message: 798 / 0 / 0
2014-03-20 12:27:31,762 unexpected message: WM_TIMECHANGE / 0 / 0
2014-03-20 12:27:32,118 unexpected message: WM_POWERBROADCAST / 7 / 0
2014-03-20 12:27:33,349 server is not responding, drawing spinners over the windows
2014-03-20 12:27:33,960 unexpected message: WM_POWERBROADCAST / 18 / 0
2014-03-20 12:27:39,076 unexpected message: WM_NCCALCSIZE / 1 / 1635532
2014-03-20 12:27:39,085 unexpected message: WM_WINDOWPOSCHANGED / 0 / 1635572
2014-03-20 12:27:39,098 unexpected message: WM_NCCALCSIZE / 1 / 1634520
2014-03-20 12:27:39,108 unexpected message: 798 / 0 / 0
2014-03-20 12:27:42,506 unexpected message: WM_WININICHANGE / 47 / 582344
2014-03-20 12:27:44,555 unexpected message: WM_WININICHANGE / 47 / 582344

Server side, I just get the "Disconnecting ... reason is: client ping timeout, - waited 60 seconds without a response" message.
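
For reference, the numeric values logged above as "WM_POWERBROADCAST / 4", "/ 18" and "/ 7" match the documented win32 `PBT_*` power event constants. A minimal decoding sketch follows; the helper names are hypothetical and this is not the xpra event handler.

```python
WM_POWERBROADCAST = 0x0218

# wParam values delivered with WM_POWERBROADCAST (win32 PBT_* constants)
PBT_APMSUSPEND = 0x0004
PBT_APMRESUMESUSPEND = 0x0007
PBT_APMPOWERSTATUSCHANGE = 0x000A
PBT_APMRESUMEAUTOMATIC = 0x0012

POWER_EVENT_NAMES = {
    PBT_APMSUSPEND: "PBT_APMSUSPEND",
    PBT_APMRESUMESUSPEND: "PBT_APMRESUMESUSPEND",
    PBT_APMPOWERSTATUSCHANGE: "PBT_APMPOWERSTATUSCHANGE",
    PBT_APMRESUMEAUTOMATIC: "PBT_APMRESUMEAUTOMATIC",
}


def describe_power_event(wparam):
    """Return a readable name for a WM_POWERBROADCAST wParam value."""
    return POWER_EVENT_NAMES.get(wparam, "unknown power event %#x" % wparam)


def is_suspend(wparam):
    return wparam == PBT_APMSUSPEND


def is_resume(wparam):
    # Both resume notifications can be delivered for a single wake-up.
    return wparam in (PBT_APMRESUMEAUTOMATIC, PBT_APMRESUMESUSPEND)
```

In the logs above, 4 arrives when the machine goes to sleep, while 18 and 7 both arrive on wake-up, so a handler would probably want to treat that pair as a single resume.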

@totaam
Collaborator Author

totaam commented Mar 20, 2014

2014-03-20 20:46:39: afarr commented


I think I found the problem - just noticed the previous testing was with r5444, repeating with r5828...

  • Setting the sleep time, there is a new unexpected message (still nothing server-side):
2014-03-20 13:36:30,243 unexpected message: 49841 / 0 / 0
2014-03-20 13:38:04,336 unexpected message: 49841 / 0 / 0
  • Forcing a sleep, followed by prompt awakening - I get the suspend message client-side:
2014-03-20 13:40:49,585 system is suspending
2014-03-20 13:40:54,286 server is not responding, drawing spinners over the windows
2014-03-20 13:40:57,993 system resumed, was suspended for 0:00:08
2014-03-20 13:40:59,197 server is OK again
2014-03-20 13:40:59,891 re-starting speaker because of overrun
2014-03-20 13:41:03,323 using audio codec: MPEG 1 Audio, Layer 3 (MP3)
2014-03-20 13:41:05,269 re-starting speaker because of overrun
2014-03-20 13:41:06,025 using audio codec: MPEG 1 Audio, Layer 3 (MP3)
  • Forcing a longer sleep - I still get the suspension messages, along with disconnection:
2014-03-20 13:42:21,933 system is suspending
2014-03-20 13:42:24,628 server is not responding, drawing spinners over the windows
2014-03-20 13:43:40,279 system resumed, was suspended for 0:01:18
2014-03-20 13:43:40,358 WM_TIMECHANGE: time change event: 0 / 0
2014-03-20 13:43:40,390 server ping timeout - waited 60 seconds without a response
2014-03-20 13:43:41,920 Connection lost

@totaam
Collaborator Author

totaam commented Mar 20, 2014

2014-03-20 21:04:52: afarr commented


Testing with osx r5458 ...

  • With a timer induced sleep, I get no messages at all.

  • With a short sleep I get the suspension messages:

2014-03-20 13:54:15,269 system is suspending
2014-03-20 13:54:18,129 re-starting speaker because of overrun
2014-03-20 13:54:26,124 server is not responding, drawing spinners over the windows
2014-03-20 13:54:42,131 system resumed, was suspended for 0:00:26
2014-03-20 13:54:47,912 using audio codec: MPEG 1 Audio, Layer 3 (MP3)
2014-03-20 13:54:47,918 server is OK again
2014-03-20 13:54:47,920 re-starting speaker because of overrun
2014-03-20 13:54:50,492 using audio codec: MPEG 1 Audio, Layer 3 (MP3)
  • With a longer sleep I don't get the server resume message, but I get the suspension message:
2014-03-20 13:57:35,566 system is suspending
2014-03-20 13:58:54,053 server is not responding, drawing spinners over the windows
2014-03-20 13:59:00,285 read connection reset for SocketConnection(('10.0.11.191', 51408) - ('10.0.32.172', 1201))
2014-03-20 13:59:00,287 connection lost: read connection reset: [Errno 54] Connection reset by peer
2014-03-20 13:59:00,289 Connection lost

@totaam
Collaborator Author

totaam commented Mar 21, 2014

2014-03-21 04:36:45: totaam commented


I've added a test application ("Events_Test.exe") for win32 in r5873, which should make it easier to investigate power events.

I've hooked power events into the window refresh code in r5875 - see #543.
More follow up work in #540.

afarr: The OpenGL issue remains fixed, so please just check that a quick suspend-resume cycle works as well as it did before and then close this ticket.

@totaam
Collaborator Author

totaam commented Mar 24, 2014

2014-03-24 23:03:45: maxmylyn commented


Tested with r5903:

  • OSX and Windows behavior works as well as it did.

However, on my laptop I was not able to get a resume message even with a short sleep cycle; instead, only the following errors were printed, regardless of sleep length (I only pasted the relevant output).


2014-03-24 15:55:42,141 system is suspending
2014-03-24 15:55:43,796 read error for SocketConnection(('10.0.11.77', 54092) - ('10.0.32.172', 1200))
Traceback (most recent call last):
  File "xpra\net\protocol.pyc", line 606, in _io_thread_loop
  File "xpra\net\protocol.pyc", line 660, in _read
  File "xpra\net\bytestreams.pyc", line 117, in read
  File "xpra\net\bytestreams.pyc", line 60, in _read
  File "xpra\net\bytestreams.pyc", line 52, in untilConcludes
  File "xpra\net\bytestreams.pyc", line 22, in untilConcludes
error: [Errno 10053] An established connection was aborted by the software in your host machine
2014-03-24 15:55:43,798 connection lost: read error on connection: [Errno 10053] An established connection was aborted by the software in your host machine
2014-03-24 15:55:43,798 Connection lost

This seems to be related only to this individual laptop's sleep cycle. If it seems worth pursuing, let me know and I'll test it further; otherwise this is good to be closed.

@totaam
Collaborator Author

totaam commented Mar 25, 2014

2014-03-25 08:56:59: totaam changed status from new to closed

@totaam
Collaborator Author

totaam commented Mar 25, 2014

2014-03-25 08:56:59: totaam changed resolution from ** to fixed

@totaam
Collaborator Author

totaam commented Mar 25, 2014

2014-03-25 08:56:59: totaam commented


Judging by the Microsoft KB entry, the error above is a WSAECONNABORTED: An established connection was aborted by the software in your host computer, possibly due to a data transmission time-out or protocol error. It may be specific to the machine drivers or BIOS. The full list is here: Windows Sockets Error Codes

In any case, this led to a "connection lost", which is fine. (what we don't want is for the connection to stay up after we told the server to slow down, without telling it to speed up again)

r5904 should remove the ugly stacktrace (we add a bunch of win32 specific error codes to the ignore list)
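
As a rough sketch of what such an ignore list might look like: only error 10053 (WSAECONNABORTED) and the OSX "Errno 54" connection reset are taken from this ticket. The other codes, and the names `EXPECTED_DISCONNECT_CODES` and `is_expected_disconnect`, are assumptions for illustration and do not reflect what r5904 actually contains.

```python
import errno

# Winsock codes, spelled out as integers since errno.WSA* only exists on Windows
WSAECONNABORTED = 10053
WSAECONNRESET = 10054

# Errors that just mean "the peer went away" and do not deserve a stacktrace
EXPECTED_DISCONNECT_CODES = {
    errno.ECONNRESET,     # reported as "Errno 54" on OSX
    errno.ECONNABORTED,
    errno.EPIPE,
    WSAECONNABORTED,
    WSAECONNRESET,
}


def is_expected_disconnect(exc):
    """True if the exception is a socket error we can log without a traceback."""
    return isinstance(exc, OSError) and exc.errno in EXPECTED_DISCONNECT_CODES
```

A read loop could then log a one-line "connection lost" for these errors and keep the full traceback for anything else.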

@totaam totaam closed this as completed Mar 25, 2014
@totaam
Collaborator Author

totaam commented Sep 5, 2014

2014-09-05 08:02:20: antoine uploaded file xterm-resume.png (459.4 KiB)

this is what my xterm looked like when I resumed

@totaam
Collaborator Author

totaam commented Sep 9, 2015

2015-09-09 11:44:15: antoine commented


r10573 should finally fix this properly: refreshing the pixels is not always enough, we may have to also reinitialize the window backing.

See also: #901, #924

Minor regressions: #2482, #2484
