WASM examples broken if the user switches tabs #144
Comments
Let me know if you figure out the solution; if this is something that's better solved at the xwt level I'm interested in adding support. |
@cBournhonesque quick update: the browser seems to pause IO tasks when in RAM-saving mode, regardless of whether or not the client runs in a web worker, so the appropriate solution should be adjusting the timeout, I think |
(only tested on Brave, but it should be the same for other major browsers when RAM saving is active) |
When RAM-saving mode is disabled, this issue does not seem to occur, so I'm certain that this is indeed the root cause |
There are tricky sneaky ways to keep the tab alive if this is the reason btw - but none I'd recommend implementing at this crate level |
@Nul-led so you have confirmed that, if you disable ram-saving mode, you can freely switch tabs and the game (including io tasks) will continue working in the background? i.e. the issue totally disappears if you disable ram-saving mode? |
@cBournhonesque apparently the thread does not actually get stopped entirely but is instead just throttled. It might be possible to detect when that happens and temporarily stop sending and receiving packets. Disabling the RAM saver seems to work on Brave; can't say for other browsers. Requires more testing, I guess. |
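One way to "figure out if that happens" is to watch the gap between consecutive ticks of the loop. The sketch below is only an illustration: `ThrottleDetector`, its `factor` threshold, and the choice to pass timestamps in as a `Duration` are all assumptions, not part of lightyear or xwt. Timestamps are passed in rather than read from `std::time::Instant`, since that type is not usable on `wasm32-unknown-unknown`.

```rust
use std::time::Duration;

/// Hypothetical helper: flags a tick as "throttled" when the gap since the
/// previous tick is much larger than the interval the loop was expected to run at.
pub struct ThrottleDetector {
    expected: Duration,
    factor: u32,
    last_tick: Option<Duration>,
}

impl ThrottleDetector {
    /// `expected` is the nominal tick interval; a gap larger than
    /// `expected * factor` is treated as throttling.
    pub fn new(expected: Duration, factor: u32) -> Self {
        Self { expected, factor, last_tick: None }
    }

    /// `now` is a monotonic timestamp (for example, time since app start).
    /// Returns true when the gap since the previous tick exceeds the threshold.
    pub fn tick(&mut self, now: Duration) -> bool {
        let throttled = match self.last_tick {
            Some(prev) => now.saturating_sub(prev) > self.expected * self.factor,
            None => false, // first tick: nothing to compare against
        };
        self.last_tick = Some(now);
        throttled
    }
}
```

A caller could pause sending when `tick` returns true and resume (or trigger a resync) on the next normal tick. Note that this can only report throttling after the fact, on the first tick that runs again.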
It's very unclear what the real issue is from reading this. Can someone confirm:
Regarding the "Keep Alive" strategy:
Other strategies:
|
If you have audio playing on your tab it won't get suspended. It would actually be quite fine for a game to apply this workaround. Automatic reconnection at the WebTransport layer is not possible, since the browser doesn't really give us control over the details of the connection that would enable it: specifically, timeouts. It is the browser that would terminate the WebTransport session, and this will happen regardless of whether the tab is paused or not, so we can't hook into it. It is totally possible to implement the reconnect at the app level though. It would require a certain layer of logic on top of the transport, like a custom handshake to identify the connecting party - but that is possible. The lack of control over 0-RTT in the browser's API is a bit unfortunate here - if it were there, reconnecting would not be as bad latency-wise. |
I do think this should be brought up to W3C or a WT working group, but regardless... Since we're blocked on W3C and browsers, it sounds like there are only 2 reasonable solutions that will be solved within the heat-death of the universe.
(or Both, long term) I think 2) is understood enough by yourselves to be fixed today. Is there any chance we could add that logic to the simple-box example? Specifically, to spell it out to new users (myself), or have an entirely new demo just for reconnecting. Ideally, 3 app states: Connected, Reconnecting, Disconnected. If reconnecting happens, have some text centered on screen that says "reconnecting...". |
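The three app states requested above could be modelled as a small state machine. This is a minimal sketch, not lightyear's API: `ConnState`, `ConnEvent`, `MAX_ATTEMPTS`, and `overlay_text` are hypothetical names, and the retry limit of 3 is an arbitrary assumption.

```rust
/// Hypothetical app-level connection states (not part of lightyear).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConnState {
    Connected,
    Reconnecting { attempt: u32 },
    Disconnected,
}

/// Hypothetical events the transport layer would report to the app.
#[derive(Debug, Clone, Copy)]
pub enum ConnEvent {
    TransportLost,
    Reconnected,
    AttemptFailed,
    UserQuit,
}

/// Assumed retry budget before giving up.
pub const MAX_ATTEMPTS: u32 = 3;

/// Pure transition function: current state + event -> next state.
pub fn next_state(state: ConnState, event: ConnEvent) -> ConnState {
    use ConnEvent::*;
    use ConnState::*;
    match (state, event) {
        (_, UserQuit) => Disconnected,
        (Connected, TransportLost) => Reconnecting { attempt: 1 },
        (Reconnecting { .. }, Reconnected) => Connected,
        (Reconnecting { attempt }, AttemptFailed) if attempt < MAX_ATTEMPTS => {
            Reconnecting { attempt: attempt + 1 }
        }
        (Reconnecting { .. }, AttemptFailed) => Disconnected,
        // Any other combination leaves the state unchanged.
        (other, _) => other,
    }
}

/// Text to centre on screen, if any, for the current state.
pub fn overlay_text(state: ConnState) -> Option<&'static str> {
    match state {
        ConnState::Reconnecting { .. } => Some("reconnecting..."),
        _ => None,
    }
}
```

Keeping the transition function pure makes it easy to drive from whatever the example already uses for state handling, and to unit-test without a browser.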
The real solution is adding audio to the game... :D |
@simbleau I don't really understand the problem clearly myself.
As for the reconnecting logic: |
It is though. There are npm packages that play an audio stream of barely audible noise precisely to do just this.
No, it won't. Users have to comply with the workaround if they want to remain connected, and if not - well, it is always up to them. Browsers don't have a good way to keep a tab alive. There's https://www.w3.org/TR/screen-wake-lock/ but it is for a different purpose. The easiest way for the user to keep the tab active is if it plays audio. The less easy way is for them to add the origin to the list of websites that never go inactive, and the most difficult way is to disable the whole Chromium / Firefox feature - which is nonetheless doable. That said, there's also background sync, so maybe you don't actually need the WebTransport session... This does not seem like a portable solution fit for this kind of crate though. Maybe for a more comprehensive networking solution specialized for web apps/games. |
A small, important clarification:
Re: web sockets - Yes, that's right. bevy_rtc doesn't have this issue because it uses WebRTC with signaling built over web sockets. Those web sockets never go idle because, regardless of whether the client app is frozen, the server continues to send KeepAlive packets to the web socket. Re: reconnecting - it's unclear to me, too. I lean on you two to figure this out. I'm guessing that when you connect there's a refresh token the client can be told about for "fast reconnecting." However, I'd be fine with a total teardown/re-connect. As long as there's some way to reconnect... |
What about web workers? |
I call that a hack, not a solution. Perhaps we need to file a case under W3C, actually, to address this. Because even for games, that's a shitty "solution". I mute tabs often, especially games. Communicating the technical problem and putting the onus on users to circumvent it is technically embarrassing, and difficult for, e.g., children and children's games. |
Filed w3c/webtransport#600 |
It is absolutely a hack. As you said, W3C has to deal with it; there would probably need to be a new Wake Lock Web API for this. This is a lot of work however, and definitely not something that is available today - so the workarounds and hacks are still meaningful to discuss here. UPD:
This is great, let's see what they say! I have doubts they'll give us something, as this is a Chromium thing and is standardized afaik. I've been going through the source to figure out where it's implemented, so far found this - might be a good place to explore for others too. |
I was talking about the 0-RTT QUIC handshakes - they allow establishing a new QUIC connection reusing some of the key material from the previously-established-but-now-closed QUIC connection to save a few exchanges in the handshake. This is not resuming the old connection though - it is creating a new connection, so re-connection.
With re-connection, it all depends on how the application handles the new connection. If it has a persistent identifier for the client and correlates the context with that identifier rather than the connection - so that the connections are context-less besides providing the reference to that persistent identifier - it is trivial to implement reconnections, assuming the app supports "connecting mid-game" or otherwise allows newly connected clients into whatever is going on. This is usually done in games through the initial world-state replication on connection - but in this case additional support for replicating updates for the previously connected persistent identity (just over a new connection) would be required.
So, the application-level support for seamless reconnection would likely be a "real" solution, as it would not rely on transient state like the WebTransport session being intact in the first place. I would say though this is a job either for a specific application/game to implement, or a really high-level networking framework, that takes opinionated control over way more things than That said, the solution would most likely have to be transport-agnostic, as this is in no way a WebTransport-specific issue - a typical transport state is transient.
QUIC (the HTTP/3 / WebTransport underlying protocol) has keep-alive for idle connections as well. See https://datatracker.ietf.org/doc/html/rfc9308#name-session-resumption-versus-k
The Web API for WebTransport may just expose the configuration parameters for idle connection management - but overall this is still worse than the solution above, albeit less of a hack than playing audio. Note that this, however, would not solve the issue - well, at least maybe not entirely. |
There are a number of scenarios here that could happen. I have not investigated it in practice, but it is true that the browser causes the connection to disconnect - potentially by not enabling the keep-alive settings. But this is unclear, and it might be that the server actually sends the GOAWAY frame. |
Thinking about this - if you can extract the whole networked game state maintenance loop into the Web Worker together with the WebTransport - sure, that would work (well, except WebWorkers are deactivated too at certain times, so maybe a ServiceWorker instead, but this can be determined later down the line). That way you can ensure the data the server communicates is not lost and processed to the best of the client's ability while the rendering is unavailable. But moving only WebTransport out would cause the same issue I described at #144 (comment) (second part). |
At w3c/webtransport#600 they are saying it's an implementation bug, which is what I very much suspected, hence my attempts to find the tab deactivation code in the Chromium source. From what I recall from reading the WebTransport spec though - it shouldn't be an issue with the tab deactivation. What is most likely the issue is that the client and server can't agree on the idle timeouts - which may or may not be caused by the Chrome side, but given the lack of settings to tweak the idle timeout in the spec, it could be. Unfortunately, there is still a problem of data loss that has to be solved (world state reinit or state diff sync), because the datagrams will be dropped from the recv queue if the app can't keep up with them, and the frozen app definitely can't. |
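The data-loss point can be illustrated with a bounded receive buffer that remembers whether it had to drop anything, so the app knows to ask for a world-state reinit or diff sync once it resumes. This is a sketch under assumptions: `RecvQueue` is a hypothetical app-level buffer, not the browser's internal datagram queue, and dropping the oldest packet first is just one possible policy.

```rust
use std::collections::VecDeque;

/// Hypothetical bounded receive buffer that records overflow.
pub struct RecvQueue {
    capacity: usize,
    packets: VecDeque<Vec<u8>>,
    dropped: u64,
}

impl RecvQueue {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self { capacity, packets: VecDeque::with_capacity(capacity), dropped: 0 }
    }

    /// Called when a datagram arrives. If the consumer has fallen behind
    /// (for example, because the tab is frozen), the oldest packet is dropped.
    pub fn push(&mut self, packet: Vec<u8>) {
        if self.packets.len() == self.capacity {
            self.packets.pop_front();
            self.dropped += 1;
        }
        self.packets.push_back(packet);
    }

    /// Called by the app when it runs again. Returns the buffered packets and
    /// whether anything was lost since the previous drain, in which case the
    /// app should request a full state resync instead of trusting the deltas.
    pub fn drain(&mut self) -> (Vec<Vec<u8>>, bool) {
        let needs_resync = self.dropped > 0;
        self.dropped = 0;
        (self.packets.drain(..).collect(), needs_resync)
    }
}
```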
Ok so, we need confirmation from an issue filed with Chromium that this is a bug. Otherwise we aren't sure if it's a lightyear/xwt bug. There will be people, myself included, who wouldn't experiment with or adopt lightyear today if this is a design choice of WebTransport that won't be fixed. Secondly, I propose we document the workaround: disable RAM-saving mode, with an issue to track the Chromium bug. Lastly, does anyone want to add a reconnection example? I think it would be helpful in any case. |
Would actually be really easy to confirm. If this behavior happens with WebSocket Transport too, then we have our culprit i think :P |
I haven't actually tried anything more than the examples for lightyear. I'm waiting on #253 to really dive into using WT. Hopefully someone can confirm who has experience with Lightyear. |
I now have examples for xwt itself - so another way would be to run those and check if they also demonstrate the same behaviour. |
@MOZGIII it works |
@cBournhonesque so RAF is indeed the culprit... |
What is RAF? |
@simbleau |
Do we have a hypothetical solution or just have identified the problem? |
I tried using I don't really get the RAF part, but it might be because bevy still stops running when we switch tabs, which means that we stop sending/receiving keepalive packets because the netcode logic runs inside bevy. Potential solutions:
|
... So this is a software timeout? I feel like we've asked that before and the answer was less clear than it is now. That's exactly why we filed the issue under W3C/WebTransport, since it was believed the behavior was from the browser's WebTransport runtime. This feels really silly now. Could we just disable the timeout in the bevy system? At the very least it seems reasonable for it to be configurable. |
It is already configurable: lightyear/lightyear/src/client/config.rs Line 25 in 30fe00a
and lightyear/lightyear/src/server/config.rs Line 18 in 30fe00a
It's just that having a very high value (20+ seconds) doesn't seem ideal. If a client disconnects suddenly (closes the tab), you would have to wait 20 seconds before the server is aware of the disconnection. I also created an issue on bevy to potentially make the scheduler keep running bevy systems even if the tab is in the background: bevyengine/bevy#13368 |
Another possibility is bevy might be doing something to actively put itself (its wasm instance) on hold on tab switches. |
Can anyone make a simple / minimal guide on how to reproduce this issue? |
I'm trying to make a simple example (without networking): bevyengine/bevy#13370 |
@MOZGIII rAF just holds indefinitely while the tab is inactive. That's known behavior. So that's determined to be the issue. |
Well, yes, for RAF that's expected. But why does it still break when the code is run using Was there a miscommunication or confusion here of some sort? |
Ah, I read the issue. I am not sure you'd want that - to run bevy systems in the background... Might be better to extract the systems that need to run while in background into their own threads (or Promises, but not bevy tasks). That's what my architectural approach to this would be, at least. Anyhow, if you need to run bevy systems specifically it could be solved by using/compositing multiple schedulers - in a way that you run some systems on RAF and some with fixed intervals. That would also make bevy tasks function. This could be something that's offered by bevy out of the box - but I'd recommend first experimenting with this locally, as whatever |
@MOZGIII i generally agree with this sentiment. |
Sorry I'm a bit slow... is this a good summary? Potential solutions:
B) set the netcode timeout to a very long time as a quick way to get unblocked. The io tasks shouldn't timeout anymore since they still run in the background when spawned via
The issue is that the bevy systems will still be throttled on the client so:
C)
Same issues as in B). D) Handle disconnection/reconnection in your game.
It's already possible to disconnect/reconnect; so I guess this would be the best solution? E) have some other way to force bevy systems to still run in an unthrottled manner. |
I am thinking currently that having a separate, non-bevy world and ECS for game logic that is network-replicated is a good idea. It is definitely an option to add to the list above, because that thing can in theory run in a WebWorker and handle not only the packet buffering, but full processing of them. The issue with this is that WebWorker to window communications can be prohibitively slow in terms of latency - in the 10s of milliseconds just to send a message. This is not great for any game - might be ok for some, but even there users could easily notice that the game is not very responsive. For other games that would be a hard blocker, I mean way worse than freezes on tab switches. So, for this crate, I'd suggest either building a portable core that can be used in any way - depending on the app needs, or supporting either one of in-WebWorker or in-window ways of running the networking, or explicitly both. |
Actually, there is a Bevy API for window focus. It's unclear if We could try something with this API, but maybe here's a better idea: I'd assume the best solution is to disconnect if and only if the last message was over ~20 seconds ago while we have been actively polling the message queue within that time. Otherwise, the timer would be much longer (5 minutes). This would mean we'd need two timers on both the client and server: one for active listening (true timeout) and one for inactive listening (user tabbed away). We could make this more robust by potentially sending a message from the client when they tab away. There's nothing stopping us from adding a listener like this:

    use wasm_bindgen::prelude::*;
    use wasm_bindgen::JsCast;
    use web_sys::{window, Event, VisibilityState};

    #[wasm_bindgen(start)]
    pub fn start() -> Result<(), JsValue> {
        // Get the document object
        let document = window().unwrap().document().unwrap();
        // Clone the handle for the closure; the original is still needed below
        let doc = document.clone();
        // Define the callback function
        let callback = Closure::wrap(Box::new(move |_event: Event| {
            // visibility_state() returns an enum, not a string
            if doc.visibility_state() == VisibilityState::Hidden {
                // The tab has become unfocused
                web_sys::console::log_1(&"Tab is unfocused".into());
            } else {
                // The tab has become focused
                web_sys::console::log_1(&"Tab is focused".into());
            }
        }) as Box<dyn FnMut(_)>);
        // Add the event listener for the 'visibilitychange' event
        document.add_event_listener_with_callback("visibilitychange", callback.as_ref().unchecked_ref())?;
        // Keep the callback alive for the lifetime of the page (intentionally leaked)
        callback.forget();
        Ok(())
    }

|
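The two-timer idea above could look roughly like the following on the client side. This is only a sketch of one reading of the proposal: `PeerLiveness`, `POLL_GAP`, and the `active_since` bookkeeping are assumptions introduced here, and the 20 second / 5 minute values are the ones floated in the comment. Silence only counts against the short timeout while the app has been polling without interruption.

```rust
use std::time::Duration;

/// Short timeout: applies while the app is actively polling.
pub const ACTIVE_TIMEOUT: Duration = Duration::from_secs(20);
/// Long timeout: applies regardless of polling (user tabbed away).
pub const BACKGROUND_TIMEOUT: Duration = Duration::from_secs(300);
/// Assumed threshold: a larger gap between polls means the app was frozen.
pub const POLL_GAP: Duration = Duration::from_secs(2);

/// Hypothetical liveness tracker. All timestamps are monotonic durations
/// since app start, supplied by the caller.
pub struct PeerLiveness {
    last_message: Duration,
    last_poll: Duration,
    active_since: Duration,
}

impl PeerLiveness {
    pub fn new(now: Duration) -> Self {
        Self { last_message: now, last_poll: now, active_since: now }
    }

    /// Call every time the message queue is polled.
    pub fn on_poll(&mut self, now: Duration) {
        if now.saturating_sub(self.last_poll) > POLL_GAP {
            // Polling was interrupted (tab throttled); restart the active window.
            self.active_since = now;
        }
        self.last_poll = now;
    }

    /// Call for every message received from the peer.
    pub fn on_message(&mut self, now: Duration) {
        self.last_message = now;
    }

    /// Call after polling and draining the queue for this tick.
    pub fn should_disconnect(&self, now: Duration) -> bool {
        let silent_for = now.saturating_sub(self.last_message);
        // Silence observed while we were actually listening.
        let actively_silent = now.saturating_sub(self.last_message.max(self.active_since));
        actively_silent > ACTIVE_TIMEOUT || silent_for > BACKGROUND_TIMEOUT
    }
}
```

The server cannot observe the client's polling, so on that side the switch between the two timeouts would have to be driven by something else, such as the "tabbed away" message mentioned above.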
Sorry, going to summarize my incoherent thoughts above in a much more digestible manner here: Steps:
Web workers won't do, simply put. It's a tremendous overhead. It has merit in some applications, but should be a very last resort, and is likely a complex task. Likewise, I don't see any bevy changes happening anytime soon, especially one with the complexity of running systems in web workers. |
Small correction on the nature of WebTransport:
Also, I don't think you can change the client (browser) timeout, can you? I don't think it makes sense to add another timer - just allow reconnecting asap, with the new session evicting the old session. A simple algorithm sketch: the client generates a random number every time it starts, and presents that number to the server upon connecting and reconnecting. The server, if it sees upon connection that the same number is already used by another presently connected session, would just assume that session is dead and would drop soon due to timeout anyway, force-kill it, and then transfer the server-side resources of that old session to the newly connecting session, thus effectively using the new connection as a signal to force-disconnect the old session, instead of relying on timeouts. The keep-alive interval and idle timeout on the server side can then just be set to reasonable, high values for usual operation without the need to switch them at runtime. Not sure if those are even tweakable on the client side, pretty sure there's no way: https://w3c.github.io/webtransport/#dictdef-webtransportoptions |
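The eviction algorithm described above can be sketched as a server-side registry keyed by the client-generated number. Everything here is illustrative: `SessionRegistry`, `ClientToken`, and `ConnectionId` are hypothetical names, not lightyear or xwt types, and a bare random number is assumed to be an acceptable identifier, which a real game would likely want to back with authentication.

```rust
use std::collections::HashMap;

/// Random number generated by the client at startup (assumption: u64).
pub type ClientToken = u64;
/// Opaque handle for a transport connection (assumption: u32).
pub type ConnectionId = u32;

/// Server-side resources tied to a client identity rather than a connection.
pub struct Session<S> {
    pub connection: ConnectionId,
    pub state: S,
}

pub struct SessionRegistry<S> {
    sessions: HashMap<ClientToken, Session<S>>,
}

impl<S: Default> SessionRegistry<S> {
    pub fn new() -> Self {
        Self { sessions: HashMap::new() }
    }

    /// Registers a connection for `token`. If the token is already bound to a
    /// live connection, that connection is evicted: its id is returned so the
    /// caller can force-close it, and the existing state is kept and handed
    /// over to the new connection.
    pub fn connect(&mut self, token: ClientToken, connection: ConnectionId) -> Option<ConnectionId> {
        if let Some(session) = self.sessions.get_mut(&token) {
            let old = std::mem::replace(&mut session.connection, connection);
            return Some(old);
        }
        self.sessions.insert(token, Session { connection, state: S::default() });
        None
    }

    /// Connection currently bound to `token`, if any.
    pub fn connection_of(&self, token: ClientToken) -> Option<ConnectionId> {
        self.sessions.get(&token).map(|s| s.connection)
    }

    /// Mutable access to the per-client state.
    pub fn state_mut(&mut self, token: ClientToken) -> Option<&mut S> {
        self.sessions.get_mut(&token).map(|s| &mut s.state)
    }
}
```

With this shape the reconnect itself acts as the disconnect signal for the stale session, so the server's idle timeout can stay at a single, generous value.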
Regarding this: bevy is actually very modular, and so it would be relatively easy to implement a custom event loop for web if need be. The easiest way is to not even build it from scratch, and just vendor and patch the winit one a bit to test things out, if that proves to be the issue. From the source, it looks like |
I think this PR broke the examples for some reason.
UPDATE:
On client we get:
POSSIBLE SOLUTIONS: