Periodic flashing in Safari (only) #18677
Comments
If no one has an idea, I'd recommend checking if there is a difference in an older version. If so then bisecting could give us an answer. |
Sadly bisecting is not practical. As I explained in an earlier thread, the code does not run at all in a large range of Emscripten versions (something like 2.0.13 to 3.1.20 inclusive) because support for multithreading in SDL2 was broken in those versions. It's possible that the problem arises not from Emscripten but from SDL2, which has itself undergone many changes in the same period (I've always used the version you get from As I said, asking here was a long shot. Have any Safari-specific issues ever been identified? |
(I can't think of any rendering-specific ones, sorry. But maybe someone else will remember something.) |
As it turns out that was a good suggestion, because the problem has seemingly been introduced quite recently and a bisection has allowed me to find when: Emscripten 3.1.27 does not exhibit the flashing in Safari whereas 3.1.28 does. Also, fortunately, the bundled versions of SDL2, SDL2_ttf and SDL2_net did not change between these releases, so that presumably rules out that they are contributing to the issue. So something changed between 3.1.27 and 3.1.28 which has resulted in the flashing in Safari (but not in any other browser I have tried, including other Mac browsers like Vivaldi). If you want to try it yourself here are versions of the Newton's Cradle simulation built with 3.1.27 and 3.1.28. I am using Safari on an 'Apple Silicon' Mac Mini which shows the difference clearly, |
Is this regression going to receive some attention? Can I provide any other information to help? |
I think the best way to make progress here is to bisect into that range, to find the specific commit and not just the release. Once we have that commit, the problem might be obvious. |
I didn't realise that was possible. Can I type I hoped that as the fault is so specific (black frames being displayed at a regular rate, which I would estimate as a little faster than 1 Hz) it would be easy to guess which change was responsible. What happens inside Emscripten (i.e. the JavaScript or WebAssembly) at that rate anyway? If you have a Mac, have you been able to repro the fault using the two builds I linked to? |
Yes, you can install builds to bisect. It does require large downloads in each step, but the number of steps is logarithmic so it's not that bad in practice. Here is the link again: https://emscripten.org/docs/contributing/developers_guide.html#bisecting I really can't think of anything in Emscripten that could cause such a problem, so I don't really have any guess at this point... But maybe something in rendering code, or event loop timing..? (I do not have a mac, sorry.) |
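To illustrate why the number of steps is logarithmic, here is a minimal sketch in C of the bisection idea. It is not part of emsdk or git: the names `bisect_first_bad`, `is_bad_fn` and `bad_from_threshold` are hypothetical, and in practice the "is this build bad?" check is the manual step of installing a build, rebuilding and testing in Safari. It assumes builds are ordered oldest to newest and that every build after the first bad one is also bad.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical predicate type: returns nonzero if the build at `index` shows
   the bug. For a real bisection this is a manual test, not a function call. */
typedef int (*is_bad_fn)(size_t index, void *ctx);

/* Stand-in predicate for illustration: every build at or after the index
   stored in `ctx` is treated as bad. */
int bad_from_threshold(size_t index, void *ctx)
{
    return index >= *(const size_t *)ctx;
}

/* Return the index of the first bad build, given that build `good` is known
   to be good and build `bad` is known to be bad. If `steps` is non-NULL it
   receives the number of builds that had to be tested. */
size_t bisect_first_bad(size_t good, size_t bad, is_bad_fn is_bad,
                        void *ctx, size_t *steps)
{
    size_t tested = 0;
    assert(good < bad);
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2; /* halve the remaining range */
        if (is_bad(mid, ctx))
            bad = mid;   /* regression is at mid or earlier */
        else
            good = mid;  /* regression is after mid */
        tested++;
    }
    if (steps)
        *steps = tested;
    return bad;
}
```

With 16 builds between a known-good and a known-bad one, this tests only 4 of them, since each test halves the range that can still contain the regression.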
Thanks, but the instructions there are not very clear. For example "If instead you only know Emscripten version numbers, use emscripten-releases-tags.json to find the hashes" , that means nothing to me at all! What is
Please remember that I am compiling code which runs faultlessly in Windows, MacOS, Linux (both 32-bit and 64-bit), Android and iOS. It has been run on a wide range of platforms, from very slow (e.g. a Raspberry Pi 3) to very fast. On none of those platforms has it ever exhibited anything similar to the 'flashing' I see in Safari with Emscripten 3.1.28 and later. |
Here is that file https://github.com/emscripten-core/emsdk/blob/main/emscripten-releases-tags.json In it, if you look for "3.1.30": "dc1fdcfd3f5b9d29cb1ebdf15e6e845bef9b0cc1", That means that the build hash of release
Understood, yeah, the web platform can be weird in some ways. The rendering loop ties into the JS event loop for example. Could be something there perhaps. |
OK, I will try a finer-grained bisection, but it may take a little while because it's quite a slow process to install/activate the new commit, do a clean build, upload to a web server and then test, even though it is only log2 of the total! I have 'downgraded' to 3.1.27 for my release version so that my end-users are not inconvenienced by the bug. I know I could look at the release notes to see what has changed in 3.1.31, but is there anything specific I should worry about in doing that? |
I can't think of a specific reason to worry about downgrading, unless you are using the experimental pthreads + dynamic linking support, which has seen some critical bugfixes recently. |
Sorry, I just can't figure it out. I don't have (or understand) git, so I can't use |
The releases repo has the commit hashes for every build, including releases, https://chromium.googlesource.com/emscripten-releases For example, from before we mentioned
Then that commit |
Since you don't have git here are relevant commits: 3.1.27 is 48ce0b44015d0182fc8c27aa9fbc0a4474b55982 (https://github.com/emscripten-core/emsdk/blob/main/emscripten-releases-tags.json#L2)
|
You should be able to do |
Thank you so much for the list, it will make it much easier. |
The 'Periodic flashing in Safari' regression appears to have been introduced between these two commits:
I hope that may allow somebody to pin down the cause. If you have a suitable Mac to test it on (I am using an 'Apple Silicon' Mac Mini) here are direct links to demonstrate the issue:
|
So the emscripten-releases roll that caused the issue was https://chromium-review.googlesource.com/c/emscripten-releases/+/4075232 and the good news is that it corresponds to just a single emscripten rev, #18267. I guess it makes sense, because that change relates to how time is reported across different threads. |
If I'm understanding the comments there, it does seem that this change was responsible for some other odd, and seemingly unrelated, issues, including something Safari-specific. What's the next step? I can relatively easily test any proposed fix so long as I can use |
It would be good to verify that reverting the change makes the problem go away for you @rtrussell, that is, if you are on the latest emscripten and manually revert that change, things work properly. It's not a big diff so hopefully that can be done. Separately, another thing that would be good to do is to hear from @RReverser, who landed that change and might have an idea of what could be going wrong. |
Uh, sorry, I don't :/ I mean, if anything, that implementation only affected the origin for the time shift, not the accuracy of timers themselves, and, if anything, should've made the origin more accurate than before. I don't see how it could introduce flashing or anything like that, but then, knowing Safari's track record of weird bugs, I guess I shouldn't be very surprised if that's the case... Would be good to double-check that it's indeed the culprit though. |
I guess in the worst case, if Safari didn't implement it correctly, this could cause the different threads to have different notions of the current time? @rtrussell, does that make any sense to you? Do you think that this kind of time skew could cause this kind of issue? |
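To make the time-skew question concrete, here is a small model in C. It is an illustrative sketch only, not Emscripten's implementation: it assumes each thread derives an absolute timestamp by adding a per-thread time origin to its own monotonic clock reading, and the names `thread_clock`, `absolute_ms`, `elapsed_ms` and `skew_ms` are hypothetical. It shows that a wrong origin on one thread shifts that thread's absolute timestamps but leaves durations measured on that thread unchanged.

```c
#include <assert.h>

/* Illustrative model: one thread's view of time. */
typedef struct {
    double time_origin_ms; /* absolute time at which this thread's clock read 0 */
} thread_clock;

/* Absolute time as seen by one thread for a monotonic reading `now_ms`. */
double absolute_ms(const thread_clock *clk, double now_ms)
{
    assert(clk);
    return clk->time_origin_ms + now_ms;
}

/* Elapsed time between two readings on the same thread: the origin cancels
   out, so an incorrect origin does not affect measured durations. */
double elapsed_ms(const thread_clock *clk, double start_ms, double end_ms)
{
    return absolute_ms(clk, end_ms) - absolute_ms(clk, start_ms);
}

/* Disagreement between two threads about the absolute time, where `a_now_ms`
   and `b_now_ms` are their monotonic readings at the same real instant. */
double skew_ms(const thread_clock *a, double a_now_ms,
               const thread_clock *b, double b_now_ms)
{
    return absolute_ms(a, a_now_ms) - absolute_ms(b, b_now_ms);
}
```

In this model, code that only compares timestamps taken on one thread is immune to a bad origin, whereas code that compares an absolute timestamp from one thread with one from another would see the skew directly.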
@rtrussell can you explain a little more about what your application is doing? Are you doing rendering on a background thread? Are you using something like |
There's a limit to what I can tell you because of the role that SDL2 is playing, the internals of which I know virtually nothing about. But salient points are:
|
Should this regression be fixed in the latest release (3.1.32)? I'd like to take advantage of the update to SDL_ttf (and the forthcoming updates to freetype and harfbuzz) but of course I can't whilst it doesn't run properly in Safari. |
No, we still don't understand what is causing this issue. We could potentially revert #18267, but it would be nice to understand what is causing the issue and come up with a real fix and a test to prevent future regressions. |
Since #18267 is just a JS change, it should be possible to download it via https://patch-diff.githubusercontent.com/raw/emscripten-core/emscripten/pull/18267.patch and apply a reverse patch ( |
Let me know if there is anything further I can do, bearing in mind that I would need extremely detailed instructions because I'm not an experienced programmer (I'm a retired hardware engineer in my 70s). The two versions that I ended up with after the bisection are still available if you have a suitable Mac for testing. I don't even know for sure that the issue affects all Macs, there might be something special about my 'Apple Silicon' Mac Mini which makes it particularly susceptible: 25be4826744e15b5558c42d6ce3123eaa1b62aeb |
Bearing in mind that I have no idea what I'm doing, I attempted that (the JavaScript file isn't very easily editable, because there are no line breaks, but I tried it anyway). The result was an 'Exception thrown, see JavaScript console' error, and in the console it says "TypeError: undefined is not an object (evaluating 'performance.timing.navigationStart')". What should I do now? |
Thanks for trying. Interesting, that suggests that won't work either. The error indicates Hopefully someone else reading here has some understanding of the situation on Safari and which API we should be using there. |
Testing on Chrome on Linux, I see @rtrussell To test
#include <emscripten.h>
int main() {
  EM_ASM({
    console.log("importScripts:", typeof importScripts);
    console.log("performance:", typeof performance);
    console.log("performance.timeOrigin: ", typeof performance.timeOrigin, performance.timeOrigin, performance.timeOrigin, performance.timeOrigin);
    setTimeout(() => {
      console.log("delayed performance.timeOrigin:", typeof performance.timeOrigin, performance.timeOrigin, performance.timeOrigin, performance.timeOrigin);
    }, 1000);
  });
}
Compile that with
|
This is what I see:
There is no Edit: I'm not running it with |
Thanks for trying. Very strange the delayed one didn't show up. It should appear after a second. But the other info suggests Then my other theory was storing times somehow, and caring about their absolute values. But it's not likely to be some general usage of them like |
It's related to not being run using
If by "my project" you mean something in SDL, possibly, but my code is not to blame. Although I use some time-related functions none of them can affect rendering, and anyway the same code runs perfectly on all the other SDL2 platforms (Windows, MacOS, Linux, Android, iOS) and on all other browsers I have tried. There's no way that black frames can legitimately be rendered, which is what appears to be causing the 'flashing' in Safari. My rendering code is the usual pattern, in skeletal form:
So where do we go from here? There is a demonstrable regression which means I cannot build my project using recent versions of Emscripten, and whilst at the moment I can work around the issue by using an old version, that won't be acceptable once the updates to SDL2_ttf, FreeType and Harfbuzz get included in Emscripten. To double-check that the change which you suspect is causing the problem really is, can you build a version of Emscripten that is identical to the latest release except for that change being reverted, and somehow make it available in a form I can use? |
There isn't a simple way for me to build a release of emscripten and provide it for you, I'm afraid. Though perhaps someone else reading this may know more, as I know the emsdk has some functionality for using custom emscripten etc. from git (but I don't think it can package it up). Overall this is definitely an odd problem. It seems to be specific to your project, but there's no obvious explanation now that we've ruled out the obvious possibilities. (Though I am still confused why you don't see the delayed logging - using One out-there idea might be to try to see if you can reproduce your problem on WebKit outside of Mac. There might be nightly WebKit builds, for example. If you can find one and the problem shows up there, I could debug with that perhaps. Otherwise, our best options at this point might be to wait and hope for someone with a Mac and time to debug this to chime in here. We really need someone to actually understand the problem, as even if we did confirm that a revert helps you, that wouldn't indicate what the fix is - that PR fixed another problem, so we'd need a new fix for both at the same time. |
In case it helps, here are the instructions for using emsdk with a random git version of emscripten: https://github.com/emscripten-core/emsdk#how-do-i-use-my-own-emscripten-github-fork-with-the-sdk |
How would you go about debugging it if you had a Mac? If reverting the change thought to be causing the problem does indeed fix it, what would be your next step? If there is no way the change can be permanently reverted, or achieved a different way, I'm not sure what options remain. Re-reading the related PR the change seemed to cause all sorts of inexplicable side-effects in Safari and Firefox that were put down to 'flaky tests' but suggest to me that the change has a more far-reaching impact than it theoretically should have. |
If you're referring to my comments in the end and specifically #18307, then no, those were flaky tests we had on main branch as well at the time and not something introduced by the PR. As @kripken said, if it's indeed that change that causes flashing in your case, most likely it's something in your code that depends on timing being absolute, otherwise we'd see more reports. |
Sorry to keep repeating myself, but if by "my code" you include SDL2 then I suppose it's possible, but that isn't something that is under my control and I doubt that the SDL developers would be motivated to put much effort into debugging it their end, especially as it's difficult to reproduce. I'm certain it isn't something in the code which I've written myself, because (a) it's generic code which runs perfectly well on every other SDL2 platform and (b) the rendering isn't affected by absolute timing at all ( You seem to be saying that it's not going to be fixed in Emscripten, even though it has clearly been caused by a change in Emscripten. So since there's nothing I can do to fix it myself, that leaves me with no option but to tell my users that my app will not be compatible with Safari going forward. I don't consider that at all satisfactory. |
If only you knew how common it is these days, as Safari tends to lag behind both in terms of features and bugfixes, especially since their release cycle is still tied to macOS releases :( For now that seems to be one of only two safe options, the other being to just continue building with an older Emscripten if everything works fine with it, until someone with macOS can debug this issue further. By the way, is your code public anywhere? It might be helpful for us or someone who will debug this issue to be able to look at the source code and build flags used rather than just at the final binary. |
It doesn't. Every other platform on which my app runs (Windows, MacOS, Linux, Android, iOS) supports rendering complex scripts (Arabic, Thai etc.) using the features built into recent releases of SDL2_ttf (which incorporates FreeType and HarfBuzz). Emscripten is lagging behind all those platforms because the version of SDL2_ttf bundled with it is very out-of-date. I understand that some work is underway to rectify this situation, and to bring Emscripten in line with the other platforms in respect of the version of SDL2_ttf. If and when that happens I will want to use that later version, which of course I am prevented from doing whilst this issue affects Safari. One possible workaround would be for me to use the EMCC_LOCAL_PORTS feature to allow me to incorporate a more recent version of SDL2_ttf in what is otherwise the earlier Emscripten build. But it doesn't work, and when I enquired about it I was told that the feature was not intended for production.
Certainly, it's all here. My app is a popular (and famous, in some parts of the world!) BASIC interpreter, and the BASIC program which I have been using to illustrate the issue is cradle.bbc. |
For what it's worth, you technically don't have to use SDL via That is, nothing stops you from downloading latest version of SDL and SDL_ttf in your own Makefile, building them and linking with your app irrespective of the Emscripten version. |
@rtrussell I'm sorry you are having a frustrating time with this issue. I'm sure we can eventually track down and fix this issue but it might take more time and effort. I tend to agree that simply reverting this change without getting to the bottom of the problem is not the most satisfactory solution here. While I understand where you are coming from with claims like "I'm certain it isn't something in the code which I've written myself", my years of experience with porting C/C++ have taught me that there can always be latent portability issues in any codebase, no matter how portable and no matter how bug free on other platforms. It's not a criticism of you or your codebase to suggest this, we just want to get to the bottom of the issue. |
As for next steps that you could take, I can think of 3:
|
I had read that was true of SDL2 itself, but I wasn't sure it was also true of SDL2_ttf, not least because of its dependence on FreeType and HarfBuzz. When I use -s USE_SDL_ttf=2 I notice two things that are probably significant: firstly the port it downloads is not SDL2_ttf but SDL2_ttf-mt which I assume is a special multi-threaded version. If I compile SDL2_ttf from source, will I get this multi-threaded version? Secondly, it pulls in the FreeType and HarfBuzz ports too (and quite likely multi-threaded versions of those). Will that also happen automatically if I simply include the source tree of SDL2_ttf in my build? |
You can find code for everything it's doing in here: https://github.com/emscripten-core/emscripten/blob/667505546e57040aaf00a56c07c0b924158307e9/tools/ports/sdl2_ttf.py Basically it pulls in regular SDL2_ttf from the official repo and builds it with So if you only want to update SDL2_ttf, you can do that by manually building the latest archive with the same flags. If you want to update Freetype & Harfbuzz as well instead of using ones shipped with Emscripten, seems like that would be more complicated as for those the builder seems to have a lot more custom logic (see |
You are probably right that there is some change I could make to my code which would work around the problem. My frustration stems from the near impossibility that I would ever be able to find it! I have mentioned before that I am suffering from 'cognitive decline' which has been tentatively diagnosed as Alzheimer's Disease. Much of the code in my project was written ten or more years ago, and the extent to which I still understand it is very limited. Even if I did, I simply wouldn't know where to start with making experimental changes. Most of the code which interfaces with SDL2 is boilerplate that I have copied from elsewhere, and that has been complicated by the need for the browser to control the 'main loop' rather than being self-contained, as it is in other builds. One specific question which you perhaps may be able to help with is how two different aspects of the way SDL2 interfaces to the display are synchronized (if indeed they are). The first is the What is the 'official' way that WebGL rendering should be done in Emscripten given that these are both synchronized to display refresh? My call to This just 'seems to work' in most browsers but if the change to Emscripten has in some way disturbed the relative timing of the |
I am guessing, but don't know for sure, that each version of SDL2_ttf is paired with specific versions of FreeType and HarfBuzz, and attempting to mix them could well result in problems. I suppose the only way to find out would be to try it. |
Just a heads up, looks like the 2nd link here doesn't work anymore. |
Yes, sorry, I replaced it with the code kripken wanted me to run on Safari. Would you like me to put it back? |
I think it would be useful to have both versions alive for any future debugging, yes. |
It's back now, flashing away in Safari! |
@sbc100 @kripken I have encountered the same issue in a fairly large game that I'm working to port - but only when enabling PROXY_TO_PTHREAD=1 and OFFSCREENCANVAS_SUPPORT=1 (and only in Safari). When not enabling those features, there are no such issues in Safari (latest version, 17) using Emscripten 3.1.51. Did you happen to have any ideas as to why this might occur in Safari only with PROXY_TO_PTHREAD and OFFSCREENCANVAS_SUPPORT enabled? I'm happy to do a little more debugging if possible. |
@past-due is this a regression? (i.e. did this work for you in the past, using some older emscripten version?) |
Interestingly, I'm using neither of those flags yet encounter the same symptom. My flags are:
|
@past-due in order to confirm whether this is the same issue as this one, can you try building with 3.1.27? It would be interesting to know if building with that version fixes your issue. |
Well, I have some curious initial testing results. (Looks like some symptoms may be an overlap, but not clear if it's the same underlying cause.) To be clear:
* Safari TP is Release 187 (Safari 17.4, WebKit 19619.0.1.2)
So in regards to PROXY_TO_PTHREAD && OFFSCREENCANVAS_SUPPORT:
I've only tested the above on macOS, so I don't yet know if Firefox will reproduce that same added issue post-3.1.27 when running on other platforms. @sbc100 If you'd like me to open another Issue focusing on PROXY_TO_PTHREAD && OFFSCREENCANVAS_SUPPORT I am happy to do so (this just initially seemed like it might be related to what was described here). |
Your issue seems to be a separate one that I suspect is linked to the use of offscreen canvas (which IIUC varies in support between browsers), and not a recent regression in emscripten. I think it's probably worth opening a new issue. |
This SDL2-based Newton's Cradle program was compiled with the latest version of Emscripten (3.1.31). When running in Safari, only, it suffers from 'flashing' as if the display is periodically being blanked for one field period - at least it does on my 'Apple Silicon' Mac Mini.
I don't see this effect in Chrome-based browsers or Firefox (whether desktop or Android); it's a multithreaded app so won't run at all in iOS. Even on slow platforms which don't manage to achieve the full frame rate there is no evidence of this effect.
The obvious conclusion to draw is that it is a defect in Safari, but I'm reasonably sure it didn't happen when compiled with a much earlier version of Emscripten. It's a long shot I know, but does anybody have a clue what the cause might be?