Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Periodic flashing in Safari (only) #18677

Open
rtrussell opened this issue Feb 7, 2023 · 65 comments
Open

Periodic flashing in Safari (only) #18677

rtrussell opened this issue Feb 7, 2023 · 65 comments

Comments

@rtrussell
Copy link

This SDL2-based Newton's Cradle program was compiled with the latest version of Emscripten (3.1.31). When running in Safari, only, it suffers from 'flashing' as if the display is periodically being blanked for one field period - at least it does on my 'Apple Silicon' Mac Mini.

I don't see this effect in Chrome-based browsers or Firefox (whether desktop or Android); it's a multithreaded app so won't run at all in iOS. Even on slow platforms which don't manage to achieve the full frame rate there is no evidence of this effect.

The obvious conclusion to draw is that it is a defect in Safari, but I'm reasonably sure it didn't happen when compiled with a much earlier version of Emscripten. It's a long shot I know, but does anybody have a clue what the cause might be?

@kripken
Copy link
Member

kripken commented Feb 7, 2023

If no one has an idea, I'd recommend checking if there is a difference in an older version. If so then bisecting could give us an answer.

@rtrussell
Copy link
Author

If no one has an idea, I'd recommend checking if there is a difference in an older version. If so then bisecting could give us an answer.

Sadly bisecting is not practical. As I explained in an earlier thread, the code does not run at all in a large range of Emscripten versions (something like 2.0.13 to 3.1.20 inclusive) because support for multithreading in SDL2 was broken in those versions.

It's possible that the problem arises not from Emscripten but from SDL2, which has itself undergone many changes in the same period (I've always used the version you get from -s USE_SDL=2).

As I said, asking here was a long shot. Have any Safari-specific issues ever been identified?

@kripken
Copy link
Member

kripken commented Feb 7, 2023

(I can't think of any rendering-specific ones, sorry. But maybe someone else will remember something.)

@rtrussell
Copy link
Author

bisecting could give us an answer.

As it turns out that was a good suggestion, because the problem has seemingly been introduced quite recently and a bisection has allowed me to find when: Emscripten 3.1.27 does not exhibit the flashing in Safari whereas 3.1.28 does.

Also, fortunately, the bundled versions of SDL2, SDL2_ttf and SDL2_net did not change between these releases, so that presumably rules out that they are contributing to the issue.

So something changed between 3.1.27 and 3.1.28 which has resulted in the flashing in Safari (but not in any other browser I have tried, including other Mac browsers like Vivaldi).

If you want to try it yourself here are versions of the Newton's Cradle simulation built with 3.1.27 and 3.1.28. I am using Safari on an 'Apple Silicon' Mac Mini which shows the difference clearly,

@rtrussell
Copy link
Author

So something changed between 3.1.27 and 3.1.28 which has resulted in the flashing in Safari (but not in any other browser I have tried, including other Mac browsers like Vivaldi).

Is this regression going to receive some attention? Can I provide any other information to help?

@kripken
Copy link
Member

kripken commented Feb 9, 2023

I think the best way to make progress here is to bisect into that range, to find the specific commit and not just the release. Once we have that commit, the problem might be obvious.

@rtrussell
Copy link
Author

rtrussell commented Feb 9, 2023

I think the best way to make progress here is to bisect into that range, to find the specific commit

I didn't realise that was possible. Can I type emsdk install <something> to install a particular commit?

I hoped that as the fault is so specific (black frames being displayed at a regular rate, which I would estimate as a little faster than 1Hz) it would be easy to guess which change was responsible. What happens inside Emscripten (i.e. the Java or Web Assembly) at that rate anyway?

If you have a Mac, have you been able to repro the fault using the two builds I linked to?

@kripken
Copy link
Member

kripken commented Feb 10, 2023

Yes, you can install builds to bisect. It does require large downloads in each step, but the number of steps is logarithmic so it's not that bad in practice. Here is the link again: https://emscripten.org/docs/contributing/developers_guide.html#bisecting

I really can't think of anything in Emscripten that could cause such a problem, so I don't really have any guess at this point... But maybe something in rendering code, or event loop timing..?

(I do not have a mac, sorry.)

@rtrussell
Copy link
Author

Here is the link again: https://emscripten.org/docs/contributing/developers_guide.html#bisecting

Thanks, but the instructions there are not very clear. For example "If instead you only know Emscripten version numbers, use emscripten-releases-tags.json to find the hashes" , that means nothing to me at all! What is emscripten-releases-tags.json and how is it used? I am a retired hardware engineer, and only dabble in programming (in BASIC and a smattering of C).

I really can't think of anything in Emscripten that could cause such a problem, so I don't really have any guess at this point... But maybe something in rendering code, or event loop timing..?

Please remember that I am compiling code which runs faultlessly in Windows, MacOS, Linux (both 32-bit and 64-bit), Android and iOS. It has been run on a wide range of platforms, from very slow (e.g. a Raspberry Pi 3) to very fast. On none of those platforms has it ever exhibited anything similar to the 'flashing' I see in Safari with Emscripten 3.1.28 and later.

@kripken
Copy link
Member

kripken commented Feb 10, 2023

What is emscripten-releases-tags.json and how is it used?

Here is that file emscripten-releases-tags.json:

https://github.com/emscripten-core/emsdk/blob/main/emscripten-releases-tags.json

In it, if you look for 3.1.30 for example you'll see

"3.1.30": "dc1fdcfd3f5b9d29cb1ebdf15e6e845bef9b0cc1",

That means that the build hash of release 3.1.30 is dc1fdcfd3f5b9d29cb1ebdf15e6e845bef9b0cc1. Then you can do ./emsdk install dc1fdcfd3f5b9d29cb1ebdf15e6e845bef9b0cc1 to install it (you can also do ./emsdk install 3.1.30 directly, but using hashes you can install and use even intermediate builds, which is necessary for bisecting between releases).

Please remember that I am compiling code which runs faultlessly in Windows, MacOS, Linux (both 32-bit and 64-bit), Android and iOS. It has been run on a wide range of platforms, from very slow (e.g. a Raspberry Pi 3) to very fast. On none of those platforms has it ever exhibited anything similar to the 'flashing' I see in Safari with Emscripten 3.1.28 and later.

Understood, yeah, the web platform can be weird in some ways. The rendering loop ties into the JS event loop for example. Could be something there perhaps.

@rtrussell
Copy link
Author

That means that the build hash of release 3.1.30 is dc1fdcfd3f5b9d29cb1ebdf15e6e845bef9b0cc1. Then you can do ./emsdk install dc1fdcfd3f5b9d29cb1ebdf15e6e845bef9b0cc1 to install it (you can also do ./emsdk install 3.1.30 directly, but using hashes you can install and use even intermediate builds, which is necessary for bisecting between releases).

OK, I will try a finer-grained bisection, but it may take a little while because it's quite a slow process to install/activate the new commit, do a clean build, upload to a web server and then test, even though it is only log2 of the total!

I have 'downgraded' to 3.1.27 for my release version so that my end-users are not inconvenienced by the bug. I know I could look at the release notes to see what has changed in 3.1.31, but is there anything specific I should worry about in doing that?

@kripken
Copy link
Member

kripken commented Feb 10, 2023

I can't think of a specific reason to worry about downgrading, unless you are using the experimental pthreads + dynamic linking support, which has seen some critical bugfixes recently.

@rtrussell
Copy link
Author

OK, I will try a finer-grained bisection,

Sorry, I just can't figure it out. I don't have (or understand) git, so I can't use git bisect as suggested at the Developer's Guide. I was hoping to find a document which lists all the commit hashes between 3.1.27 and 3.1.28 which I could manually bisect, but I've failed to find such a document. Is there one?

@kripken
Copy link
Member

kripken commented Feb 15, 2023

The releases repo has the commit hashes for every build, including releases,

https://chromium.googlesource.com/emscripten-releases

For example, from before we mentioned

"3.1.30": "dc1fdcfd3f5b9d29cb1ebdf15e6e845bef9b0cc1",

Then that commit dc1fdcfd3f5b9d29cb1ebdf15e6e845bef9b0cc1 will appear in that repo. You can use git log to find the range between two releases there.

@sbc100
Copy link
Collaborator

sbc100 commented Feb 15, 2023

Since you don't have git here are relevant commits:

3.1.27 is 48ce0b44015d0182fc8c27aa9fbc0a4474b55982 (https://github.com/emscripten-core/emsdk/blob/main/emscripten-releases-tags.json#L2)
3.1.28 is 30b9e46ddcea66e91530559379089002d8b692cf (https://github.com/emscripten-core/emsdk/blob/6305e91dee5ac2c38af6e8b77011bdf75b50d32f/emscripten-releases-tags.json#L19)

$ git log --pretty=oneline 48ce0b44015d0182fc8c27aa9fbc0a4474b55982..30b9e46ddcea66e91530559379089002d8b692cf
30b9e46ddcea66e91530559379089002d8b692cf 3.1.28 LTO RC
19871a9ea4914d63749b8d4d170e27a8854cb565 Roll emscripten from 3e1b8387691d to f11d6196dd4e (4 revisions)
f74cfa2a3296861a9980b7219f48ed84ea735e69 Roll binaryen from c79548b93a81 to 2cb5cefb6392 (1 revision)
652a3d50f8577cbce8bfa38d86d510f9d08ded65 Roll llvm-project from 19158eb7f06d to ea4be70cea85 (24 revisions)
cf45103f8016a93ac77860d6aea3b0d6187d8549 Roll llvm-project from 5ac633d2de93 to 19158eb7f06d (41 revisions)
f8fa83c0372c2d3f00b758bc7cba5b71844816d9 Roll llvm-project from e5369823bc06 to 5ac633d2de93 (35 revisions)
6e3441d6ad6e4046fc53051b5b77adc70cdfba9d Roll llvm-project from 83b3304dd2a3 to e5369823bc06 (362 revisions)
bb5d857833a1af71fad624b56e312fb93f83ecc4 Roll emscripten from 677a8d097a64 to 3e1b8387691d (9 revisions)
794313e56861552efea30d2e3ae6bc70148f5c2d Roll binaryen from e2add4b09e39 to c79548b93a81 (8 revisions)
c1ade0e9de2092b4bf3a2ee5038d4cab8b06985f Add `ninja` directory to PATH
9ceedfc724ab421aa74b1021056a555328a24a2a Explicitly set CMAKE_MAKE_PROGRAM to point to locally-installed ninja
e46a6259297002060c9f1d5125d596e5eaedf12a Roll binaryen from 145e8f9ece5a to e2add4b09e39 (1 revision)
87bd01aac0daa900f742ad61ffc5efc0204adee3 Remove unused LLVM and Binaryen tools
a7fe7397a07f681cbf4b8dc3ea0a0dfc354e37ac Roll llvm-project from 2402c46b7155 to 83b3304dd2a3 (30 revisions)
8447970ff029f22dfe33c53f389b94c4df15f385 Roll llvm-project from f7a46aa8fbc2 to 2402c46b7155 (20 revisions)
893383d917fce38c437ecc09e3b1093bdbe902df Roll v8 from a044d7325466 to da86ca8ccd8d (48 revisions)
1d4d6bef5d795f07496662ecddca9ce061f1dfe4 Roll llvm-project from 2c7827da4f5b to f7a46aa8fbc2 (32 revisions)
f5473da091e9640871123a571ad60623c16e5863 Roll emscripten from 7c9b97ade051 to 677a8d097a64 (1 revision)
ae37c6a835efa49f4885960706aedc1850353c08 Roll llvm-project from b90ebbc5bd6d to 2c7827da4f5b (38 revisions)
5b5b2a7035c35d41ada91847111b6ddfdecd94bf Specify the ninja path explicitly
839db40665e5de4aaff55a45c47e163817f7ef08 Increase the stack size of LLVM testsuite builds
de39f8e96f761e5d637ed5399c4f21ffea8e2398 Roll binaryen from 69f6235c36eb to 145e8f9ece5a (1 revision)
4e4ab25a9683a6947b3a858a2272c119518488b2 Roll emscripten from c93b3a109583 to 7c9b97ade051 (1 revision)
a8f234fa6b92cccc5f2cf90f632000efeda01042 Roll llvm-project from 798fa4b415ee to b90ebbc5bd6d (65 revisions)
66db713decb3d7b31b1616f6e724c499a877e21b Roll llvm-project from a73db7f1ad7b to 798fa4b415ee (22 revisions)
8f087c46c3b1a237af050244362f2d0c4cc3708b Roll llvm-project from 5fa43db46ef9 to a73db7f1ad7b (27 revisions)
408ffd5d6301d79b7743953618c546505c0cc3e0 Roll llvm-project from 4f33339fb4d9 to 5fa43db46ef9 (21 revisions)
530a165d9b3e3f241529bd2d01c7761fdd79b2f6 Roll v8 from 8f886969838a to a044d7325466 (2 revisions)
752043f9f8cc7d05e2d2e058088291bab566f955 Roll llvm-project from 7a194cfb327a to 4f33339fb4d9 (13 revisions)
0b248f30d163a90c61a3cb585bee30d37b4541f2 Roll llvm-project from 8be0d8fb83aa to 7a194cfb327a (9 revisions)
210a63b96ee4fad339a44a43acdc00f32c879538 Roll llvm-project from 63b63c3dcd54 to 8be0d8fb83aa (4 revisions)
c9bc396b50a8bce1382a910f7d2a2108b992358c Roll llvm-project from 1d650d4f1dd3 to 63b63c3dcd54 (4 revisions)
db3eefae32bca18d2e35ce4d823c655991cff87c Roll llvm-project from 16a72a0f8748 to 1d650d4f1dd3 (42 revisions)
09329ed8ba2a7c25694c7c804de1974952a65ced Roll v8 from be25ee73979a to 8f886969838a (4 revisions)
5d48fee3da0ffb9197a50c57c5c3193ed5eb75ee Roll llvm-project from 5ceaeed65962 to 16a72a0f8748 (6 revisions)
7b0b034cdab1177871d64f6ea91ef56602b931fe Roll llvm-project from b6a01caa64aa to 5ceaeed65962 (11 revisions)
80118106fc47873358b5731108f8a4d95a54567c Roll llvm-project from fccab9f90b03 to b6a01caa64aa (17 revisions)
7667d62651090a4dd721fc2a1e427c28ef5df3b8 Roll emscripten from ae2f8d8ab7bc to c93b3a109583 (4 revisions)
5230f7cef51d76cbd7fe26f67ea34991a73663ef Roll binaryen from 4c79085f1bc6 to 69f6235c36eb (2 revisions)
e2e035f8896f23266b83e665549d4bee0f179480 Roll llvm-project from 0b2473936d36 to fccab9f90b03 (56 revisions)
3311d61089de55c1fb8682f629cc361c424b4167 Roll emscripten from 7c3be8e4361b to ae2f8d8ab7bc (1 revision)
db6948742ff457f46f501f8a4fe5aced60fb713b Update Binaryen CMake variable name
7fdbda10257edad93d2466f96ef35dc19e282655 Roll emscripten from dd96490f2112 to 7c3be8e4361b (2 revisions)
24c6e06328dd1c1f305cca7ed54868b18d925c33 Roll llvm-project from 683b9fc7bd23 to 0b2473936d36 (24 revisions)
6b108b4ad8f0cd57d03e458df69fe8eb50625bb6 Roll v8 from 90fe7dc9ce48 to be25ee73979a (22 revisions)
65ed76b603418db8a0eb38ee301f5b1a15d039db Roll binaryen from 197b6da7afe6 to e8b41f4a92b8 (2 revisions)
bdcaf3d5d5efa0b5ca6357b30599171996316e8a Roll emscripten from 6b1e844a76c4 to dd96490f2112 (1 revision)
25be4826744e15b5558c42d6ce3123eaa1b62aeb Roll llvm-project from 1522a3b7b34b to 683b9fc7bd23 (26 revisions)
07d8c899070e5209c12e393cdc7aade167d9d722 Roll llvm-project from d85e849ff4d5 to 1522a3b7b34b (36 revisions)
6cc40ae524952ab2ab62922d3762e38cb240e599 Roll emscripten from 2d75aa804d89 to 6b1e844a76c4 (2 revisions)
3d001d441dcb8f6d31a9a5fa65594aaf21fc7907 Roll llvm-project from 6d12599fd413 to d85e849ff4d5 (65 revisions)
2c1bc71f2ca992236054f2b4581148de75080278 Roll emscripten from 4681c920d0de to 2d75aa804d89 (3 revisions)
54323a2b83ee4f06d3f7a3f952fbad272a20d623 Roll binaryen from f70bc4d6634c to 197b6da7afe6 (1 revision)
4599a25f205089e99ca769b0bee45a750cd2d24a Roll binaryen from 73b048770937 to f70bc4d6634c (1 revision)
cdb02cb91268764fb4e4832e593785297a463a81 Roll emscripten from a197a6bd0dd3 to 4681c920d0de (1 revision)
ce1ae37ef6a46f06af66dfb443fae83fae167d76 Roll llvm-project from 2ab7c7e50a85 to 6d12599fd413 (29 revisions)
38daf11b8e88457dc50f55ce25ce2cb59bf2f2ce Roll llvm-project from f82f5b0507a2 to 2ab7c7e50a85 (33 revisions)
e86af751fe4133c0494256758fd49b3bd1e3c060 Roll v8 from 1b2cef85c79f to 90fe7dc9ce48 (29 revisions)
dff6def80eda488c4ba40ff70d932ecf3db1b0e8 Roll llvm-project from 942f3ee40866 to f82f5b0507a2 (16 revisions)
4b9dbe73a1a123507eb46274528f136680320ed7 Roll llvm-project from 0689edd68717 to 942f3ee40866 (26 revisions)
20827fbe911a1aaf5df4033cb24b2cba6aa46d90 Roll emscripten from 71ee39fd6310 to a197a6bd0dd3 (1 revision)
2262e37ff93e97a763b9afb7411aacf07ffe61d6 Roll binaryen from 4f6cb8e54aa0 to 73b048770937 (1 revision)
bf424779a73e39c11bbeedb76970a4ab66058793 Roll llvm-project from 3d9e226081cf to 0689edd68717 (37 revisions)
fe79edc0a0ec540168ffc41cfbc2dcc69e12443b Roll emscripten from 6a93f7db16cd to 71ee39fd6310 (1 revision)
1aea1ecb031d5cf4c4784c2c48cc0aaf39aff1e5 Roll binaryen from fe3ad0e6319c to 4f6cb8e54aa0 (3 revisions)
e131b3b9d023a909857cbfc885e12b2567fa519e Roll emscripten from 15bf099c5fed to 6a93f7db16cd (1 revision)
4ae44fb73a4f14c1428449329cb49a835316b01e Roll llvm-project from 0e6f0b7cc383 to 3d9e226081cf (44 revisions)
d560989afefb61cc976a54c4c9e2c03579c9f43f Roll llvm-project from afa60d08a2d4 to 0e6f0b7cc383 (28 revisions)
647408b33bf0061975e51ff78431d72b940c0cb7 Roll v8 from b49cea5506ad to 1b2cef85c79f (43 revisions)
fe67b123c858aa9942fb779ec74ca3abd605dc67 Roll llvm-project from 51679dc1c9d5 to afa60d08a2d4 (11 revisions)
67826f44f0ce93c6388931a511a1d96833729ca7 Roll emscripten from 0c10ba8ea3fa to 15bf099c5fed (4 revisions)
3cb2850d4fb361bf8facc522881467da3bfa1987 Roll llvm-project from ca856fff1c0f to 51679dc1c9d5 (33 revisions)
949e0243de74c216ac7e89e2a3604ddd586e5a77 Roll binaryen from 0c97bb81325a to fe3ad0e6319c (1 revision)
8ad7b2833969ba4d69b2b3f5760d3651abb7a427 Roll emscripten from afa75f342eef to 0c10ba8ea3fa (4 revisions)
f65ed23f9ab7e79d19e083d34d2909f61a90083c Roll llvm-project from dee009d3b5c7 to ca856fff1c0f (53 revisions)
34abebf36d7a0a091e2fe17cb58c6d6317e523b6 Roll binaryen from 83cceaca5c9c to 0c97bb81325a (3 revisions)
ca82b6d1b1514c3bd31f85d978f9c2d84232766e Roll binaryen from eeb838155dbf to 83cceaca5c9c (3 revisions)

@sbc100
Copy link
Collaborator

sbc100 commented Feb 15, 2023

You should be able to do emsdk install <revision> for any of those revisions and find out which one is to blame for your regression. Since there are 77 revisions a bisect would take roughly ~6 installs to narrow it own.

@rtrussell
Copy link
Author

Since there are 77 revisions a bisect would take roughly ~6 installs to narrow it own.

Thank you so much for the list, it will make it much easier.

@rtrussell
Copy link
Author

The 'Periodic flashing in Safari' regression appears to have been introduced between these two commits:

25be4826744e15b5558c42d6ce3123eaa1b62aeb  No flashing
bdcaf3d5d5efa0b5ca6357b30599171996316e8a  Flashing

I hope that may allow somebody to pin down the cause. If you have a suitable Mac to test it on (I am using an 'Apple Silicon' Mac Mini) here are direct links to demonstrate the issue:

25be4826744e15b5558c42d6ce3123eaa1b62aeb
bdcaf3d5d5efa0b5ca6357b30599171996316e8a

@sbc100
Copy link
Collaborator

sbc100 commented Feb 16, 2023

So the emscripten-releases roll that caused the issue was https://chromium-review.googlesource.com/c/emscripten-releases/+/4075232

The goods news is that change corresponds just a single emscripten rev #18267. I guess it makes sense because change changes relates the how time is reported across different threads.

@rtrussell
Copy link
Author

The good news is that change corresponds just a single emscripten rev #18267. I guess it makes sense because change changes relates the how time is reported across different threads.

If I'm understanding the comments there, it does seem that this change was responsible for some other odd, and seemingly unrelated, issues, including something Safari-specific. What's the next step? I can relatively easily test any proposed fix so long as I can use emsdk install.

@kripken
Copy link
Member

kripken commented Feb 16, 2023

It would be good to verify that reverting the change makes the problem goes away for you @rtrussell , that is, that if you are on latest emscripten, and manually revert that change, that things work properly. It's not a big diff so hopefully that can be done.

Separately another thing that would be good to do is to hear from @RReverser who landed that change, and might have an idea of what could be going wrong.

@RReverser
Copy link
Collaborator

RReverser commented Feb 16, 2023

Separately another thing that would be good to do is to hear from @RReverser who landed that change, and might have an idea of what could be going wrong.

Uh, sorry, I don't :/

I mean, if anything, that implementation only affected the origin for the time shift, not the accuracy of timers themselves, and, if anything, should've made the origin more accurate than before. I don't see how it could introduce flashing or anything like that, but then, knowing Safari's track record of weird bugs, I guess I shouldn't be very surprised if that's the case...

Would be good to double-check that it's indeed the culprit though.

@sbc100
Copy link
Collaborator

sbc100 commented Feb 16, 2023

I guess in worst case, if safari didn't implement it correct this could cause the different threads to have different notions of the current time? @rtrussell, does that make any sense to you? Do you think that this kind of time skew could cause this kind of issue?

@sbc100
Copy link
Collaborator

sbc100 commented Feb 16, 2023

@rtrussell can you explain a little more about what your application is doing? Are you doing rendering on a background thread? Are you using something like OFFSCREENCANVAS_SUPPORT or OFFSCREEN_FRAMEBUFFER?

@rtrussell
Copy link
Author

can you explain a little more about what your application is doing?

There's a limit to what I can tell you because of the role that SDL2 is playing, the internals of which I know virtually nothing about. But salient points are:

  • Although it's a multi-threaded application all the rendering is done from the 'main' thread, this is mandated by SDL2 and WebGL. I can't see how a time skew between threads would affect rendering.
  • The rendering uses custom, but relatively simple, shader code; I can list that if it's helpful.
  • Presenting each frame is initiated by a call to SDL_GL_SwapWindow().

@rtrussell
Copy link
Author

Should this regression be fixed in the latest release (3.1.32)? I'd like to take advantage of the update to SDL_ttf (and the forthcoming updates to freetype and harfbuzz) but of course I can't whilst it doesn't run properly in Safari.

@sbc100
Copy link
Collaborator

sbc100 commented Feb 24, 2023

No, we still don't understand what is causing this issue. We could potentially revert #18267, but it would be nice to understand what is causing the issue and come up with real fix and a test to prevent future regressions.

@RReverser
Copy link
Collaborator

Since #18267 is just a JS change, it should be possible to download it via https://patch-diff.githubusercontent.com/raw/emscripten-core/emscripten/pull/18267.patch and apply a reverse patch (patch -R ...) on a local Emscripten installation to see if that fixes the issue at least.

@rtrussell
Copy link
Author

Let me know if there is anything further I can do, bearing in mind that I would need extremely detailed instructions because I'm not an experienced programmer (I'm a retired hardware engineer in my 70s).

The two versions that I ended up with after the bisection are still available if you have a suitable Mac for testing. I don't even know for sure that the issue affects all Macs, there might be something special about my 'Apple Silicon' Mac Mini which makes it particularly susceptible:

25be4826744e15b5558c42d6ce3123eaa1b62aeb
bdcaf3d5d5efa0b5ca6357b30599171996316e8a

@rtrussell
Copy link
Author

rtrussell commented Mar 2, 2023

manually edit the output JS to replace performance.timeOrigin with performance.timing.navigationStart

Bearing in mind that I have no idea what I'm doing, I attempted that (the JavaScript file isn't very easily editable, because there are no line breaks, but I tried it anyway). The result was an 'Exception thrown, see JavaScript console' error, and in the console it says "TypeError: undefined is not an object (evaluating 'performance.timing.navigationStart')".

What should I do now?

@kripken
Copy link
Member

kripken commented Mar 2, 2023

Thanks for trying. Interesting, that suggests that won't work either. The error indicates performance.timing does not exist (or maybe performance itself but that seems less likely).

Hopefully someone else reading here has some understanding of the situation on Safari and which API we should be using there.

@kripken
Copy link
Member

kripken commented Mar 2, 2023

Testing on Chrome on Linux, I see performance.timing does not exist on workers, which matches what you saw. So perhaps that API is just not meant to be useful on workers at all.

@rtrussell To test timeOrigin, can you try these instructions?

#include <emscripten.h>

int main() {
  EM_ASM({
    console.log("importScripts:", typeof importScripts);
    console.log("performance:", typeof performance);
    console.log("performance.timeOrigin:        ", typeof performance.timeOrigin, performance.timeOrigin, performance.timeOrigin, performance.timeOrigin);
    setTimeout(() => {
      console.log("delayed performance.timeOrigin:", typeof performance.timeOrigin, performance.timeOrigin, performance.timeOrigin, performance.timeOrigin);
    }, 1000);
  });
}

Compile that with emcc a.cpp -o a.html -pthread -sPROXY_TO_PTHREAD. Then run that with emrun a.html. In the page that opens, open the dev console. You should see logging similar to this but maybe with different numbers:

importScripts: function
performance: object
performance.timeOrigin:         number 1677792627641.84 1677792627641.84 1677792627641.84
delayed performance.timeOrigin: number 1677792627641.84 1677792627641.84 1677792627641.84

@rtrussell
Copy link
Author

rtrussell commented Mar 2, 2023

You should see logging similar to this but maybe with different numbers

This is what I see:

importScripts:  "function"
performance:  "object"
performance.timeOrigin:  "number"  1677798780963.3203 1677798780963.3203 1677798780963.3203

There is no delayed performance.timeOrigin shown.

Edit: I'm not running it with emrun, since I don't have Emscripten (or any development tools) on the Mac; I'm running it from my website.

@kripken
Copy link
Member

kripken commented Mar 3, 2023

Thanks for trying. Very strange the delayed one didn't show up. It should appear after a second. But the other info suggests timeOrigin works ok. So that doesn't seem like the issue.

Then my other theory was storing times somehow, and caring about their absolute values. But it's not likely to be some general usage of them like SDL_Timer or SDL code in general, as lots of projects use those code paths but this is the only bug report we've gotten. My best guess is something specific in your project that handles times.

@rtrussell
Copy link
Author

Very strange the delayed one didn't show up

It's related to not being run using emrun. Chrome doesn't report that either, if run from my website rather than locally.

My best guess is something specific in your project that handles times.

If by "my project" you mean something in SDL, possibly, but my code is not to blame. Although I use some time-related functions none of them can affect rendering, and anyway the same code runs perfectly on all the other SDL2 platforms (Windows, MacOS, Linux, Android, iOS) and on all other browsers I have tried.

There's no way that black frames can legitimately be rendered, which is what appears to be causing the 'flashing' in Safari. My rendering code is the usual pattern, in skeletal form:

do { // for each frame
      glClear(GL_COLOR_BUFFER_BIT || GL_DEPTH_BUFFER_BIT);
      // create the scene
      glDrawArrays(GL_TRIANGLES, 0, count);
      SDL_GL_SwapWindow(window,);
}:

So where do we go from here? There is a demonstrable regression which means I cannot build my project using recent versions of Emscripten, and whilst at the moment I can work around the issue by using an old version, that won't be acceptable once the updates to SDL2_ttf, FreeType and Harfbuzz get included in Emscripten.

To double-check that the change which you suspect is causing the problem really is, can you build a version of Emscripten that is identical to the latest release except for that change being reverted, and somehow make it available in a form I can use?

@kripken
Copy link
Member

kripken commented Mar 4, 2023

There isn't a simple way for me to build a release of emscripten and provide it for you, I'm afraid. Though perhaps someone else reading this may know more, as I know the emsdk has some functionality for using custom emscripten etc. from git (but I don't think it can package it up).

Overall this is definitely an odd problem. It seems to be specific to your project, but there's no obvious explanation now that we've ruled out the obvious possibilities. (Though I am still confused why you don't see the delayed logging - using emrun should not affect that at all, it's just a setTimeout in JS that runs on the page. Very strange. But, it is likely not related.)

One out-there idea might be to try to see if you can reproduce your problem on WebKit outside of Mac. There might be nightly WebKit builds, for example. If you can find one and the problem shows up there, I could debug with that perhaps.

Otherwise, our best options at this point might be to wait and hope for someone with a Mac and time to debug this to chime in here. We really need someone to actually understand the problem, as even if we did confirm that a revert helps you, that wouldn't indicate what the fix is - that PR fixed another problem, so we'd need a new fix for both at the same time.

@sbc100
Copy link
Collaborator

sbc100 commented Mar 4, 2023

In case it helps, here are the instructions for using emsdk with a random git version of emscripten: https://github.com/emscripten-core/emsdk#how-do-i-use-my-own-emscripten-github-fork-with-the-sdk

@rtrussell
Copy link
Author

Otherwise, our best options at this point might be to wait and hope for someone with a Mac and time to debug this

How would you go about debugging it if you had a Mac? If reverting the change thought to be causing the problem does indeed fix it, what would be your next step? If there is no way the change can be permanently reverted, or achieved a different way, I'm not sure what options remain.

Re-reading the related PR the change seemed to cause all sorts of inexplicable side-effects in Safari and Firefox that were put down to 'flaky tests' but suggest to me that the change has a more far-reaching impact than it theoretically should have.

@RReverser
Copy link
Collaborator

Re-reading the related PR the change seemed to cause all sorts of inexplicable side-effects in Safari and Firefox that were put down to 'flaky tests' but suggest to me that the change has a more far-reaching impact than it theoretically should have.

If you're referring to my comments in the end and specifically #18307, then no, those were flaky tests we had on main branch as well at the time and not something introduced by the PR.

As @kripken said, if it's indeed that change that causes flashing in your case, most likely it's something in your code that depends on timing being absolute, otherwise we'd see more reports.

@rtrussell
Copy link
Author

As @kripken said, if it's indeed that change that causes flashing in your case, most likely it's something in your code that depends on timing being absolute, otherwise we'd see more reports.

Sorry to keep repeating myself, but if by "my code" you include SDL2 then I suppose it's possible, but that isn't something that is under my control and I doubt that the SDL developers would be motivated to put much effort into debugging it their end, especially as it's difficult to reproduce.

I'm certain it isn't something in the code which I've written myself, because (a) it's generic code which runs perfectly well on every other SDL2 platform and (b) the rendering isn't affected by absolute timing at all (SDL_GL_SwapWindow() waits for vSync, which is the only timing-related functionality required).

You seem to be saying that it's not going to be fixed in Emscripten, even though it has clearly been caused by a change in Emscripten. So since there's nothing I can do to fix it myself, that leaves me with no option but to tell my users that my app will not be compatible with Safari going forward. I don't consider that at all satisfactory.

@RReverser
Copy link
Collaborator

RReverser commented Mar 4, 2023

that leaves me with no option but to tell my users that my app will not be compatible with Safari going forward. I don't consider that at all satisfactory.

If only you knew how common it is these days as Safari tends to lag behind both in terms of features and bugfixes, especially since their release cycle is still tied to macOS releases :(

For now that seems to be the only of two safe options, another being just continue building with older Emscripten if everything works fine with it, until someone with macOS can debug this issue further.

By the way, is your code public anywhere? It might be helpful for us or someone who will debug this issue to be able to look at the source code and build flags used rather than just at the final binary.

@rtrussell
Copy link
Author

another being just continue building with older Emscripten if everything works fine with it

It doesn't. Every other platform on which my app runs (Windows, MacOS, Linux, Android, iOS) supports rendering complex scripts (Arabic, Thai etc.) using the features built into recent releases of SDL2_ttf (which incorporates FreeType and HarfBuzz). Emscripten is lagging behind all those platforms because the version of SDL2_ttf bundled with it is very out-of-date.

I understand that some work is underway to rectify this situation, and to bring Emscripten in line with the other platforms in respect of the version of SDL2_ttf. If and when that happens I will want to use that later version, which of course I am prevented from doing whilst this issue affects Safari.

One possible workaround would be for me to use the EMCC_LOCAL_PORTS feature to allow me to incorporate a more recent version of SDL2_ttf in what is otherwise the earlier Emscripten build. But it doesn't work, and when I enquired about it I was told that the feature was not intended for production.

By the way, is your code public anywhere?

Certainly, it's all here. My app is a popular (and famous, in some parts of the world!) BASIC interpreter, and the BASIC program which I have been using to illustrate the issue is cradle.bbc.

@RReverser
Copy link
Collaborator

RReverser commented Mar 4, 2023

Every other platform on which my app runs (Windows, MacOS, Linux, Android, iOS) supports rendering complex scripts (Arabic, Thai etc.) using the features built into recent releases of SDL2_ttf (which incorporates FreeType and HarfBuzz). Emscripten is lagging behind all those platforms because the version of SDL2_ttf bundled with it is very out-of-date.

For what it's worth, you technically don't have to use SDL via -s USE_... options. These days upstream SDL and SDL_ttf have Emscripten support built-in, so all those -s USE_SDL and -s USE_SDL_ttf options do is download and compile it for you as a convenience.

That is, nothing stops you from downloading latest version of SDL and SDL_ttf in your own Makefile, building them and linking with your app irrespective of the Emscripten version.

@sbc100
Copy link
Collaborator

sbc100 commented Mar 4, 2023

As @kripken said, if it's indeed that change that causes flashing in your case, most likely it's something in your code that depends on timing being absolute, otherwise we'd see more reports.

Sorry to keep repeating myself, but if by "my code" you include SDL2 then I suppose it's possible, but that isn't something that is under my control and I doubt that the SDL developers would be motivated to put much effort into debugging it their end, especially as it's difficult to reproduce.

I'm certain it isn't something in the code which I've written myself, because (a) it's generic code which runs perfectly well on every other SDL2 platform and (b) the rendering isn't affected by absolute timing at all (SDL_GL_SwapWindow() waits for vSync, which is the only timing-related functionality required).

You seem to be saying that it's not going to be fixed in Emscripten, even though it has clearly been caused by a change in Emscripten. So since there's nothing I can do to fix it myself, that leaves me with no option but to tell my users that my app will not be compatible with Safari going forward. I don't consider that at all satisfactory.

@rtrussell I'm sorry you are having a frustrating time with this issue. I'm sure we can eventually track down and fix this issue but it might take a more time and effort. I tend to agree that simply reverting this change without getting to the bottom of the problem is not the most satisfactory solution here.

While I understand where you are coming from with claims like "I'm certain it isn't something in the code which I've written myself", my years of experience with porting C/C++ have taught me that there can always be latent portability issues in any codebase, no matter how portable and no matter how bug free on other platforms. Its not a criticism of you or your codebase to suggest this, we just want to get to the bottom of the issue.

@sbc100
Copy link
Collaborator

sbc100 commented Mar 4, 2023

As for next steps that you could take, I can think of 3:

  1. Continue to try to reduce the failure case to something smaller, to aid other folks in debugging.

  2. Continue to debug yourself to try and find the root cause of the flashing.

  3. Create your own fork of emscripten and use that with emsdk as described in https://github.com/emscripten-core/emsdk#how-do-i-use-my-own-emscripten-github-fork-with-the-sdk. Doing that would have two outcomes. Firstly it would confirm 100% that that one change is that cause of your flashing. Secondly, it might allow you to continue to build against new versions of emscripten without being blocked on this issue.

@rtrussell
Copy link
Author

These days upstream SDL and SDL_ttf have Emscripten support built-in, so all those -s USE_SDL and -s USE_SDL_ttf options do is download and compile it for you as a convenience.

I had read that was true of SDL2 itself, but I wasn't sure it was also true of SDL2_ttf, not least because of its dependence on FreeType and HarfBuzz.

When I use -s USE_SDL_ttf=2 I notice two things that are probably significant: firstly the port it downloads is not SDL2_ttf but SDL2_ttf-mt which I assume is a special multi-threaded version. If I compile SDL2_ttf from source, will I get this multi-threaded version?

Secondly, it pulls in the FreeType and HarfBuzz ports too (and quite likely multi-threaded versions of those). Will that also happen automatically if I simply include the source tree of SDL2_ttf in my build?

@RReverser
Copy link
Collaborator

RReverser commented Mar 4, 2023

You can find code for everything it's doing in here: https://github.com/emscripten-core/emscripten/blob/667505546e57040aaf00a56c07c0b924158307e9/tools/ports/sdl2_ttf.py

Basically it pulls in regular SDL2_ttf from the official repo and builds it with -DTTF_USE_HARFBUZZ=1 -sUSE_SDL=2 -sUSE_FREETYPE -sUSE_HARFBUZZ. The -mt suffix is just a local identifier for version built with -pthread in addition to other flags above.

So if you only want to update SDL2_ttf, you can do that by manually building the latest archive with the same flags. If you want to update Freetype & Harfbuzz as well instead of using ones shipped with Emscripten, seems like that would be more complicated as for those the builder seems to have a lot more custom logic (see freetype.py and harfbuzz.py in the same folder).

@rtrussell
Copy link
Author

my years of experience with porting C/C++ have taught me that there can always be latent portability issues in any codebase, no matter how portable and no matter how bug free on other platform

You are probably right that there is some change I could make to my code which would workaround the problem.

My frustration stems from the near impossibility that I would ever be able to find it! I have mentioned before that I am suffering from 'cognitive decline' which has been tentatively diagnosed as Alzheimer's Disease. Much of the code in my project was written ten or more years ago, and the extent to which I still understand it is very limited.

Even if I did, I simply wouldn't know where to start with making experimental changes. Most of the code which interfaces with SDL2 is boilerplate that I have copied from elsewhere, and that has been complicated by the need for the browser to control the 'main loop' rather than being self-contained, as it is in other builds.

One specific question which you perhaps may be able to help with is how two different aspects of the way SDL2 interfaces to the display, are synchronized (if indeed they are). The first is the emscripten_set_main_loop() function which specifies the callback that the browser makes to my code once per frame. The second is the SDL_GL_SwapWindow() function which controls rendering in the WebGL context..

What is the 'official' way that WebGL rendering should be done in Emscripten given that these are both synchronized to display refresh? My call to SDL_GL_SwapWindow(), which happens in the browser callback, is presumably blocking, when waiting for vSync. Is the timing of the callback designed for this to happen?

This just 'seems to work' in most browsers but if the change to Emscripten has in some way disturbed the relative timing of the emscripten_set_main_loop() callback and when SDL_GL_SwapWindow() waits for vSync maybe it could be related to the problem.

@rtrussell
Copy link
Author

If you want to update Freetype & Harfbuzz as well instead of using ones shipped with Emscripten, seems like that would be more complicated

I am guessing, but don't know for sure, that each version of SDL2_ttf is paired with specific versions of FreeType and HarfBuzz, and attempting to mix them could well result in problems. I suppose the only way to find out would be to try it.

@RReverser
Copy link
Collaborator

RReverser commented Mar 4, 2023

The two versions that I ended up with after the bisection are still available if you have a suitable Mac for testing. I don't even know for sure that the issue affects all Macs, there might be something special about my 'Apple Silicon' Mac Mini which makes it particularly susceptible:

25be4826744e15b5558c42d6ce3123eaa1b62aeb
bdcaf3d5d5efa0b5ca6357b30599171996316e8a

Just a heads up, looks like the 2nd link here doesn't work anymore.

@rtrussell
Copy link
Author

Just a heads up, looks like the 2nd link here doesn't work anymore.

Yes, sorry, I replaced it with the code kripken wanted me to run on Safari. Would you like me to put it back?

@RReverser
Copy link
Collaborator

I think it would be useful to have both versions alive for any future debugging, yes.

@rtrussell
Copy link
Author

I think it would be useful to have both versions alive for any future debugging, yes.

It's back now, flashing away in Safari!

@past-due
Copy link
Contributor

@sbc100 @kripken I have encountered the same issue in a fairly large game that I'm working to port - but only when enabling PROXY_TO_PTHREAD=1 and OFFSCREENCANVAS_SUPPORT=1 (and only in Safari). When not enabling those features, there are no such issues in Safari (latest version, 17) using Emscripten 3.1.51.

Did you happen to have any ideas as to why this might occur in Safari only with PROXY_TO_PTHREAD and OFFSCREENCANVAS_SUPPORT enabled? I'm happy to do a little more debugging if possible.

@sbc100
Copy link
Collaborator

sbc100 commented Jan 31, 2024

@past-due is this a regression? (i.e. did this work for you in the past, using some older emscripten version?)

@rtrussell
Copy link
Author

Did you happen to have any ideas as to why this might occur in Safari only with PROXY_TO_PTHREAD and OFFSCREENCANVAS_SUPPORT enabled? I'm happy to do a little more debugging if possible.

Interestingly, I'm using neither of those flags yet encounter the same symptom. My flags are:

-s USE_PTHREADS=1 -s USE_SDL=2 -s USE_SDL_TTF=2 -s USE_SDL_NET=2 -s MAXIMUM_MEMORY=1gb \
-s PTHREAD_POOL_SIZE=2 -s ALLOW_MEMORY_GROWTH=1 -s WASM=1 -s ASSERTIONS=0 -s STACK_SIZE=5MB 

@sbc100
Copy link
Collaborator

sbc100 commented Jan 31, 2024

@past-due in order to confirm is this is the same issue is this one can you try building with 3.1.27? It would be interesting to know if building with that versions fixes your issue

@past-due
Copy link
Contributor

@past-due in order to confirm is this is the same issue is this one can you try building with 3.1.27? It would be interesting to know if building with that versions fixes your issue

Well, I have some curious initial testing results. (Looks like some symptoms may be an overlap, but not clear if it's the same underlying cause.)

To be clear:

  • This is on an M1 Pro MBP running latest macOS.
  • I am testing an Emscripten port of Warzone 2100, which is a WIP - not the example mentioned above.
Emscripten Options Safari 17.3 Safari TP* Firefox 122 Chrome 121
3.1.27 PROXY_TO_PTHREAD && OFFSCREENCANVAS_SUPPORT Flickering (moderate) Flickering (severe)1 Occasional flickering, but runs ok ✅ Seems OK
3.1.27 - ✅ Seems OK ✅ Seems OK ✅ Seems OK ✅ Seems OK
3.1.51 PROXY_TO_PTHREAD && OFFSCREENCANVAS_SUPPORT Flickering (moderate) Flickering (severe)1 Occasional flickering, but will eventually trigger a full browser tab freeze2 ✅ Seems OK
3.1.51 - ✅ Seems OK ✅ Seems OK ✅ Seems OK ✅ Seems OK

* Safari TP is Release 187 (Safari 17.4, WebKit 19619.0.1.2)
1 Safari Technology Preview has horribly severe flickering issues in PROXY_TO_PTHREAD && OFFSCREENCANVAS_SUPPORT builds, regardless of version of Emscripten used.
2 While the exact timing of the freeze differs, I am consistently able to reproduce a full browser freeze by interacting rapidly with the game in Firefox - for example by holding a mouse button and dragging the camera around. This freeze does not seem to be reproducible in Firefox with 3.1.27, so something that changed since 3.1.27 seems to be impacting Firefox.

So in regards to PROXY_TO_PTHREAD && OFFSCREENCANVAS_SUPPORT:

  • At least in my tests, the Safari flickering seems to be an issue that exists when using both Emscripten 3.1.27 and 3.1.51. (But for the game I'm trying to build, I do not seem to see this issue without using those two options on Safari 17.)
  • The Safari flickering issue is dramatically worse when using the current Safari Technology Preview build Release 187 (Safari 17.4, WebKit 19619.0.1.2)
  • However something seems to have changed post-3.1.27, because Firefox (surprisingly) has an added issue: the ability to consistently freeze the browser tab entirely.

I've only tested the above on macOS, so I don't yet know if Firefox will reproduce that same added issue post-3.1.27 when running on other platforms.

@sbc100 If you'd like me to open another Issue focusing on PROXY_TO_PTHREAD && OFFSCREENCANVAS_SUPPORT I am happy to do so (this just initially seemed like it might be related to what was described here).

@sbc100
Copy link
Collaborator

sbc100 commented Jan 31, 2024

Your issue seems to be separate one that I suspect is linked to the use offscreen canvas (which IIUC varies in support between browsers), and not a recent regression in emscripten. I think its probably worth opening a new issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants