wasm32: some examples get RuntimeError #22

Closed
kassane opened this issue May 28, 2024 · 16 comments
Labels
bug Something isn't working

Comments

@kassane
Owner

kassane commented May 28, 2024

Shaded examples are not working on the wasm32 target. Black screen!

e.g.:

Cube

firefox - output console
onerror: RuntimeError: unreachable executed cube.html:11:356
Uncaught RuntimeError: unreachable executed cube.wasm:5450:1
    tick http://localhost:6931/cube.js:1860
    _emscripten_request_animation_frame_loop http://localhost:6931/cube.js:1864
    _main http://localhost:6931/cube.js:3345
    callMain http://localhost:6931/cube.js:3391
    doRun http://localhost:6931/cube.js:3421
    run http://localhost:6931/cube.js:3430
    setTimeout handler*run http://localhost:6931/cube.js:3426
    runCaller http://localhost:6931/cube.js:3382
    removeRunDependency http://localhost:6931/cube.js:340
    receiveInstance http://localhost:6931/cube.js:589
    receiveInstantiationResult http://localhost:6931/cube.js:594
    instantiateAsync http://localhost:6931/cube.js:562
    promise callback*instantiateAsync/</< http://localhost:6931/cube.js:560
    promise callback*instantiateAsync/< http://localhost:6931/cube.js:559
    promise callback*instantiateAsync http://localhost:6931/cube.js:556
    createWasm http://localhost:6931/cube.js:605
    <anonymous> http://localhost:6931/cube.js:3341
Error: Promised response from onMessage listener went out of scope ExtensionMessagingService.js:89:34

Triangle

chromium - output console
triangle.html:11 onerror: Uncaught RuntimeError: unreachable
triangle.wasm:0x141c Uncaught RuntimeError: unreachable
    at triangle.wasm:0x141c
    at triangle.wasm:0x16b9
    at triangle.wasm:0x19f6
    at triangle.wasm:0xb40b
    at triangle.wasm:0xa084
    at triangle.wasm:0x62a3
    at tick (triangle.js:1860:30)
$func103 @ triangle.wasm:0x141c
$func111 @ triangle.wasm:0x16b9
$func121 @ triangle.wasm:0x19f6
$func206 @ triangle.wasm:0xb40b
$func190 @ triangle.wasm:0xa084
$func151 @ triangle.wasm:0x62a3
tick @ triangle.js:1860
requestAnimationFrame (async)
_emscripten_request_animation_frame_loop @ triangle.js:1864
$func144 @ triangle.wasm:0x4b06
$func143 @ triangle.wasm:0x46bb
$func108 @ triangle.wasm:0x169e
$func124 @ triangle.wasm:0x2962
$main @ triangle.wasm:0x2980
Module._main @ triangle.js:3130
callMain @ triangle.js:3176
doRun @ triangle.js:3206
(anonymous) @ triangle.js:3215
setTimeout (async)
run @ triangle.js:3211
runCaller @ triangle.js:3167
removeRunDependency @ triangle.js:340
receiveInstance @ triangle.js:589
receiveInstantiationResult @ triangle.js:594
(anonymous) @ triangle.js:562
Promise.then (async)
(anonymous) @ triangle.js:560
Promise.then (async)
(anonymous) @ triangle.js:559
Promise.then (async)
instantiateAsync @ triangle.js:556
createWasm @ triangle.js:605
(anonymous) @ triangle.js:3126
@kassane kassane added the bug Something isn't working label May 28, 2024
@kassane
Owner Author

kassane commented Sep 19, 2024

Testing again after removing the -release flag (ba2a1f0).
Why? This flag removes bounds checks from arrays in @system code | more info here.

cube (relfast/small)
[sg][error][id:5][line:8473] 
cube.js:63:183
[sg][info][id:5][line:8474] 
	ERROR: 1:1: '' : syntax error
cube.js:63:243
[sg][error][id:5][line:8473] 
cube.js:63:183
[sg][info][id:5][line:8474] 
	ERROR: 1:1: '' : syntax error
cube.js:63:243
onerror: RuntimeError: index out of bounds cube.html:1:1053
Uncaught RuntimeError: index out of bounds
    c http://localhost:6931/cube.js:40
    Qa http://localhost:6931/cube.js:40
    _main http://localhost:6931/cube.js:65
    a http://localhost:6931/cube.js:67
    Ec http://localhost:6931/cube.js:67
    setTimeout handler*Ec http://localhost:6931/cube.js:67
    Dc http://localhost:6931/cube.js:66
    a http://localhost:6931/cube.js:64
    <anonymous> http://localhost:6931/cube.js:64
    promise callback*Ba/< http://localhost:6931/cube.js:6
    promise callback*Ba http://localhost:6931/cube.js:6
    Z http://localhost:6931/cube.js:64
    <anonymous> http://localhost:6931/cube.js:64
cube.wasm:33396:1
WebGL warning: bufferData: target: Invalid enum value 0 (Did you typo `gl.SOMETHINGG` and pass `undefined`?)
Error: Promised response from onMessage listener went out of scope ExtensionMessagingService.js:89:34
cube (relsafe)
[sg][error][id:105] /home/kassane/sokol-d/src/sokol/c/sokol_gfx.h:16197:0: 
	VALIDATE_BUFFERDESC_DATA: immutable buffers must be initialized with data (sg_buffer_desc.data.ptr and sg_buffer_desc.data.size)
cube.js:64:160
[sg][panic][id:296] /home/kassane/sokol-d/src/sokol/c/sokol_gfx.h:16168:0: 
	VALIDATION_FAILED: validation layer checks failed
ABORTING because of [panic]
cube.js:64:130
Aborted() cube.html:1:906
onerror: RuntimeError: Aborted(). Build with -sASSERTIONS for more info. cube.html:1:1053
Uncaught RuntimeError: Aborted(). Build with -sASSERTIONS for more info.
    ua http://localhost:6931/cube.js:4
    Xa http://localhost:6931/cube.js:40
    c http://localhost:6931/cube.js:41
    Qa http://localhost:6931/cube.js:41
    _main http://localhost:6931/cube.js:66
    a http://localhost:6931/cube.js:68
    Gc http://localhost:6931/cube.js:68
    setTimeout handler*Gc http://localhost:6931/cube.js:68
    Fc http://localhost:6931/cube.js:67
    a http://localhost:6931/cube.js:65
    <anonymous> http://localhost:6931/cube.js:65
    promise callback*Ba/< http://localhost:6931/cube.js:6
    promise callback*Ba http://localhost:6931/cube.js:6
    Z http://localhost:6931/cube.js:65
    <anonymous> http://localhost:6931/cube.js:65
cube.js:4:330
Error: Promised response from onMessage listener went out of scope ExtensionMessagingService.js:89:34
triangle | sql_context | blend
onerror: RuntimeError: unreachable executed triangle.html:1:1053
Uncaught RuntimeError: unreachable executed
    c http://localhost:6931/triangle.js:38
    Na http://localhost:6931/triangle.js:38
    _main http://localhost:6931/triangle.js:54
    a http://localhost:6931/triangle.js:56
    uc http://localhost:6931/triangle.js:56
    setTimeout handler*uc http://localhost:6931/triangle.js:56
    tc http://localhost:6931/triangle.js:55
    a http://localhost:6931/triangle.js:53
    <anonymous> http://localhost:6931/triangle.js:54
    promise callback*za/< http://localhost:6931/triangle.js:6
    promise callback*za http://localhost:6931/triangle.js:6
    Z http://localhost:6931/triangle.js:54
    <anonymous> http://localhost:6931/triangle.js:54
triangle.wasm:39269:1
mrt (relsafe)
Aborted(Assertion failed: (slot_index > (0)) && (slot_index < p->attachments_pool.size), at: /home/kassane/sokol-d/src/sokol/c/sokol_gfx.h,16039,_sg_attachments_at) mrt.html:1:906
onerror: RuntimeError: Aborted(Assertion failed: (slot_index > (0)) && (slot_index < p->attachments_pool.size), at: /home/kassane/sokol-d/src/sokol/c/sokol_gfx.h,16039,_sg_attachments_at). Build with -sASSERTIONS for more info. mrt.html:1:1053
Uncaught RuntimeError: Aborted(Assertion failed: (slot_index > (0)) && (slot_index < p->attachments_pool.size), at: /home/kassane/sokol-d/src/sokol/c/sokol_gfx.h,16039,_sg_attachments_at). Build with -sASSERTIONS for more info.
    va http://localhost:6931/mrt.js:4
    a http://localhost:6931/mrt.js:41
    c http://localhost:6931/mrt.js:42
    Za http://localhost:6931/mrt.js:42
    _main http://localhost:6931/mrt.js:70
    a http://localhost:6931/mrt.js:73
    Kc http://localhost:6931/mrt.js:73
    setTimeout handler*Kc http://localhost:6931/mrt.js:73
    Jc http://localhost:6931/mrt.js:72
    a http://localhost:6931/mrt.js:69
    <anonymous> http://localhost:6931/mrt.js:70
    promise callback*Ba/< http://localhost:6931/mrt.js:6
    promise callback*Ba http://localhost:6931/mrt.js:6
    Z http://localhost:6931/mrt.js:70
    <anonymous> http://localhost:6931/mrt.js:70
mrt.js:4:333

cc: @floooh

@floooh
Collaborator

floooh commented Sep 20, 2024

Does this only happen in the web version?

In sokol-zig I had to increase the Emscripten stack size recently. At one point, the Emscripten SDK reduced the default stack size to 64 KBytes, which really isn't much, especially with the 'stack heavy' init code in the sokol samples.

floooh/sokol-zig#82 (comment)

...maybe it's the same issue - in general, "anything can happen" on such a stack overflow, so that would be the first thing I would try.
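
To get a feel for why a 64 KByte budget can be tight for stack-heavy init code, the sketch below adds up the size of several descriptor structs that might live on the stack at the same time. The struct `demo_desc`, its 4096-byte size, and the helper names are made up for illustration; they are not the real sokol_gfx.h types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor struct; NOT the real sokol_gfx.h layout. */
typedef struct {
    unsigned char payload[4096];
} demo_desc;

/* Stack budget mentioned in the thread: 64 KBytes. */
#define DEMO_STACK_BUDGET ((size_t)64 * 1024)

/* Bytes needed when `count` descriptors live on the stack at once. */
size_t demo_stack_bytes(size_t count) {
    assert(count <= SIZE_MAX / sizeof(demo_desc)); /* guard against overflow */
    return count * sizeof(demo_desc);
}

/* Returns 1 if `count` descriptors fit into the budget, 0 otherwise. */
int demo_fits_in_stack(size_t count) {
    return demo_stack_bytes(count) <= DEMO_STACK_BUDGET;
}
```

With these assumed sizes, sixteen descriptors use up the whole budget before any other locals or call frames are counted.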

@kassane
Owner Author

kassane commented Sep 20, 2024

Does this only happen in the web version?

Yeah!

In sokol-zig I had to increase the Emscripten stack size recently

applied: ae22c87

@floooh
Collaborator

floooh commented Sep 20, 2024

...and did it fix the issue? :)

@kassane
Owner Author

kassane commented Sep 20, 2024

...and did it fix the issue? :)

The error still persists after that commit.
The outputs posted above were already produced with the commit applied.

@floooh
Collaborator

floooh commented Sep 21, 2024

All those different errors look like there's a problem with passing data via pointers into the resource creation functions, most likely that the desc structs in the resource creation functions get corrupted on their way from the D side to the C side. Maybe D's automatic memory management gets in the way, or the desc structs on the stack are invalidated before their function is called?

Doesn't quite explain why it only happens in WASM though... unless it's some sort of ABI incompatibility that only happens on the WASM compilation target...

PS: e.g. if the only difference is the added bounds checking I would expect to see more D bounds checking errors, but the errors are all over the place instead (shader compilation doesn't work, nullptrs are provided where a pointer to data is expected, etc.).

One way to approach the bug would be printf-debugging, e.g. logging content of desc structs that go into the creation functions both on the D side (for instance here:

return sg_make_buffer(&desc);
), and then what arrives down in the C function (e.g. here:
SOKOL_ASSERT(_sg.valid);
)
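
A minimal sketch of that kind of printf-debugging on the C side, assuming a simplified stand-in struct: `demo_buffer_desc` and the `demo_*` helpers are hypothetical and only mirror the idea of a size plus a data pointer/size pair, not the real `sg_buffer_desc`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a buffer descriptor; NOT the real sg_buffer_desc. */
typedef struct {
    size_t size;
    const void *data_ptr;
    size_t data_size;
} demo_buffer_desc;

/* Returns 1 if the descriptor carries initial data, 0 otherwise. */
int demo_desc_has_data(const demo_buffer_desc *d) {
    assert(d != NULL);
    return d->data_ptr != NULL && d->data_size > 0;
}

/* Writes a one-line summary into `out`; returns what snprintf returns. */
int demo_format_desc(char *out, size_t cap, const char *tag,
                     const demo_buffer_desc *d) {
    assert(d != NULL);
    return snprintf(out, cap, "[%s] size=%zu data_ptr=%s data_size=%zu",
                    tag, d->size, d->data_ptr ? "set" : "null", d->data_size);
}

/* Logs the summary to stderr; call it on entry to the function under test. */
void demo_log_desc(const char *tag, const demo_buffer_desc *d) {
    char line[128];
    demo_format_desc(line, sizeof line, tag, d);
    fprintf(stderr, "%s\n", line);
}
```

Logging the same fields on the caller side right before the call and comparing the two lines would show whether the struct changes on its way across the language boundary.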

@kassane
Owner Author

kassane commented Sep 21, 2024

libsokol + cube - wasm32 llvm-ir (relfast):
https://gist.github.com/kassane/bbae501db913ccceaa9b600b1e0930b7

@kassane
Owner Author

kassane commented Sep 21, 2024

LLVM-IR (sg_make_buffer):

cube.zig:

; Function Attrs: noredzone nosanitize_coverage nounwind skipprofile uwtable
define dso_local void @init() #0 !dbg !632 {
; [...]
%7 = call i32 @sg_make_buffer(ptr nonnull readonly align 4 %0), !dbg !887, !noalias !888
%8 = call i32 @sg_make_buffer(ptr nonnull readonly align 4 %0), !dbg !895, !noalias !896
; [...]
; Function Attrs: noredzone nounwind uwtable
declare i32 @sg_make_buffer(ptr readonly align 4) local_unnamed_addr #7

cube.d:

; [#uses = 1]
define void @init() #2 {
; [...]
  call void @sg_make_buffer(ptr noalias nonnull sret(%sokol.gfx.Buffer) align 4 %.sret_tmp, ptr nonnull %vbufd) #2
  call void @sg_make_buffer(ptr noalias nonnull sret(%sokol.gfx.Buffer) align 4 %.sret_tmp1, ptr nonnull %ibufd) #2

; [...]
; [#uses = 0]
define void @_D5sokol3gfx10makeBufferFNbNiNeMKSQBgQBd10BufferDescZSQCaQBx6Buffer(ptr noalias sret(%sokol.gfx.Buffer) align 4 %.sret_arg, ptr dereferenceable(56) %desc) local_unnamed_addr #2 {
  tail call void @sg_make_buffer(ptr noalias sret(%sokol.gfx.Buffer) align 4 %.sret_arg, ptr nonnull %desc) #2
  ret void
}

; [#uses = 3]
declare void @sg_make_buffer(ptr noalias sret(%sokol.gfx.Buffer) align 4, ptr) local_unnamed_addr #2
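
Reading the listings: in one, `sg_make_buffer` takes a single pointer and returns the handle as an `i32`; in the other, it returns `void` and takes an extra `sret` result pointer ahead of the desc pointer. The C sketch below spells out both call shapes as ordinary functions so the difference is easier to see. The `demo_*` names and the one-field desc are hypothetical, and the thread does not establish that this difference is what caused the crashes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical handle and descriptor types, for illustration only. */
typedef struct { uint32_t id; } demo_buffer;
typedef struct { uint32_t size; } demo_buffer_desc;

/* Shape 1: one pointer argument, handle comes back as the return value. */
demo_buffer demo_make_direct(const demo_buffer_desc *desc) {
    assert(desc != NULL);
    demo_buffer b = { desc->size + 1u };
    return b;
}

/* Shape 2: result pointer first, descriptor second, nothing returned. */
void demo_make_sret(demo_buffer *result, const demo_buffer_desc *desc) {
    assert(result != NULL && desc != NULL);
    result->id = desc->size + 1u;
}
```

Both shapes give the same result as long as caller and callee agree. If a caller used the second shape against a callee built for the first, the callee would read the result slot as if it were the descriptor.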

@floooh
Collaborator

floooh commented Sep 22, 2024

Very weird, I'm now running into very similar problems in my bindings cleanup branch when trying to build the 'raw' Emscripten samples (specifically the bufferoffsets-emsc sample) in the sokol-samples repo... for instance in release mode:

[Screenshot 2024-09-22 at 11 44 28]

...those warnings also look like it's using the WebGL1 shim, not the WebGL2 Emscripten shim.

...and in debug mode, a pointer is not null which should be null, resulting in this error:

[Screenshot 2024-09-22 at 11 46 51]

...it's not the stack size (unless the stack size argument started failing for some reason)...

Investigating...

PS: ...might be related to my current WIP stuff in the branch, switching back to the master branch it works as expected.

Still very weird though... why would you suddenly get similar errors like me when it's something I messed up in a branch...?

@floooh
Collaborator

floooh commented Sep 22, 2024

Crazy enough it seems to be related to WebGL vs WebGL2 shaders...

In the mrt-emsc sample (in sokol-samples) I can trigger the problem by replacing the vertex or fragment shader in offscreen_shd with a WebGL2 shader (e.g. #version 300 es). Switch back to WebGL1 shaders and it runs.

This is very, very weird. Maybe something corrupts the stack in the WebGL2 shader case which then causes those strange problems in the followup calls.

I'll try to investigate as far as possible today, but will be on a 4-day vacation trip until Thursday where I'll be offline.

PS: same in that other broken sample (bufferoffsets-emsc), going back to WebGL1 shaders makes it work.

PPS: so out of those 16 Emscripten samples in https://github.com/floooh/sokol-samples/tree/master/html5, 2 have the weird behaviour that they crash with mysterious errors when I replace the WebGL1 shaders with WebGL2 shaders (bufferoffsets-emsc and offscreen-emsc). I will next try to figure out what's different about those two samples. At first glance they don't seem to share anything significant.

@floooh
Collaborator

floooh commented Sep 22, 2024

Wrote a reminder ticket here:

floooh/sokol#1114

...my theory that the stack corruption somehow happens inside sokol_gfx.h and then affects followup calls also can't be right, because in the bufferoffsets-emsc sample, there's only a single call to sg_make_shader() which is already broken.

I'll try to find out more after I return from my little vacation trip.

@floooh
Collaborator

floooh commented Sep 22, 2024

...ok, so in my case the reason was a little innocent comma in the shader code string. It's still interesting how that could cause such absolutely weird errors.

And it also doesn't explain the sokol-d problems because the sokol-d samples use code-generated shaders. But since the errors look similar, maybe the problems are still somehow related...?

@kassane
Owner Author

kassane commented Sep 22, 2024

All those different errors look like there's a problem with passing data via pointers into the resource creation functions, most likely that the desc structs in the resource creation functions get corrupted on their way from the D side to the C side. Maybe D's automatic memory management gets in the way, or the desc structs on the stack are invalidated before their function is called?

DGC (BoehmGC ported to D) does nothing unless the druntime library is involved, and even less so when betterC mode is enabled, because it relies on addroot/addrange together with malloc/free (yes, DGC handles malloc/free itself). However, if code explicitly binds malloc/free or another custom allocator, the GC can't manage that memory.

Druntime does not have wasm support!

explicit malloc/free for C++ class in D: https://d.godbolt.org/z/n34nY61b8

Doesn't quite explain why it only happens in WASM though... unless it's some sort of ABI incompatibility that only happens on the WASM compilation target...

According to the issue opened on ldc-developers (which cites this issue), it's more likely that the struct alignment causes this error (still debated in the D community).

However, we can't say that D doesn't fully support wasm32, because the other examples work. I'm still trying to figure out how to work around it manually by tweaking the implementation of these examples and shaders.

I tried some tests with emcc-debug, and also tried removing the "-preview" feature flags from D (requires refactoring) and enabling sanitize-c from zig, and got the same result. My question is: at what point does the problem occur?

In this project, I haven't tried auto-generating C++ code from D yet (it's possible - it requires customization and the addition of extern(C++)). Zig will recompile the examples in C++ (if they work, maybe the problem is in ldc2/ldmd2 wasm-codegen).

e.g.: D to C++ (works with dmd and ldc2/ldmd2, not with gdc): https://godbolt.org/z/q6cxs4d38 - (-HCf=filename.cpp)

@kassane
Owner Author

kassane commented Sep 22, 2024

In this project, I haven't tried auto-generating C++ code from D yet [...]

https://godbolt.org/z/1KqGcWGjq (generated test) - see my comment-lines in full-code

@kassane
Owner Author

kassane commented Sep 24, 2024

Fixed by:

Edit: updated to the latest ldc2 (nightly).

@kassane kassane closed this as completed Sep 24, 2024
@floooh
Collaborator

floooh commented Sep 27, 2024

Awesome. Good to see that sokol-gfx somehow and indirectly contributes to improving the D toolchains ;)

If alignment was the issue then I guess that the reason why some demos worked while others didn't was that the alignment might have been accidentally right in some of the demos.
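
One way to test an alignment theory like this is to compare the field offsets each side assumes. The sketch below does that for a made-up two-field struct: `demo_desc` and the helper are hypothetical and do not reflect the real sokol_gfx.h layout or what the D toolchain actually did.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor; NOT the real sokol_gfx.h layout. */
typedef struct {
    uint8_t  usage;
    uint32_t size;
} demo_desc;

/* Offset of `size` if a binding packed the fields back to back. */
#define DEMO_PACKED_SIZE_OFFSET ((size_t)sizeof(uint8_t))

/* Returns 1 if the assumed offset of `size` matches the C compiler's layout. */
int demo_offset_matches(size_t assumed) {
    assert(assumed < sizeof(demo_desc)); /* an offset must lie inside the struct */
    return assumed == offsetof(demo_desc, size);
}
```

A struct whose fields all share the same alignment would pass this check under either assumption, which fits the guess that some demos only worked by accident.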
