Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[zig]: fix use of dead stack variables #31

Merged
merged 1 commit into from
Oct 29, 2022
Merged

[zig]: fix use of dead stack variables #31

merged 1 commit into from
Oct 29, 2022

Conversation

andrewrk
Copy link
Contributor

Howdy! Thanks for making this nifty project. I wanted to use it to take a data point to find out machine code quality of Zig's old compiler vs the new compiler. So I tried compiling it without -fstage1 but ran into ziglang/zig#13340. I worked around that with a small diff to Sprite.Flags inside this PR which perhaps you will find acceptable (alternately you could just wait until that issue is fixed and then that part of this PR could be dropped).

Next, I hit a segfault, and so I ran the binary in Valgrind, and there were a lot of errors like this:

==938067== Invalid read of size 32
==938067==    at 0x2F31F9: memcpy (in /home/andy/Downloads/rosettaboy/zig/rosettaboy-2)
==938067==    by 0x2526D2: ram.RAM.get (ram.zig:207)
==938067==    by 0x258DDD: cpu.CPU.tick_interrupts (cpu.zig:333)
==938067==    by 0x25894E: cpu.CPU.tick (cpu.zig:227)
==938067==    by 0x25D9C3: gameboy.GameBoy.tick (gameboy.zig:44)
==938067==    by 0x25DAD4: gameboy.GameBoy.run (gameboy.zig:39)
==938067==    by 0x25DCFE: main.main (main.zig:25)
==938067==    by 0x25E33F: callMain (start.zig:578)
==938067==    by 0x25E33F: initEventLoopAndCallMain (start.zig:512)
==938067==    by 0x25E33F: callMainWithArgs (start.zig:462)
==938067==    by 0x25E33F: main (start.zig:477)
==938067==  Address 0x1ffef9c0a7 is on thread 1's stack
==938067==  131945 bytes below stack pointer

So that lead to reworking GameBoy.new into GameBoy.init to avoid keeping pointers to dead stack variables. It's too bad that code like what you had originally doesn't work; that pattern is the motivation behind this proposal: ziglang/zig#2765

Who knows, maybe we'll get that pattern working someday. But as it stands, that code is not legal because dead stack variables are being accessed. Related: ziglang/zig#3180 - this will make debugging such issues more deterministic.

After fixing that issue, Valgrind got much quieter, only showing two things left:

==938494== Invalid read of size 8
==938494==    at 0x25903F: sdl.Renderer.setColor (sdl.zig:634)
==938494==    by 0x25BE96: gpu.GPU.tick (gpu.zig:160)
==938494==    by 0x25D9EC: gameboy.GameBoy.tick (gameboy.zig:37)
==938494==    by 0x25DAC4: gameboy.GameBoy.run (gameboy.zig:31)
==938494==    by 0x25DD35: main.main (main.zig:26)
==938494==    by 0x25E36F: callMain (start.zig:578)
==938494==    by 0x25E36F: initEventLoopAndCallMain (start.zig:512)
==938494==    by 0x25E36F: callMainWithArgs (start.zig:462)
==938494==    by 0x25E36F: main (start.zig:477)
==938494==  Address 0x1ffefdc2a8 is on thread 1's stack
==938494==  66712 bytes below stack pointer
==938494== 
==938494== Invalid read of size 8
==938494==    at 0x2590BC: sdl.Renderer.clear (sdl.zig:558)
==938494==    by 0x25BF01: gpu.GPU.tick (gpu.zig:161)
==938494==    by 0x25D9EC: gameboy.GameBoy.tick (gameboy.zig:37)
==938494==    by 0x25DAC4: gameboy.GameBoy.run (gameboy.zig:31)
==938494==    by 0x25DD35: main.main (main.zig:26)
==938494==    by 0x25E36F: callMain (start.zig:578)
==938494==    by 0x25E36F: initEventLoopAndCallMain (start.zig:512)
==938494==    by 0x25E36F: callMainWithArgs (start.zig:462)
==938494==    by 0x25E36F: main (start.zig:477)
==938494==  Address 0x1ffefdc2a8 is on thread 1's stack
==938494==  66712 bytes below stack pointer

So I peeked into the SDL sdk.zig file and noticed that SDL.Renderer is a struct that wraps a pointer, so no need to take a double pointer. This prevents hanging onto a pointer to the wrapping struct when what is really desired is a copy of the pointer.

After fixing that, Valgrind went silent and the binary started working. Then I was able to collect the data point that I wanted:

[nix-shell:~/Downloads/rosettaboy/zig]$ hyperfine "./rosettaboy-1 --profile 600 --silent --headless --turbo opus5.gb" "./rosettaboy-2 --profile 600 --silent --headless --turbo opus5.gb"
Benchmark 1: ./rosettaboy-1 --profile 600 --silent --headless --turbo opus5.gb
  Time (mean ± σ):     321.9 ms ±   5.0 ms    [User: 319.1 ms, System: 1.3 ms]
  Range (min … max):   316.9 ms … 334.1 ms    10 runs
 
Benchmark 2: ./rosettaboy-2 --profile 600 --silent --headless --turbo opus5.gb
  Time (mean ± σ):     298.1 ms ±   1.0 ms    [User: 295.2 ms, System: 2.0 ms]
  Range (min … max):   296.5 ms … 300.0 ms    10 runs
 
Summary
  './rosettaboy-2 --profile 600 --silent --headless --turbo opus5.gb' ran
    1.08 ± 0.02 times faster than './rosettaboy-1 --profile 600 --silent --headless --turbo opus5.gb'

Sweet! Looks like the new compiler generates better LLVM IR than the old one 🙂

* GameBoy.new had to be reworked into GameBoy.init to avoid keeping
  pointers to dead stack variables.

* I peeked into the SDL sdk.zig file and noticed that SDL.Renderer is a
  struct that wraps a pointer, so no need to take a double pointer. This
  prevents hanging onto a pointer to the wrapping struct when what is
  really desired is a copy of the pointer.

* The changes to Sprite are not really needed but were used to work
  around ziglang/zig#13340 so that RosettaBoy
  can be built with and without `-fstage1`.
@shish
Copy link
Owner

shish commented Oct 29, 2022

That looks excellent, thank you! This is my first Zig project, trying to teach myself the language and see how it compares to other languages that I'm more familiar with, so definitely happy to take suggestions from someone who knows what they're doing :)

@shish shish merged commit 6a27dc8 into shish:master Oct 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants