gpu-dawn: reduce Dawn build and iteration times #124

Closed
emidoots opened this issue Dec 10, 2021 · 15 comments
emidoots commented Dec 10, 2021

Update: Before I opened this issue, build and iteration times were quite slow. We've made good progress, keeping this issue open for further improvements. Results so far:

macOS M1 (original chipset) w/16GB RAM:

| `zig build` action | Before | After | Improvement |
| --- | --- | --- | --- |
| From scratch | 3m14s | 2m38s | 18% |
| No changes | 19.4s | 1.3s | 93% |
| One file changed | 23.7s | 7.9s | 67% |
| `libgpu.a` size | ? | 41M | ? |
| `dawn-example` size | ? | 17M | ? |
emidoots added a commit that referenced this issue Dec 12, 2021
See #124

Signed-off-by: Stephen Gutekanst <stephen@hexops.com>
@emidoots

After the changes above, we see some great gains:

| `zig build` action | Before | After | Zig version | OS | CPU | Memory |
| --- | --- | --- | --- | --- | --- | --- |
| From scratch | 3m14s | 2m38s | 0.9.0-dev.1939+75f3e7a4a | macOS | M1 (original) | 16GB |
| No changes | 19.4s | 1.3s | 0.9.0-dev.1939+75f3e7a4a | macOS | M1 (original) | 16GB |
| One file changed | 23.7s | 7.9s | 0.9.0-dev.1939+75f3e7a4a | macOS | M1 (original) | 16GB |

Additionally I tracked down where time is spent:

  • ~8s total on average to rebuild and link
  • ~6.7s of that spent linking the executable; the rest spent compiling, etc.
  • ~2.1s of the link time spent in `parseObjectsIntoAtoms`
  • ~2.6s of the link time spent in `calcAdhocSignature`

After discussion with Jakub (zld author), it seems highly likely that link time/performance can be improved, especially in those two functions above.

emidoots added a commit that referenced this issue Dec 12, 2021
See #124

Signed-off-by: Stephen Gutekanst <stephen@hexops.com>
andrewrk pushed a commit to ziglang/zig that referenced this issue Dec 14, 2021
While investigating slow build times with [a large project](hexops/mach#124),
I found that the compiler was reading from disk nearly every C source file in my project
when rebuilding despite no changes having been made. This accounted for several seconds of
time (approx. 20-30% of running `zig build` without any changes to the sources.)

The cause of this was that comparisons of file mtimes would _always_ fail (the mtime of the file on
disk was always newer than that stored in the cache manifest), and so the cache logic would always
fall back to byte-for-byte file content comparisons of what is on disk vs. in the cache, reading every
C source file in my project from disk during each rebuild. Because file contents were the same, a cache
hit occurred, and _despite the mtime being different, the cache manifest would not be updated._

One can reproduce this by building a Zig project so the cache is populated, and then changing mtimes
of their C source files to be newer than what is in the cache (without altering file contents.)

The fix is rather simple: we should always write the updated cache manifest regardless of
whether or not a cache hit occurred (a cache hit doesn't indicate whether the manifest is dirty). Luckily,
`writeManifest` already contains logic to determine if a manifest is dirty and becomes a no-op if no
change to the manifest file is necessary, so we merely need to ensure it is invoked.

Signed-off-by: Stephen Gutekanst <stephen@hexops.com>
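The bug and fix described in the commit message above are language-agnostic; the following is a minimal Python sketch of the same cache-manifest logic (all names here are hypothetical, this is not the actual Zig compiler cache code):

```python
# Minimal sketch of the cache bug and fix described above.
# All names are hypothetical; this is not the actual Zig compiler code.
import hashlib
import json
import os
import tempfile
import time

class Manifest:
    def __init__(self, path):
        self.path = path
        self.entries = {}  # file path -> {"mtime": float, "digest": str}
        if os.path.exists(path):
            with open(path) as f:
                self.entries = json.load(f)
        self.dirty = False

    def check_hit(self, files):
        """Return True if every file matches the manifest. Files whose
        mtime differs fall back to a byte-for-byte (hash) comparison."""
        hit = True
        for path in files:
            st = os.stat(path)
            entry = self.entries.get(path)
            if entry is not None and entry["mtime"] == st.st_mtime:
                continue  # fast path: mtime matches, no file read needed
            # Slow path: mtime differs, so read and hash the whole file.
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            if entry is None or entry["digest"] != digest:
                hit = False  # contents really changed: a true miss
            self.entries[path] = {"mtime": st.st_mtime, "digest": digest}
            self.dirty = True  # stored mtime is stale even on a hit
        return hit

    def write(self):
        # The fix: call this after EVERY check, hit or miss. It is a
        # no-op unless check_hit marked the manifest dirty.
        if self.dirty:
            with open(self.path, "w") as f:
                json.dump(self.entries, f)
            self.dirty = False

# Reproduce the bug scenario: same contents, newer mtime.
d = tempfile.mkdtemp()
src = os.path.join(d, "a.c")
with open(src, "w") as f:
    f.write("int main(void) { return 0; }\n")
manifest_path = os.path.join(d, "manifest.json")

m1 = Manifest(manifest_path)
first = m1.check_hit([src])   # cold cache: miss
m1.write()

os.utime(src, (time.time() + 60,) * 2)  # touch: newer mtime, same bytes
m2 = Manifest(manifest_path)
second = m2.check_hit([src])  # hit, but only via the slow hash path
m2.write()                    # without this write, every rebuild re-reads

m3 = Manifest(manifest_path)
third = m3.check_hit([src])   # hit via the mtime fast path again
print(first, second, third, m3.dirty)
```

Skipping the final `write()` reproduces the bug: the stale mtime stays in the manifest, so every subsequent check takes the slow path and re-reads every file.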
@emidoots

Comparing build times on a really beefy Linux gaming laptop (i7-10875H, 8-core 5.1GHz, 32GB RAM) vs. macOS M1 (original chipset, 16GB RAM):

| `zig build` action | Linux time | M1 macOS time |
| --- | --- | --- |
| From scratch | 5m17s | 2m38s |
| No changes | 0.5s | 1.3s |
| One file changed | 1.7s | 7.9s |

Really interesting numbers here:

Why is building on Linux from scratch so much slower? Maybe the OpenGL+Vulkan backends really take that much longer to compile than the Metal backend?

Why are iteration times on macOS so much worse? Guessing zld performance just isn't there yet, but that gives us an idea of how much room for improvement we can expect in zld in the future!

@emidoots

Investigating from-scratch build times, on M1 macOS:

```
84s make 'spirv-tools'
48s make 'tint'
18s make 'dawn-native'
13s make 'abseil-cpp'
12s make 'dawn-wire'
 2s make 'glfw'
 2s make 'dawn-common'
 1s make 'gpu'
 1s make 'dawn-utils'
 0.8s make 'dawn-native-mach'
 0.7s make 'dawn-platform'
```

Collected by swapping the end of lib/zig/std/build.zig:makeOneStep with:

```diff
+        var timer = try std.time.Timer.start();
         try s.make();
+        std.debug.print("{: >10}ms make '{s}'\n", .{timer.read()/std.time.ns_per_ms, s.name});
```

@emidoots

We could eliminate SPIRV support on macOS (would require contributing a change upstream to Dawn) which, from experiments, would reduce build times on macOS from 2m38s -> 1m57s (a 26% reduction).

We could also eliminate it on Windows if we only target DirectX. Linux would require it as both Vulkan and OpenGL backends require it.


meshula commented Dec 20, 2021

It's been pointed out in the Dawn issues that abseil's use is trivial (just an overpowered string format), but that it makes an outsized contribution to the build.

The pushback was that abseil has fast hashing and containers that could perhaps be used in the future.

In the meantime, the formatting routines rope in nearly the entire abseil library, spiraling down into time zones, language localizations, and much more.

Might be worth dropping more notes on the issue, highlighting that abseil is one of the larger time-consuming build components.

https://bugs.chromium.org/p/dawn/issues/detail?id=1148&q=abseil&can=2

@emidoots

@meshula thanks, and will do!

Even if we eliminated abseil entirely, though, I still think build times here are kinda unacceptable.

Almost all of the build time seems to come from spirv-tools, tint, DirectXShaderCompiler, and spirv-cross. I want to dig more into why these are so slow to compile, but I fear the real answer is just that shader compilation + translation requires 2-4 different compilers:

  • Tint seems to handle WGSL -> [MSL, SPIRV, GLSL*, HLSL]
  • Dawn then seems to pass HLSL -> DirectXShaderCompiler (a full fork of LLVM, wow) to do HLSL -> SPIRV.
  • Tint's GLSL backend isn't sufficient yet, so they use spirv-cross for WGSL -> GLSL (which I think then may go through another SPIRV layer before actually being passed to the GPU?)

I suspect this amount of indirection is one of the reasons Dawn is so mature / less buggy, but also quite heavy.

@emidoots

Exploring whether or not we can reduce the amount of spirv-tools code that gets pulled in:

  • Tint's SPIRV reader depends on the spirv-tools optimizer, so you can't eliminate the optimizer without eliminating the SPIRV reader. But the SPIRV reader shouldn't be necessary in any Dawn target, I think, so it can be eliminated.
  • You can eliminate the spirv-tools disassembler, but it doesn't account for much of the compile time at all.

@emidoots

Building spirv-tools goes from 84s -> 30s if we eliminate Tint's SPIRV reader and the spirv-tools optimizer, nice!

@emidoots

Building spirv-tools goes from 30s -> 6s if we eliminate the dependency on the SPIRV validator (easy for macOS, probably doable on others).


meshula commented Dec 22, 2021

My current approach is native-as-possible; so Metal/DX/Vk as appropriate - the sad thing is Vk is most useful to me on the slowest platform, rpi, where I am stubbornly building natively rather than cross compiling from desktop. That said, I love the speed gains implied for Metal/DX platforms at least. Maybe the thing to do there is to have an explicit but optional cross-platform build for rpi, so that eating the Khronos tool chain build-pain is an optional choice?

@emidoots

@meshula I am actually thinking we can have a build config option, maybe on by default, which fetches/uses prebuilt binaries for the target. You'd be able to toggle it off with the flip of a switch and get the build from source using just Zig, though (that's how the binaries would be produced).

Thoughts on that?

Also see #133 for another idea I have going on.
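As a rough sketch of how that toggle could behave (all names, paths, and targets here are hypothetical illustrations, not mach's actual build code, which would live in build.zig):

```python
# Hypothetical sketch of the proposed prebuilt-vs-source toggle.
# Names, paths, and targets are illustrative only.
import os

def try_download(dest, target):
    # Placeholder: a real implementation would fetch a release binary
    # for `target` and verify its checksum before trusting it.
    return False

def build_from_source(target):
    # Placeholder for the full Zig-driven Dawn build discussed above.
    return f"zig-out/{target}/libdawn.a"

def resolve_dawn_lib(cache_dir, target, from_source=False):
    """Return a path to a Dawn static library for `target`, preferring
    prebuilt binaries unless `from_source` is set (the toggle)."""
    prebuilt = os.path.join(cache_dir, target, "libdawn.a")
    if not from_source:
        if os.path.exists(prebuilt):
            return prebuilt  # reuse a previously downloaded binary
        if try_download(prebuilt, target):
            return prebuilt  # fetched a released binary: near-instant
    return build_from_source(target)  # the build-from-source path

print(resolve_dawn_lib("/tmp/no-such-cache", "aarch64-macos"))
```

The default path never compiles Dawn at all; flipping the switch (or having no prebuilt available) falls through to the full source build.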


meshula commented Dec 22, 2021

Cached binaries make a ton of sense for first-time builds and iteration purposes. A locally reproducible build can be used for air-gapped systems that can't pull binaries from the internet for whatever reason, and for security audits.

@emidoots emidoots changed the title gpu: reduce Dawn build and iteration times gpu-dawn: reduce Dawn build and iteration times Dec 23, 2021

emidoots commented Dec 27, 2021

I looked into why gfx-rs/wgpu may be faster to compile than Dawn; some key differences I noticed:

  • Windows: gfx-rs/wgpu is using the deprecated(?) FXC compiler on Windows (which does not support Shader Model 6.0, and is forbidden in Windows Store apps according to Microsoft's official docs). This spares them from needing to use the newer dxcompiler API, which is a full fork of LLVM and not shipped with Windows. Dawn must build this from source.
  • macOS: Dawn exposes some functionality to consume SPIRV shaders in addition to WGSL, that adds dependency on spirv-tools, spirv-cross, etc. that is not otherwise needed. gfx-rs/wgpu just supports WGSL.
  • Linux: gfx-rs/wgpu does not support desktop OpenGL, and so it does not have to compile any support for that. Both support a direct WGSL->SPIRV translation, however. There may be other differences leading to the compile-time difference on Linux.

@emidoots emidoots added this to the Mach 0.1 milestone Feb 20, 2022

emidoots commented Feb 27, 2022

Great news, Dawn no longer requires spirv-cross for OpenGL backends. This should speed up Linux compilation significantly, and reduce binary sizes a bit! hexops-graveyard/dawn@a52abab

emidoots added a commit that referenced this issue Feb 28, 2022
Dawn no longer uses spirv-cross for OpenGL backends:

hexops-graveyard/dawn@a52abab

Hence, we no longer need to compile it.

Helps #124

Signed-off-by: Stephen Gutekanst <stephen@hexops.com>
@emidoots

There's an upcoming demo about this, but mach/gpu-dawn now builds binary releases for every target, and the build.zig makes using them a magical experience: by default you get near-instant binary downloads, and you can add `-Ddawn-from-source=true` to build Dawn 100% from source using the Zig compiler.

emidoots added a commit to hexops-graveyard/mach-gpu that referenced this issue Sep 11, 2022
See hexops/mach#124

Signed-off-by: Stephen Gutekanst <stephen@hexops.com>