
Compute example #22

Draft: wants to merge 8 commits into master
Conversation

@Shfty (Collaborator) commented Jun 2, 2023

Thanks for putting in the effort on this - I've not had a chance to test it yet, but if it runs and produces the expected output then I'd say it's a strong start!

@samoylovfp (Author)

Not yet. I need to read more about how bevy handles shaders and haven't had the energy to do so yet. Planning on taking another stab at it this weekend or the next.

@samoylovfp (Author)

I hope I am missing something obvious, but the ways of achieving the result that I can see all feel a bit backwards.

The current problem is that the bevy compute pipeline requires a bevy shader handle, which in the bevy compute example is obtained by loading the shader from a path via the AssetServer. For a rust-gpu shader we first need to convert it into a bevy Shader before we can build the compute pipeline.
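
For reference, the relevant part of the upstream bevy compute example looks roughly like this (a paraphrase of the bevy 0.10-era game_of_life pipeline setup; exact field names vary between bevy versions, and the bind group layout is assumed to be created elsewhere):

    use std::borrow::Cow;
    use bevy::prelude::*;
    use bevy::render::render_resource::*;

    // Paraphrased from the upstream compute example: the pipeline can only be
    // queued once we have a Handle<Shader>, which upstream comes straight from
    // the AssetServer. That is the step with no obvious equivalent for a
    // RustGpuBuilderOutput.
    fn queue_compute_pipeline(world: &mut World, layout: BindGroupLayout) -> CachedComputePipelineId {
        let shader: Handle<Shader> = world.resource::<AssetServer>().load("shaders/game_of_life.wgsl");
        let pipeline_cache = world.resource::<PipelineCache>();
        pipeline_cache.queue_compute_pipeline(ComputePipelineDescriptor {
            label: None,
            layout: vec![layout],
            push_constant_ranges: Vec::new(),
            shader,
            shader_defs: vec![],
            entry_point: Cow::from("init"),
        })
    }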

I initially thought it would be easiest to just do this processing in the render graph node's "update" method, but it seems the render sub-app's World does not provide Assets<Shader> or Assets<RustGpuBuilderOutput>, so I cannot simply take the shader builder output, convert it, and put it into Assets<Shader>. It might be possible to convert them in the bevy app and then "extract" them into the render world, but that feels even more backwards than the other options.

I think at this point the most straightforward way is to provide a custom AssetLoader that builds a bevy Shader out of the RustGpuBuilderOutput. The bevy compute example would then require very little change: only registering the asset loader and naming the shader, perhaps using the "#fragment" suffix in the asset path to indicate the entry point. I'll try that.
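
A minimal sketch of such a loader, assuming the bevy 0.10-era AssetLoader API and that the asset bytes are already raw SPIR-V (RustGpuShaderLoader and the .spv extension are placeholders; a real loader would first unpack the RustGpuBuilderOutput, and the "#fragment" entry-point handling is left out):

    use bevy::asset::{AssetLoader, Error, LoadContext, LoadedAsset};
    use bevy::render::render_resource::Shader;
    use bevy::utils::BoxedFuture;

    /// Hypothetical loader that turns a compiled rust-gpu module into a bevy Shader asset.
    #[derive(Default)]
    pub struct RustGpuShaderLoader;

    impl AssetLoader for RustGpuShaderLoader {
        fn load<'a>(
            &'a self,
            bytes: &'a [u8],
            load_context: &'a mut LoadContext,
        ) -> BoxedFuture<'a, Result<(), Error>> {
            Box::pin(async move {
                // A real loader would pull these bytes out of the
                // RustGpuBuilderOutput; here we assume the file is raw SPIR-V.
                let shader = Shader::from_spirv(bytes.to_vec());
                load_context.set_default_asset(LoadedAsset::new(shader));
                Ok(())
            })
        }

        fn extensions(&self) -> &[&str] {
            &["spv"]
        }
    }

It would then be registered on the app with add_asset_loader, and the compute example could keep obtaining its Handle<Shader> through the AssetServer exactly as it does today.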

@samoylovfp (Author)

I ended up in a situation where I think it should work, but it doesn't, and it doesn't even complain that anything is wrong.
I suspect I might have messed up the signature of the shader entry point; I'll try to debug it somehow.

@samoylovfp (Author)

I'll take another stab in two weeks unless someone figures it out sooner

@tombh commented Jun 14, 2023

Sounds like great progress, I'm excited to try it out!

@samoylovfp (Author)

I wasn't able to debug why nothing is showing, and I lost the remainder of my motivation reading through the SPIR-V specification. I might take another stab in a few months.

@tombh commented Jul 6, 2023

Ha yeah SPIR-V is pretty esoteric. Awesome work @samoylovfp 🙇

@johnny-smitherson commented Nov 6, 2023

Hey, I got this to work with some extensive hacking and vendoring, and also updated it to bevy 0.11.

The problem with this PR is that the shader accepts a storage buffer:

    #[spirv(storage_buffer, descriptor_set = 0, binding = 0)] texture: &[Vec4],

but the bevy app binds a storage texture. The solution was to change the shader to accept an Image!(2D, format=...) instead:

    #[spirv(descriptor_set = 0, binding = 0)] texture: &Image!(2D, format=rgba8_snorm, sampled=false),
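
On the bevy side the matching entry in the bind group layout is then a storage texture rather than a storage buffer; roughly like this, using the wgpu types that bevy re-exports (the read-write access mode is an assumption, the actual branch may differ):

    use bevy::render::render_resource::*;

    // Storage texture at descriptor set 0, binding 0, matching
    // `Image!(2D, format = rgba8_snorm, sampled = false)` on the shader side.
    fn texture_layout_entry() -> BindGroupLayoutEntry {
        BindGroupLayoutEntry {
            binding: 0,
            visibility: ShaderStages::COMPUTE,
            ty: BindingType::StorageTexture {
                access: StorageTextureAccess::ReadWrite,
                format: TextureFormat::Rgba8Snorm,
                view_dimension: TextureViewDimension::D2,
            },
            count: None,
        }
    }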

Here is a branch where it renders simplex noise at 11 FPS at 1280x720 (about 10x slower than single-threaded CPU):

[noiseCapture: animated screen capture of the simplex noise render]

#23

@tombh commented Nov 6, 2023

That's great! Did you mean to post a link to the branch, or were you just mentioning it? I see the link now.

Do you think the notably slower frame rate is because of Bevy's rust-gpu integration? Or just because your implementation is prioritising proof of concept for now?

@johnny-smitherson

That old noise crate was the only no_std crate that worked; see rust-gpu/shader/lib/noise.

To see whether it's really slow or not, we'd have to translate it and compare with WGSL on the same GPU; otherwise the comparison with the CPU is meaningless (who knows what intrinsics magically show up on the CPU side?).

Anyway, it's a starting point for doing your own computation and benchmarks.

@johnny-smitherson

If the maintainer is still interested, I can make separate PRs for each of the vendored codebases. Otherwise it's really a bother to work with many little crates spread around when I need to upgrade dependencies in each and every one.

@johnny-smitherson commented Nov 6, 2023

The timings of GPU vs. CPU for the noise crate:

  • CPU, cargo run (debug build): 270 ms
  • CPU, cargo run --release: 22 ms
  • GPU (NVIDIA): 90 ms (both with and without --release when building the shader crate)

I think the noise crate is a little too much compute for each thread. I'm sure other tasks are better suited for this, though I'd first translate some WGSL/GLSL compute benchmarks into Rust and see if there are major losses in SPIR-V/SPIR-T/whatever.

EDIT: Actually only 4x slower than the CPU; I think it's working correctly.

@johnny-smitherson commented Nov 6, 2023

Here is Game of Life in 80 lines, running at 4K at 60 FPS.

Ported from: https://github.com/bevyengine/bevy/blob/v0.12.0/assets/shaders/game_of_life.wgsl

Setting "NO VSYNC" in bevy doesn't actually let the game go over 60 FPS; there's probably some way we can trace into the compute pass and get the compute shader's runtime from there?

[screen capture of the Game of Life simulation]

#![no_std]
#![feature(asm_experimental_arch)]

use spirv_std::{
    spirv,
    glam::{UVec3, IVec2, Vec4}, Image,
};

fn hash(value: u32) -> u32 {
    let mut state = value;
    state = state ^ 2747636419;
    state = state * 2654435769;
    state = state ^ state >> 16;
    state = state * 2654435769;
    state = state ^ state >> 16;
    state = state * 2654435769;
    return state;
}

fn randomFloat(value: u32) -> f32 {
    return (hash(value) as f32) / 4294967295.0;
}

pub type Image_2D_SNORM =  Image!(2D, format=rgba8_snorm, sampled=false);

fn is_alive(location: IVec2, offset_x: i32, offset_y: i32, image: &Image_2D_SNORM) -> i32 {
    let value = image.read(location + IVec2::new(offset_x, offset_y));
    return value.x as i32;
}

fn count_alive(location: IVec2, image: &Image_2D_SNORM) -> i32 {
    return is_alive(location, -1, -1, image) +
           is_alive(location, -1,  0, image) +
           is_alive(location, -1,  1, image) +
           is_alive(location,  0, -1, image) +
           is_alive(location,  0,  1, image) +
           is_alive(location,  1, -1, image) +
           is_alive(location,  1,  0, image) +
           is_alive(location,  1,  1, image);
}



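// Entry point: fill the texture with a random initial pattern of alive/dead cells.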
#[spirv(compute(threads(8,8)))]
pub fn init(
    #[spirv(global_invocation_id)] id: UVec3,
    #[spirv(num_workgroups)] num: UVec3,
    #[spirv(descriptor_set = 0, binding = 0)] texture: &Image_2D_SNORM,
) {

    let coord = IVec2::new(id.x as i32, id.y as i32);
    let randomNumber = randomFloat(id.y * num.x + id.x);
    let alive = randomNumber > 0.9;
    let alive_f = alive as i32 as f32;
    let pixel = Vec4::new(alive_f, alive_f, alive_f, 1.0);
    unsafe {
        texture.write(coord, pixel);
    }
}

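// Entry point: count each cell's live neighbours and apply Conway's rules to produce the next generation.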
#[spirv(compute(threads(8,8)))]
pub fn update(
    #[spirv(global_invocation_id)] id: UVec3,
    #[spirv(num_workgroups)] num: UVec3,
    #[spirv(descriptor_set = 0, binding = 0)] texture: &Image_2D_SNORM,
) {

    let coord = IVec2::new(id.x as i32, id.y as i32);
    let n_alive = count_alive(coord, texture);
    let alive = n_alive == 3 || n_alive == 2 && is_alive(coord, 0, 0, texture) == 1;
    let alive_f = alive as i32 as f32;
    let pixel = Vec4::new(alive_f, alive_f, alive_f, 1.0);

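    // Wait until all threads in this workgroup have read their neighbours before writing the new state.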
    unsafe { spirv_std::arch::workgroup_memory_barrier_with_group_sync() };

    unsafe {
        texture.write(coord, pixel);
    }

}

@johnny-smitherson commented Nov 6, 2023

I've also thrown in some of my game logic (ballistic solution for 1st-order viscosity) and it works exactly as expected, with speeds comparable to CPU Rust (5x slower) and no value errors.

I can't wait not to learn WGSL

thanks for the code!

@johnny-smitherson

Also, I've set up a Docker build process on the fork, so you don't have to install the six-month-old nightly Rust on the host.

Another note: for compute shaders, the entry_points.json machinery in the bevy app / rust-gpu builder doesn't need to exist, so rust-gpu-builder probably doesn't have to pull in bevy and bevy-gpu-builder-shared (which pulls in bevy reflection), since we don't want any interop with internal types.

So it might be easier to use rust-gpu directly, without any bevy-specific code, for generating compute shaders, given that we still have to set up all the bind groups by hand.

Maybe after the entry_points.json machinery is updated to work with bevy 0.11/0.12 we can look at automatically generating the layout and binding code for the bevy side (similar to what's being done for materials).

@tombh commented Nov 11, 2023

That's a lot of great info and insight. Even if it's slower, it's just great to know that it all works. I'm still a newbie to all this, so it'll take me a while to pore over everything.
