
LayersModel#predict() results in all zeros when using WebGPU backend in Deno #6842

Closed
vicary opened this issue Sep 20, 2022 · 17 comments · Fixed by #6918
Labels: type:bug Something isn't working

@vicary

vicary commented Sep 20, 2022

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow.js): No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS Monterey 12.6
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: N/A
  • TensorFlow.js installed from (npm or script link): https://cdn.skypack.dev/@tensorflow/tfjs
  • TensorFlow.js version (use command below): 3.20.0
  • Browser version: Deno (built from source at commit 2929ec9f)
  • Tensorflow.js Converter Version: N/A

Describe the current behavior

I came from #6746 and made denoland/deno#15853 to make Deno compatible with the WebGPU backend. Model predictions sometimes result in an all-zeros tensor; the problem does not exist with the CPU backend.

Describe the expected behavior

Output tensors should contain non-zero numbers.

Standalone code to reproduce the issue

import * as tf from "https://cdn.skypack.dev/@tensorflow/tfjs?dts";
import "https://cdn.skypack.dev/@tensorflow/tfjs-backend-webgpu";

await tf.ready();
const model = tf.sequential();
model.add(tf.layers.dense({ units: 1, inputShape: [1] }));
const output = await model.predict(tf.tensor([1])).array();

console.log(output); // prints [[0]]

Other info / logs
I am on an Apple M1 laptop.

Backstory and tracking issues:

  1. Deno uses deno_webgpu from gfx-rs/wgpu, which in turn uses gfx-rs/naga for all shader related things.
  2. WGSL const-declarations gfx-rs/wgpu#2994
  3. [wgsl-in] allow functions to be declared literally after entry point gfx-rs/naga#2072
  4. Global item does not support 'const' gfx-rs/naga#2071
  5. [wgsl-in] Implement const expressions gfx-rs/naga#1829
  6. Turns out Naga uses lexical scoping instead of module-level scoping, and they are changing that in [wgsl-in] Implement module-level scoping gfx-rs/naga#2075
@vicary vicary added the type:bug Something isn't working label Sep 20, 2022
@qjia7
Contributor

qjia7 commented Sep 21, 2022

@haoyunfeix Please take a look, thanks.

@haoyunfeix
Contributor

@vicary Thanks for your effort on denoland/deno#15853 and for reporting this bug! Could you please show me the steps to build deno from source at commit 2929ec9f, if possible? I would like to use a local build of deno to reproduce this issue the same way you did.

@qjia7 Alternatively, I can reproduce this issue with a locally built webgpu backend by skipping the feature check like:

-    const supportTimeQuery = adapter.features.has('timestamp-query');
+    const supportTimeQuery = false;
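For context, the check being stubbed out queries the WebGPU adapter's feature set. A small sketch (assuming nothing beyond the standard WebGPU API; it returns an empty list when `navigator.gpu` is absent) to inspect what an adapter actually reports:

```typescript
// Sketch: list the feature names a WebGPU adapter advertises,
// e.g. "timestamp-query". Falls back to [] when the runtime has
// no WebGPU support, so it also runs outside Deno/Chrome.
async function listAdapterFeatures(): Promise<string[]> {
  const gpu = (globalThis as any).navigator?.gpu;
  if (!gpu) return []; // no WebGPU in this runtime
  const adapter = await gpu.requestAdapter();
  return adapter ? [...adapter.features] : [];
}

listAdapterFeatures().then((features) =>
  console.log("adapter features:", features));
```

If "timestamp-query" is missing from the reported list, `supportTimeQuery` would be false anyway, which is what the patch above forces unconditionally.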

Here is the code:

import * as tf from 'https://cdn.skypack.dev/@tensorflow/tfjs'
import '../dist/bin/tfjs-core/tfjs-core_pkg/dist/tf-core.es2017.js'
import '../dist/bin/tfjs-backend-webgpu/tfjs-backend-webgpu_pkg/dist/tf-backend-webgpu.es2017.js'

async function test(backend) {
  // initialize tensorflow
  if(await tf.setBackend(backend)){
    await tf.ready()

    const model = tf.sequential();
    model.add(tf.layers.dense({ units: 1, inputShape: [1] }));
    const output = await model.predict(tf.tensor([1])).array();
    console.log(`The output of ${backend} is ${output}`); // webgpu prints [[0]]
  }
  else{
    console.log(`${backend} is not set successfully!`);
  }
}

await test('webgpu');
await test('cpu');

command and output:

wp >>> ~/.deno/bin/deno run --allow-write --allow-read --allow-net --unstable mod3.ts                  22-09-21 13:03 
libEGL warning: pci id for fd 12: 102b:0522, driver (null)

WARNING: lavapipe is not a conformant vulkan implementation, testing use only.
The output of webgpu is 0
The output of cpu is -0.8412973284721375

deno version:

wp >>> ~/.deno/bin/deno --version                                                                      22-09-21 13:04 
deno 1.25.0 (release, x86_64-unknown-linux-gnu)
v8 10.6.194.5
typescript 4.7.4

@vicary
Author

vicary commented Sep 21, 2022

And could you please show me the steps how to build deno from source at commit 2929ec9f if possible? I would like to use a local build of deno to reproduce this issue in your way.

I followed the steps in their docs: https://deno.land/manual@v1.25.3/contributing/building_from_source

  1. brew install rust llvm
  2. git clone --recurse-submodules https://github.com/denoland/deno.git
  3. cd deno
  4. git checkout 2929ec9f (optional; the current HEAD should behave the same)
  5. I am using an M1 and don't care about debug info & profiling, so I built with two more options:
    cargo build -vv --target=aarch64-apple-darwin --release
    If their default target doesn't work for your Linux, you can look up available build targets via:
    rustc --print target-list
  6. You can now run the resulting binary at ./target/*/release/deno

You may use -A to allow all permissions: deno run -A --unstable mod3.ts

@vicary
Author

vicary commented Sep 29, 2022

Gently pinging @haoyunfeix, is there anything I can do to help move this forward?

@haoyunfeix
Contributor

@vicary Sorry for the delay!

There seem to be some shader validation differences between the browser and deno.
Here is a shader generated by tfjs (shader.txt); it runs well on Chrome Canary 108.0.5329.0 (V8 10.8.79) but hits several validation errors in deno and returns zeros. The shader was generated from this code:

import '../dist/bin/tfjs-core/tfjs-core_pkg/dist/tf-core.es2017.js'
import '../dist/bin/tfjs-backend-webgpu/tfjs-backend-webgpu_pkg/dist/tf-backend-webgpu.es2017.js'

import * as tf from 'https://cdn.skypack.dev/@tensorflow/tfjs'

async function test(backend) {
  // initialize tensorflow
  if (await tf.setBackend(backend)) {
    await tf.ready()

    const a = tf.tensor2d([1, 2, -3, -4], [2, 2]);
    const b = tf.tensor1d([1, 2]);

    tf.env().set('WEBGPU_CPU_FORWARD', false);
    let c = tf.add(a, b);
    console.log(await a.data());
    console.log(await b.data());
    console.log(await c.data());
    console.log(tf.getBackend());
  } else {
    console.log(`${backend} is not set successfully!`);
  }
}

await test('webgpu');
await test('cpu');
Error / Code Example (inline screenshots omitted):

  1. const workGroupSizeX = 256u; const workGroupSizeY = 1u; const workGroupSizeZ = 1u;
  2. @compute @workgroup_size(workGroupSizeX, workGroupSizeY, workGroupSizeZ)
  3. Calling another function from the entry point function

The errors happen at https://github.com/tensorflow/tfjs/blob/master/tfjs-backend-webgpu/src/webgpu_program.ts#L59
After I fixed these shader compile errors I got the correct result, but I don't think it's a good idea to change the current shaders.

@qjia7 WDYT? Is it an issue caused by different versions of WGSL?

@qjia7
Contributor

qjia7 commented Sep 30, 2022

@haoyunfeix @vicary I think deno is relying on an old WebGPU (WGSL) snapshot, while tfjs-webgpu follows the latest WebGPU/WGSL spec. So to resolve this issue, deno needs to upgrade its underlying WebGPU implementation on your side.

@qjia7
Contributor

qjia7 commented Sep 30, 2022

@vicary Please let us know whether this issue can be fixed after upgrading webgpu in deno.

@vicary
Author

vicary commented Sep 30, 2022

@qjia7 Deno uses a Rust implementation of WebGPU, deno_webgpu; I think it is already on the latest version. Quoting from their README.md:

The spec is still very bare bones, and is still missing many details. As the spec becomes more concrete, we will implement to follow the spec more closely.

The Shader Support section in the monorepo also mentions partial support of the draft.

Note that the WGSL specification is still under development, so the draft specification does not exactly describe what wgpu supports. See below for details.

Sorry for my lack of knowledge of the WebGPU standard; it would be very kind of you to point out the earliest compatible version or the specific features required. I believe this would create a legitimate use case for them to work on those specific implementations.

@qjia7
Contributor

qjia7 commented Sep 30, 2022

@vicary From the error message provided by @haoyunfeix, it's because wgpu is behind the latest WGSL spec. So maybe a better way is to report a bug to https://github.com/gfx-rs/wgpu/ so it can support TFJS webgpu's shaders, instead of working around this in TFJS-webgpu. What do you think?
@haoyunfeix Can you help file an issue with wgpu asking for the features we need, like const?

@vicary
Author

vicary commented Sep 30, 2022

Thanks for the reply @qjia7, also please let me thank @haoyunfeix ahead of time for creating the issue upstream.

From the first 2 errors I get that it's about the missing const keyword; I have updated the issue description with the related tracking issues in the wgpu repo.

I am not sure I understand the last error though; maybe the Rust implementation requires the main function to be declared before the entry point? I want to get a deeper understanding of this issue, and hopefully make a PR there if it's within my capabilities. @haoyunfeix would you mind sharing your temporary fix while you are preparing the upstream issue?
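To make the const problem concrete, here is a hypothetical sketch (the helper name and regexes are mine, not tfjs code) of the kind of source transform that sidesteps it: inline the workgroup-size constants into the @workgroup_size attribute and drop the module-scope const declarations the toolchain rejects:

```typescript
// Hypothetical workaround sketch: naga (at the time) rejected
// module-scope `const` declarations, so substitute their values
// directly into @workgroup_size(...) and remove the declarations.
function inlineWorkgroupConsts(shader: string): string {
  // Collect `const name = value;` declarations (simplified pattern).
  const consts = new Map<string, string>();
  for (const m of shader.matchAll(/const\s+(\w+)\s*=\s*([\w.]+);/g)) {
    consts.set(m[1], m[2]);
  }
  // Replace names inside @workgroup_size(...) with their values.
  const out = shader.replace(/@workgroup_size\(([^)]*)\)/g, (_, args: string) =>
    "@workgroup_size(" +
      args.split(",").map((a) => consts.get(a.trim()) ?? a.trim()).join(", ") +
      ")");
  // Drop the now-unused const declarations.
  return out.replace(/const\s+\w+\s*=\s*[\w.]+;\s*/g, "");
}
```

The result mirrors what the WGSL would look like with the sizes written literally, as in the @workgroup_size(256, 1, 1) variant shown later in this thread.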

@haoyunfeix
Contributor

@qjia7 Since wgpu relies on naga for wgsl-in compilation, I submitted the const issue as gfx-rs/naga#2071

@haoyunfeix
Contributor

@haoyunfeix would you mind sharing your temporary fix while you are preparing the upstream issue?

Sure. My temporary fix is below: move the main implementation into the _start entry point.
Before:

        fn _start(@builtin(local_invocation_id) LocalId : vec3<u32>,
                  @builtin(global_invocation_id) GlobalId : vec3<u32>,
                  @builtin(num_workgroups) NumWorkgroups : vec3<u32>) {
          localId = LocalId;
          globalId = GlobalId;
          numWorkgroups = NumWorkgroups;
          main(getGlobalIndex());
        }

        fn main(index : i32) {
          // Fill in the shared memory buffer.
          let localIndex = i32(localId.x);
          if(localIndex < 2) {
            sharedBuf[localIndex] = f32(B[localIndex]);
          }
          workgroupBarrier();

          if(index < uniforms.size) {
            let coords = getCoordsFromIndex(index);
            let a = getAByOutputIndex(index);
            let b = sharedBuf[coords[1]];
            setOutputAtIndex(index, binaryOperation(a, b));
          }
        }

After:

        fn _start(@builtin(local_invocation_id) LocalId : vec3<u32>,
                  @builtin(global_invocation_id) GlobalId : vec3<u32>,
                  @builtin(num_workgroups) NumWorkgroups : vec3<u32>) {
          localId = LocalId;
          globalId = GlobalId;
          numWorkgroups = NumWorkgroups;
          var index = getGlobalIndex();
          let localIndex = i32(localId.x);
          if(localIndex < 2) {
            sharedBuf[localIndex] = f32(B[localIndex]);
          }
          workgroupBarrier();

          if(index < uniforms.size) {
            let coords = getCoordsFromIndex(index);
            let a = getAByOutputIndex(index);
            let b = sharedBuf[coords[1]];
            setOutputAtIndex(index, binaryOperation(a, b));
          }
        }

You can see that the other functions called from _start, like getCoordsFromIndex and getAByOutputIndex, compile and work well, so maybe we shouldn't call a function named main? I'll double-check.

@haoyunfeix
Contributor

Oh, it seems the function must be declared before the entry point AND before @compute @workgroup_size(256, 1, 1).
The shader below works.

...
        var<workgroup> sharedBuf : array<f32, 2>;
        fn main(index : i32){
          // Fill in the shared memory buffer.
          let localIndex = i32(localId.x);
          if(localIndex < 2) {
            sharedBuf[localIndex] = f32(B[localIndex]);
          }
          workgroupBarrier();

          if(index < uniforms.size) {
            let coords = getCoordsFromIndex(index);
            let a = getAByOutputIndex(index);
          let b = sharedBuf[coords[1]];
            setOutputAtIndex(index, binaryOperation(a, b));
          }
        }


  @compute @workgroup_size(256, 1, 1)
  //@compute @workgroup_size(workGroupSizeX, workGroupSizeY, workGroupSizeZ)

        fn _start(@builtin(local_invocation_id) LocalId : vec3<u32>,
                  @builtin(global_invocation_id) GlobalId : vec3<u32>,
                  @builtin(num_workgroups) NumWorkgroups : vec3<u32>) {
          localId = LocalId;
          globalId = GlobalId;
          numWorkgroups = NumWorkgroups;
          main(getGlobalIndex());
        }

@gyagp

gyagp commented Sep 30, 2022

Oh, seems function must declare before entry point AND before @compute @workgroup_size(256, 1, 1) Shader as below is workable. [quoted from the previous comment; shader omitted]
Maybe this is a bug for Mozilla's Naga? I think we call _start (entry point) after the declarations of main and _start, which should be fine. At least Tint also thinks it's OK. @haoyunfeix or @vicary , could you please also file a bug to them?

@vicary
Author

vicary commented Sep 30, 2022

Issue created. I don't have time to dig into naga yet. If it turns out lexical scoping is the root cause, it may require a breaking change on their side, and we may not see it land as soon as const.

Shaders that compile under lexical scoping also compile under module-level scoping, so moving fn main upwards should increase compatibility without apparent downsides.

What do you think? @gyagp @haoyunfeix
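The reordering idea can be sketched as a small source transform (a hypothetical helper for illustration, not the fix that eventually landed): if fn main is declared after the _start entry point, move it in front so compilers that resolve names lexically can see it:

```typescript
// Sketch: hoist `fn main` above the `@compute`-attributed entry
// point, for WGSL compilers that resolve names lexically rather
// than at module scope (as naga did at the time).
function hoistMainBeforeEntryPoint(shader: string): string {
  const entryStart = shader.indexOf("@compute");
  const mainStart = shader.indexOf("fn main", entryStart);
  if (entryStart === -1 || mainStart === -1) return shader; // nothing to hoist

  // Assume `fn main` extends to the end of the source, as in the
  // shaders above; a robust version would balance braces instead.
  const mainDecl = shader.slice(mainStart);
  const rest = shader.slice(0, mainStart);
  return mainDecl + "\n" + rest;
}
```

A shader that already declares fn main before the entry point contains no "fn main" after "@compute", so the helper leaves it untouched.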

haoyunfeix added a commit to haoyunfeix/tfjs that referenced this issue Oct 12, 2022
FIXES tensorflow#6842
To support shader translation library which does not implement module
scoping like naga

qjia7 pushed a commit that referenced this issue Oct 13, 2022
…6918)

* [webgpu] Update shader to support non module-level scoping function

FIXES #6842
To support shader translation library which does not implement module
scoping like naga

* Unify kernels

use main() to generate user function and getStartHenderString() to make
entry point function

* Use isFlatPatchLayout to determine main header

And address comments

* remove unnecessary scope checking
@haoyunfeix
Contributor

Interpreters with function scoping should be compatible with lexical scoping, moving fn main upwards should increase compatibility without apparent down sides.

What do you think? @gyagp @haoyunfeix

@vicary Done with #6918. For shader compiler issues 1 and 2 mentioned above, we intend not to fix them in TFJS but to track them in the naga project (gfx-rs/naga#2071 and gfx-rs/naga#2080).

BTW, I did try to fix 1 and 2 in TFJS (https://github.com/tensorflow/tfjs/compare/master...haoyunfeix:tfjs:test_6842?expand=1) and was glad to see WebGPU on deno get the same result as the CPU backend. I posted all the resources (updated webgpu build and test code) here in case you are interested.

upload.zip
