webgl: Add shapes uniforms to reduce shader compilation time #5240

qjia7 · 2021-06-22T05:20:34Z

PERF
Fix #5205

This PR adds the shapes uniforms support and enables it for unary/binary
ops.

To see the logs from the Cloud Build CI, please join either our discussion or announcement mailing list.

This change is

PERF Fix tensorflow#5205 This PR adds the shapes uniforms support and enables it for unary/binary ops.

qjia7 · 2021-06-23T02:54:15Z

@pyu10055 @lina128 @jinjingforever Please take a look, thanks.

Currently, the shapes uniforms are supported. This PR only enables it for unary/binary ops. And one limitation is only rank<=4 can go to the uniform path.

pyu10055

This is awesome, Thank you! One high level question, the limitation of rank <= 4 is caused by the uniform vector size? can matrix be used here to support up to rank 6?

Reviewable status: 0 of 1 approvals obtained (waiting on @jinjingforever, @lina128, and @qjia7)

tfjs-backend-webgl/src/binaryop_packed_gpu.ts, line 83 at r1 (raw file):

          }
        } else {
          const channels = getChannels('coords', rank);

is the channel variable still making the shader dynamic by the output shape?

tfjs-backend-webgl/src/gpgpu_math.ts, line 229 at r1 (raw file):

      switch (uniformShape.length) {
        case 1:
          gpgpu.gl.uniform1iv(varShapeLoc, new Int32Array(uniformShape));

should unsigned int be used here? uniform1uiv?

tfjs-backend-webgl/src/gpgpu_math.ts, line 240 at r1 (raw file):

default

tfjs-backend-webgl/src/gpgpu_math.ts, line 291 at r1 (raw file):

        gpgpu.gl.uniform4iv(outShapeLoc, new Int32Array(output.shape));
        break;
      default:

is it possible to use two uniform variables for larger than rank 4 tensors?
or use two uniform variables, one for the rank value and the other 3x2 uniform matrix for the shapes? this could give us up to rank 6.

tfjs-backend-webgl/src/gpgpu_math.ts, line 332 at r1 (raw file):

    const hasOffset = x.texData != null && x.texData.slice != null &&
        x.texData.slice.flatOffset > 0;
    if (program.enableShapeUniforms && !x.isUniform) {

can you add some comments to explain why it needs to build the input key this way.

tfjs-backend-webgl/src/gpgpu_math.ts, line 368 at r1 (raw file):

  // Fast string concat. See https://jsperf.com/string-concatenation/14.
  key +=
      '_' + keyInputs + '_' + keyUserCode + `env().getNumber('WEBGL_VERSION')`;

With this new key scheme, the shape is no longer part of the key, but there are other information added. Can explain what type of duplicated shaders it can reduce?

qjia7

Reviewable status: 0 of 1 approvals obtained (waiting on @ahmedsabie, @jinjingforever, @lina128, @pyu10055, and @qjia7)

tfjs-backend-webgl/src/binaryop_packed_gpu.ts, line 83 at r1 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

is the channel variable still making the shader dynamic by the output shape?

No. It only depends on the outShape.length, which has been added to the key in the makeShaderKey.

tfjs-backend-webgl/src/gpgpu_math.ts, line 229 at r1 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

should unsigned int be used here? uniform1uiv?

Currently, all shapes are used as int instead of uint in shader_compiler. Do you want to change this way?

tfjs-backend-webgl/src/gpgpu_math.ts, line 291 at r1 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

is it possible to use two uniform variables for larger than rank 4 tensors?
or use two uniform variables, one for the rank value and the other 3x2 uniform matrix for the shapes? this could give us up to rank 6.

Yes, it's a method. But I don't plan to implement it in this PR.
In this PR, I hope we can provide a basic uniform support and enable ops step by step so that we can observe the perf impact by different ops.
For 5d/6d support, we can do it in future. I think it won't bring big changes for current framework.

tfjs-backend-webgl/src/gpgpu_math.ts, line 332 at r1 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

can you add some comments to explain why it needs to build the input key this way.

Done. But I don't have a good solution to keep the consistent between here and shader_compiler. All these conditions are required because we go to a special(optimized) path in shader_compilar, which will results the shader diversity.

tfjs-backend-webgl/src/gpgpu_math.ts, line 368 at r1 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

With this new key scheme, the shape is no longer part of the key, but there are other information added. Can explain what type of duplicated shaders it can reduce?

The additional info in shader key are mainly for some special cases. Besides that, all kinds of inputs can share the same shader code. For example, for 4d tensors a + b, we can use the same shader binary, no matter the shape value. However, previously, it will generate one shader binary for one input combination if the inputs value are not exactly same.
For example,
Relu6 4D[1,17,17,192]
Relu6 4D[1,33,33,48]
Relu6 4D[1,9,9,384]
They all use the same shader code and just compile once. However, previously, it will compile three times.

qjia7 · 2021-06-24T06:04:26Z

One high level question, the limitation of rank <= 4 is caused by the uniform vector size? can matrix be used here to support up to rank 6?

Yes, that's doable. See my reply in code review. I prefer we do it later not in this PR since I don't think it will bring big code change in current framework.

And we don't have any android device to test if it will bring any prediction time perf regression. Can you help check that?
I can test the perf data on intel windows CFL and send you later. Thanks.

pyu10055

Thanks, I will do a quick benchmark test on android devices.

Reviewable status: 0 of 1 approvals obtained (waiting on @ahmedsabie, @jinjingforever, @lina128, @pyu10055, and @qjia7)

tfjs-backend-webgl/src/gpgpu_math.ts, line 229 at r1 (raw file):

Previously, qjia7 (Jiajia Qin) wrote…

Currently, all shapes are used as int instead of uint in shader_compiler. Do you want to change this way?

it is fine to use int for the consistency here.

tfjs-backend-webgl/src/gpgpu_math.ts, line 291 at r1 (raw file):

Previously, qjia7 (Jiajia Qin) wrote…

Yes, it's a method. But I don't plan to implement it in this PR.
In this PR, I hope we can provide a basic uniform support and enable ops step by step so that we can observe the perf impact by different ops.
For 5d/6d support, we can do it in future. I think it won't bring big changes for current framework.

sg.

tfjs-backend-webgl/src/gpgpu_math.ts, line 368 at r1 (raw file):

Previously, qjia7 (Jiajia Qin) wrote…

The additional info in shader key are mainly for some special cases. Besides that, all kinds of inputs can share the same shader code. For example, for 4d tensors a + b, we can use the same shader binary, no matter the shape value. However, previously, it will generate one shader binary for one input combination if the inputs value are not exactly same.
For example,
Relu6 4D[1,17,17,192]
Relu6 4D[1,33,33,48]
Relu6 4D[1,9,9,384]
They all use the same shader code and just compile once. However, previously, it will compile three times.

thanks for explanation.

tfjs-backend-webgl/src/gpgpu_math.ts, line 352 at r2 (raw file):

getBroadcastDims

the shader_compiler seems to computing broadcastDims based on the texture logic shape, is it the same as the shape here?

tfjs-backend-webgl/src/gpgpu_math.ts, line 356 at r2 (raw file):

          x.shape.length === output.shape.length &&
          util.arraysEqual(xTexShape, output.texData.texShape);
      // |x.shape.length| is used to determin the coords length. See

there is a typo determine, can you provide a summary saying:
These keys are needed due to shader_compiler is embedding them in the shader.

also can you style the comment a bit, making each key component a separate line.

tfjs-backend-webgl/src/gpgpu_math.ts, line 367 at r2 (raw file):

      // getOutputPacked1DCoords. |rank2| is used in getOutput2DCoords. |rank34|
      // is used in getSampler3D/getSampler4D. |xTexShape[0] > 1, xTexShape[1] >
      // 1| is used in getSampler[Scalar|1D|2D].

another question is can all of these parameters become uniform as well? If so, has webGPU done that, what is the estimated amount of effort?

qjia7

Reviewable status: 0 of 1 approvals obtained (waiting on @ahmedsabie, @jinjingforever, @lina128, @pyu10055, and @qjia7)

tfjs-backend-webgl/src/gpgpu_math.ts, line 352 at r2 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

getBroadcastDims
the shader_compiler seems to computing broadcastDims based on the texture logic shape, is it the same as the shape here?

Yes, it's the shape. You can find the logic shape definition in L75 in this file.

  const inputInfos: InputInfo[] = inputs.map((input, i) => {
    const shapeInfo: ShapeInfo = {
      logicalShape: input.shape,
      texShape: input.isUniform ? null : input.texData.texShape,
      isUniform: input.isUniform,
      isPacked: input.isUniform ? false : input.texData.isPacked,
      flatOffset: null
    };

You can see that the logicalShape is the input.shape.

tfjs-backend-webgl/src/gpgpu_math.ts, line 356 at r2 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

there is a typo determine, can you provide a summary saying:
These keys are needed due to shader_compiler is embedding them in the shader.

also can you style the comment a bit, making each key component a separate line.

Done.

tfjs-backend-webgl/src/gpgpu_math.ts, line 367 at r2 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

another question is can all of these parameters become uniform as well? If so, has webGPU done that, what is the estimated amount of effort?

Maybe, but it may hurt the performance if you bring too many if else. I can give a look if we need to further decouple the dependence on these conditions.
For webgpu it still depends on the in/outshape rank and the broadcastDims. There is no so many limitations for webgpu since it uses buffer instead of texture. Many limitations here are due to texture shape. Another reason may be that webgpu hasn't done many detailed optimizations for some special logical shape.

pyu10055

Thank you Jiajia, I think we should have the flag default to false, and add a integration test (with the flag on) in the tfjs-backed-webgl/scripts/test-ci.sh file. I will LGTM after these changes.

Reviewable status: 0 of 1 approvals obtained (waiting on @ahmedsabie, @jinjingforever, @lina128, @pyu10055, and @qjia7)

pyu10055

Reviewable status: complete! 1 of 1 approvals obtained (waiting on @ahmedsabie, @jinjingforever, @lina128, @pyu10055, and @qjia7)

google-cla bot added the cla: yes label Jun 22, 2021

webgl: Add shapes uniforms to reduce shader compilation time

8ac5bc2

PERF Fix tensorflow#5205 This PR adds the shapes uniforms support and enables it for unary/binary ops.

qjia7 force-pushed the webgl_uniforms branch from 87c6053 to 8ac5bc2 Compare June 22, 2021 05:40

fix the bot failure

8dab150

qjia7 requested review from pyu10055, lina128 and jinjingforever June 23, 2021 02:51

pyu10055 requested changes Jun 23, 2021

View reviewed changes

pyu10055 requested a review from ahmedsabie June 23, 2021 05:27

Add annotation for the key composition.

8eb5af4

qjia7 commented Jun 24, 2021

View reviewed changes

pyu10055 requested changes Jun 24, 2021

View reviewed changes

address comments

321e8c0

qjia7 commented Jun 25, 2021

View reviewed changes

pyu10055 reviewed Jun 29, 2021

View reviewed changes

pyu10055 approved these changes Jul 1, 2021

View reviewed changes

Disable shapes uniforms by default and enable it in integration test

fa26598

qjia7 force-pushed the webgl_uniforms branch from 40730e7 to fa26598 Compare July 1, 2021 05:25

Merge branch 'master' into webgl_uniforms

894e2c8

pyu10055 merged commit 5d7d90a into tensorflow:master Jul 1, 2021

gaikwadrahul8 mentioned this pull request Jul 30, 2023

Collect kernels that are heavily used in models and call for uniforms #5460

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

webgl: Add shapes uniforms to reduce shader compilation time #5240

webgl: Add shapes uniforms to reduce shader compilation time #5240

qjia7 commented Jun 22, 2021 •

edited by nsthorat

Loading

qjia7 commented Jun 23, 2021

pyu10055 left a comment

qjia7 left a comment

qjia7 commented Jun 24, 2021

pyu10055 left a comment

qjia7 left a comment

pyu10055 left a comment

pyu10055 left a comment

webgl: Add shapes uniforms to reduce shader compilation time #5240

webgl: Add shapes uniforms to reduce shader compilation time #5240

Conversation

qjia7 commented Jun 22, 2021 • edited by nsthorat Loading

qjia7 commented Jun 23, 2021

pyu10055 left a comment

Choose a reason for hiding this comment

qjia7 left a comment

Choose a reason for hiding this comment

qjia7 commented Jun 24, 2021

pyu10055 left a comment

Choose a reason for hiding this comment

qjia7 left a comment

Choose a reason for hiding this comment

pyu10055 left a comment

Choose a reason for hiding this comment

pyu10055 left a comment

Choose a reason for hiding this comment

qjia7 commented Jun 22, 2021 •

edited by nsthorat

Loading