Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

webgl: Add shapes uniforms to reduce shader compilation time #5240

Merged
merged 6 commits into from
Jul 1, 2021

Conversation

qjia7
Copy link
Contributor

@qjia7 qjia7 commented Jun 22, 2021

PERF
Fix #5205

This PR adds the shapes uniforms support and enables it for unary/binary
ops.

To see the logs from the Cloud Build CI, please join either our discussion or announcement mailing list.


This change is Reviewable

@google-cla google-cla bot added the cla: yes label Jun 22, 2021
PERF
Fix tensorflow#5205

This PR adds the shapes uniforms support and enables it for unary/binary
ops.
@qjia7
Copy link
Contributor Author

qjia7 commented Jun 23, 2021

@pyu10055 @lina128 @jinjingforever Please take a look, thanks.

Currently, the shapes uniforms are supported. This PR only enables it for unary/binary ops. And one limitation is only rank<=4 can go to the uniform path.

Copy link
Collaborator

@pyu10055 pyu10055 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome, Thank you! One high level question, the limitation of rank <= 4 is caused by the uniform vector size? can matrix be used here to support up to rank 6?

Reviewable status: 0 of 1 approvals obtained (waiting on @jinjingforever, @lina128, and @qjia7)


tfjs-backend-webgl/src/binaryop_packed_gpu.ts, line 83 at r1 (raw file):

          }
        } else {
          const channels = getChannels('coords', rank);

is the channel variable still making the shader dynamic by the output shape?


tfjs-backend-webgl/src/gpgpu_math.ts, line 229 at r1 (raw file):

      switch (uniformShape.length) {
        case 1:
          gpgpu.gl.uniform1iv(varShapeLoc, new Int32Array(uniformShape));

should unsigned int be used here? uniform1uiv?


tfjs-backend-webgl/src/gpgpu_math.ts, line 240 at r1 (raw file):

default

tfjs-backend-webgl/src/gpgpu_math.ts, line 291 at r1 (raw file):

        gpgpu.gl.uniform4iv(outShapeLoc, new Int32Array(output.shape));
        break;
      default:

is it possible to use two uniform variables for larger than rank 4 tensors?
or use two uniform variables, one for the rank value and the other 3x2 uniform matrix for the shapes? this could give us up to rank 6.


tfjs-backend-webgl/src/gpgpu_math.ts, line 332 at r1 (raw file):

    const hasOffset = x.texData != null && x.texData.slice != null &&
        x.texData.slice.flatOffset > 0;
    if (program.enableShapeUniforms && !x.isUniform) {

can you add some comments to explain why it needs to build the input key this way.


tfjs-backend-webgl/src/gpgpu_math.ts, line 368 at r1 (raw file):

  // Fast string concat. See https://jsperf.com/string-concatenation/14.
  key +=
      '_' + keyInputs + '_' + keyUserCode + `env().getNumber('WEBGL_VERSION')`;

With this new key scheme, the shape is no longer part of the key, but there are other information added. Can explain what type of duplicated shaders it can reduce?

@pyu10055 pyu10055 requested a review from ahmedsabie June 23, 2021 05:27
Copy link
Contributor Author

@qjia7 qjia7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 approvals obtained (waiting on @ahmedsabie, @jinjingforever, @lina128, @pyu10055, and @qjia7)


tfjs-backend-webgl/src/binaryop_packed_gpu.ts, line 83 at r1 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

is the channel variable still making the shader dynamic by the output shape?

No. It only depends on the outShape.length, which has been added to the key in the makeShaderKey.


tfjs-backend-webgl/src/gpgpu_math.ts, line 229 at r1 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

should unsigned int be used here? uniform1uiv?

Currently, all shapes are used as int instead of uint in shader_compiler. Do you want to change this way?


tfjs-backend-webgl/src/gpgpu_math.ts, line 291 at r1 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

is it possible to use two uniform variables for larger than rank 4 tensors?
or use two uniform variables, one for the rank value and the other 3x2 uniform matrix for the shapes? this could give us up to rank 6.

Yes, it's a method. But I don't plan to implement it in this PR.
In this PR, I hope we can provide a basic uniform support and enable ops step by step so that we can observe the perf impact by different ops.
For 5d/6d support, we can do it in future. I think it won't bring big changes for current framework.


tfjs-backend-webgl/src/gpgpu_math.ts, line 332 at r1 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

can you add some comments to explain why it needs to build the input key this way.

Done. But I don't have a good solution to keep the consistent between here and shader_compiler. All these conditions are required because we go to a special(optimized) path in shader_compilar, which will results the shader diversity.


tfjs-backend-webgl/src/gpgpu_math.ts, line 368 at r1 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

With this new key scheme, the shape is no longer part of the key, but there are other information added. Can explain what type of duplicated shaders it can reduce?

The additional info in shader key are mainly for some special cases. Besides that, all kinds of inputs can share the same shader code. For example, for 4d tensors a + b, we can use the same shader binary, no matter the shape value. However, previously, it will generate one shader binary for one input combination if the inputs value are not exactly same.
For example,
Relu6 4D[1,17,17,192]
Relu6 4D[1,33,33,48]
Relu6 4D[1,9,9,384]
They all use the same shader code and just compile once. However, previously, it will compile three times.

@qjia7
Copy link
Contributor Author

qjia7 commented Jun 24, 2021

One high level question, the limitation of rank <= 4 is caused by the uniform vector size? can matrix be used here to support up to rank 6?

Yes, that's doable. See my reply in code review. I prefer we do it later not in this PR since I don't think it will bring big code change in current framework.

And we don't have any android device to test if it will bring any prediction time perf regression. Can you help check that?
I can test the perf data on intel windows CFL and send you later. Thanks.

Copy link
Collaborator

@pyu10055 pyu10055 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I will do a quick benchmark test on android devices.

Reviewable status: 0 of 1 approvals obtained (waiting on @ahmedsabie, @jinjingforever, @lina128, @pyu10055, and @qjia7)


tfjs-backend-webgl/src/gpgpu_math.ts, line 229 at r1 (raw file):

Previously, qjia7 (Jiajia Qin) wrote…

Currently, all shapes are used as int instead of uint in shader_compiler. Do you want to change this way?

it is fine to use int for the consistency here.


tfjs-backend-webgl/src/gpgpu_math.ts, line 291 at r1 (raw file):

Previously, qjia7 (Jiajia Qin) wrote…

Yes, it's a method. But I don't plan to implement it in this PR.
In this PR, I hope we can provide a basic uniform support and enable ops step by step so that we can observe the perf impact by different ops.
For 5d/6d support, we can do it in future. I think it won't bring big changes for current framework.

sg.


tfjs-backend-webgl/src/gpgpu_math.ts, line 368 at r1 (raw file):

Previously, qjia7 (Jiajia Qin) wrote…

The additional info in shader key are mainly for some special cases. Besides that, all kinds of inputs can share the same shader code. For example, for 4d tensors a + b, we can use the same shader binary, no matter the shape value. However, previously, it will generate one shader binary for one input combination if the inputs value are not exactly same.
For example,
Relu6 4D[1,17,17,192]
Relu6 4D[1,33,33,48]
Relu6 4D[1,9,9,384]
They all use the same shader code and just compile once. However, previously, it will compile three times.

thanks for explanation.


tfjs-backend-webgl/src/gpgpu_math.ts, line 352 at r2 (raw file):

getBroadcastDims

the shader_compiler seems to computing broadcastDims based on the texture logic shape, is it the same as the shape here?


tfjs-backend-webgl/src/gpgpu_math.ts, line 356 at r2 (raw file):

          x.shape.length === output.shape.length &&
          util.arraysEqual(xTexShape, output.texData.texShape);
      // |x.shape.length| is used to determin the coords length. See

there is a typo determine, can you provide a summary saying:
These keys are needed due to shader_compiler is embedding them in the shader.

also can you style the comment a bit, making each key component a separate line.


tfjs-backend-webgl/src/gpgpu_math.ts, line 367 at r2 (raw file):

      // getOutputPacked1DCoords. |rank2| is used in getOutput2DCoords. |rank34|
      // is used in getSampler3D/getSampler4D. |xTexShape[0] > 1, xTexShape[1] >
      // 1| is used in getSampler[Scalar|1D|2D].

another question is can all of these parameters become uniform as well? If so, has webGPU done that, what is the estimated amount of effort?

Copy link
Contributor Author

@qjia7 qjia7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 approvals obtained (waiting on @ahmedsabie, @jinjingforever, @lina128, @pyu10055, and @qjia7)


tfjs-backend-webgl/src/gpgpu_math.ts, line 352 at r2 (raw file):

Previously, pyu10055 (Ping Yu) wrote…
getBroadcastDims

the shader_compiler seems to computing broadcastDims based on the texture logic shape, is it the same as the shape here?

Yes, it's the shape. You can find the logic shape definition in L75 in this file.

  const inputInfos: InputInfo[] = inputs.map((input, i) => {
    const shapeInfo: ShapeInfo = {
      logicalShape: input.shape,
      texShape: input.isUniform ? null : input.texData.texShape,
      isUniform: input.isUniform,
      isPacked: input.isUniform ? false : input.texData.isPacked,
      flatOffset: null
    };

You can see that the logicalShape is the input.shape.


tfjs-backend-webgl/src/gpgpu_math.ts, line 356 at r2 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

there is a typo determine, can you provide a summary saying:
These keys are needed due to shader_compiler is embedding them in the shader.

also can you style the comment a bit, making each key component a separate line.

Done.


tfjs-backend-webgl/src/gpgpu_math.ts, line 367 at r2 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

another question is can all of these parameters become uniform as well? If so, has webGPU done that, what is the estimated amount of effort?

Maybe, but it may hurt the performance if you bring too many if else. I can give a look if we need to further decouple the dependence on these conditions.
For webgpu it still depends on the in/outshape rank and the broadcastDims. There is no so many limitations for webgpu since it uses buffer instead of texture. Many limitations here are due to texture shape. Another reason may be that webgpu hasn't done many detailed optimizations for some special logical shape.

Copy link
Collaborator

@pyu10055 pyu10055 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Jiajia, I think we should have the flag default to false, and add a integration test (with the flag on) in the tfjs-backed-webgl/scripts/test-ci.sh file. I will LGTM after these changes.

Reviewable status: 0 of 1 approvals obtained (waiting on @ahmedsabie, @jinjingforever, @lina128, @pyu10055, and @qjia7)

Copy link
Collaborator

@pyu10055 pyu10055 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm_strong:

Reviewable status: :shipit: complete! 1 of 1 approvals obtained (waiting on @ahmedsabie, @jinjingforever, @lina128, @pyu10055, and @qjia7)

@qjia7 qjia7 force-pushed the webgl_uniforms branch from 40730e7 to fa26598 Compare July 1, 2021 05:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[perf] improve shader compilation for WebGL with KHR_parallel_shader_compile extension
2 participants