webgpu: support dataToGPU api #6329
Conversation
@qjia7 @axinging @haoyunfeix @gyagp Please take a look, thank you.
util.assert(
    options.customBufSize % 4 === 0,
    () => 'customBufSize should be a multiple of 4.');
}
We should make sure that options.customBufSize is always larger than or equal to bufferInfo.byteSize. Otherwise, the user will lose partial data.
Also see Na's reply in #5953 (review) about why customShape is needed in webgl; pasting it here:
Just curious, why support customTexShape? Is there any special requirement for this?
The main use case is that if the user sets the texShape to be the same as the canvas shape, they can directly render the result to the canvas. If they don't specify the texShape, we try to make a squarish texture.
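Taken together, this thread asks for two checks on the option: the size must be 4-byte aligned, and it must be at least the tensor's own byte size. A minimal sketch combining both, where `validateCustomBufSize` is a hypothetical helper name (not the actual tfjs code):

```typescript
// Hypothetical helper combining the two checks discussed in this thread:
// customBufSize must be a multiple of 4 (WebGPU buffer sizes are 4-byte
// aligned), and must not be smaller than the tensor's actual byte size,
// otherwise the copy would silently drop trailing data.
function validateCustomBufSize(customBufSize: number, byteSize: number): void {
  if (customBufSize % 4 !== 0) {
    throw new Error('customBufSize should be a multiple of 4.');
  }
  if (customBufSize < byteSize) {
    throw new Error(
        `customBufSize (${customBufSize}) must be >= the tensor's ` +
        `byte size (${byteSize}), otherwise partial data would be lost.`);
  }
}
```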
For webgpu, I think we keep it to conveniently copy the returned buffer to a texture and render to the canvas. If this point doesn't hold, we should remove options.customBufSize for webgpu.
Done
@@ -471,6 +472,83 @@ export class WebGPUBackend extends KernelBackend {
    return vals;
  }

  async downloadGPUBufferData(gpuData: GPUData):
The function body is very similar to getBufferData. Can you write a common helper function to reuse the code?
Done
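For illustration, the shared readback tail that such a common helper could factor out might look like this. This is a sketch against a minimal mapped-buffer interface; `MappableBuffer`, `readMappedBuffer`, and the inlined `MAP_READ` constant are illustrative stand-ins, not the actual tfjs helpers or the full GPUBuffer type:

```typescript
// Minimal stand-in for the subset of GPUBuffer used when reading back data.
interface MappableBuffer {
  mapAsync(mode: number): Promise<void>;
  getMappedRange(): ArrayBuffer;
  unmap(): void;
}

const MAP_READ = 0x0001;  // value of GPUMapMode.READ in the WebGPU spec

// Shared readback path: map the staging buffer for reading, copy its
// contents out, and unmap. Both getBufferData and downloadGPUBufferData
// could delegate their common tail to a helper of this shape.
async function readMappedBuffer(buffer: MappableBuffer): Promise<ArrayBuffer> {
  await buffer.mapAsync(MAP_READ);
  // slice(0) copies the bytes out so they remain valid after unmap().
  const data = buffer.getMappedRange().slice(0);
  buffer.unmap();
  return data;
}
```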
const res = b.dataToGPU();
expectArraysEqual(res.bufSize, size);
const resData = await webGPUBackend.downloadGPUBufferData(res);
expectArraysClose(resData, new Float32Array(data));
Why not directly use expectArraysClose(resData, data)?
Done
expectArraysClose(resData, new Float32Array(data));
});

it('uses user defined bufSize.', async () => {
For a shorter buffer, we should make it throw an error instead of making it valid.
I have changed bufSize from smaller to larger. This case tests that a custom buffer size is also valid for dataToGPU; throwing an error when the buffer size is smaller is covered by the last case.
}

if (bufferInfo.buffer == null) {
  if (values != null) {
It seems all test cases only exercise dataToGPU with CPU forward disabled, meaning the tensors are always on the GPU. Should we also handle tensors on the CPU, for example by adding a case with CPU forward enabled?
Hi Xing, sorry, I did not fully understand your meaning. But dataToGPU is only used when the tensor's data must be on the GPU.
tfjs-core/package.json (Outdated)
@@ -78,7 +78,8 @@
  "@types/webgl-ext": "0.0.30",
  "long": "4.0.0",
  "node-fetch": "~2.6.1",
- "seedrandom": "2.4.3"
+ "seedrandom": "2.4.3",
+ "@webgpu/types": "0.1.6"
We need to use "@webgpu/types" defined in tfjs/package.json instead.
tfjs/package.json had already added this package, so I removed it here.
FYI, there are issues about changing the shape when calling dataToGPU; I am discussing them with LiNa in #5953.
Please review it again. I sort of agree with adding the custom size option; some details are in #5953 (comment).
Adding @lina128 to take another look.
LGTM with one open question.
For webgpu, we are exposing the buffer instead of a texture, so I am not sure whether it's still necessary to let the user specify a custom buffer size.
}

const bufferSize = options.customBufSize != null ?
    Math.max(bufferInfo.byteSize, options.customBufSize) :
nit: Math.max(bufferInfo.byteSize, options.customBufSize) -> options.customBufSize?
Done
const bufferSize = options.customBufSize != null ?
    Math.max(bufferInfo.byteSize, options.customBufSize) :
    bufferInfo.byteSize;
const copySize = options.customBufSize != null ?
const copySize = bufferInfo.byteSize;? Or directly use bufferInfo.byteSize in copyBufferToBuffer.
Done
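The distinction settled here is that the destination buffer may be allocated at the custom size, while only the tensor's own bytes are copied. As a sketch, with the illustrative name `resolveCopySizes` (not an actual tfjs helper):

```typescript
// Illustrative helper separating the two sizes discussed above:
// - allocSize: how large the destination buffer is allocated
//   (customBufSize when provided, otherwise the tensor's byte size)
// - copySize: how many bytes of real tensor data are copied, which is
//   always the source's byteSize regardless of the allocation.
function resolveCopySizes(byteSize: number, customBufSize?: number):
    {allocSize: number, copySize: number} {
  const allocSize = customBufSize != null ? customBufSize : byteSize;
  return {allocSize, copySize: byteSize};
}
```

With these two values, a larger user-requested buffer never inflates the amount of data copied out of the source.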
Please also fix the bot failures. Thanks.
@@ -17,7 +17,7 @@

import './flags_webgpu';

- import {backend_util, buffer, DataStorage, DataType, DataValues, engine, env, KernelBackend, Rank, RecursiveArray, ShapeMap, TensorBuffer, TensorInfo, TimingInfo, TypedArray, util} from '@tensorflow/tfjs-core';
+ import {backend_util, buffer, DataStorage, DataToGPUWebGLOption, DataType, DataValues, engine, env, GPUData, KernelBackend, Rank, RecursiveArray, ShapeMap, TensorBuffer, TensorInfo, TimingInfo, TypedArray, util} from '@tensorflow/tfjs-core';
Should we use DataToGPUOptions instead of DataToGPUWebGLOption for the webgpu backend?
this.submitQueue();

const tensorInfo = this.makeTensorInfo(
    [bufferSize / webgpu_util.GPUBytesPerElement(dtype)], dtype);
To keep consistent with WebGL, should we use the src tensor's shape here? And maybe add a test to verify the tensorRef's shape and dtype?
I still think there is a potential issue if we use the buffer size to calculate the tensor shape. Currently it may be safe, since the buffer size is exactly equal to the shape's size. However, what if we use a larger buffer to store the data, for example, clamping all buffers to be aligned with 16 bytes for optimization in the future?
Can we directly use the source tensor's shape here? I know we currently don't keep the shape in TensorBufferInfo. Maybe we should add it?
Another example is webgl. WebGL uses RGBA 4-channel textures to store data, but the actual result tensor shape's size may not be divisible by 16 bytes. So the original tensor shape is useful.
@xhcao @lina128 What's your opinion on this?
There exists an issue: we cannot compute the element count from the source buffer size, because the buffer size may be larger than the elements' total size. For example, if the dtype is 2 bytes and there are 5 elements, the real buffer size is 12, because the buffer size must be a multiple of 4 when creating the buffer. If we use the buffer size to compute the element count, the result is 6, which is wrong. So we should add the shape for the webgpu backend.
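The failure mode described in that comment is a few lines of arithmetic, sketched here under the assumption (as stated above) that buffer sizes are rounded up to a multiple of 4 bytes; `alignedBufferSize` is an illustrative name, not a tfjs function:

```typescript
// WebGPU buffer sizes are rounded up to a multiple of 4 bytes at creation.
function alignedBufferSize(numElements: number, bytesPerElement: number):
    number {
  const raw = numElements * bytesPerElement;
  return Math.ceil(raw / 4) * 4;
}

// The example from the comment: 5 elements of a 2-byte dtype.
const bufSize = alignedBufferSize(5, 2);  // 10 bytes, rounded up to 12
const naiveCount = bufSize / 2;           // 12 / 2 = 6, not 5 -- wrong!
// The true element count (5) cannot be recovered from bufSize alone,
// which is why the shape must be stored alongside the buffer.
```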
Please review it again, thank you.
Great work, thank you!
Reviewable status:
complete! 1 of 1 approvals obtained (waiting on @axinging, @qjia7, @webgpu, and @xhcao)
tfjs-backend-webgpu/src/backend_webgpu.ts
line 517 at r1 (raw file):
Previously, xhcao wrote…
Hi, Xing, Sorry that I did not fully understand your meaning. But dataToGPU is used when the data of tensor must be in GPU.
Yeah, Jiajia asked me this question too for WebGL. This is the same behavior as WebGL implementation today. My thinking is that user should catch the error and read data from CPU, whether they want to upload or manipulate data directly in CPU. If data is on CPU, most likely it's not a lot of data, so the use case may actually be computation on CPU. But we don't know what the real use case is, so let's keep the design simple for now. If there's actual user request in the future, we can add the support. What do you think?
Reminder: Please review it again, thank you.
LGTM with two nits. Thanks.
info.bufferInfo.buffer = resBuffer;
// Explicitly update the buffer size so that the buffer can be
// released successfully later.
info.bufferInfo.byteSize = bufferSize;
Why do we need to explicitly change the buffer size, since the buffer is acquired with the same size?
They are not the same: one is computed from the shape, and the other is the real buffer size.
util.sizeFromShape(shape) * webgpu_util.GPUBytesPerElement(dtype);
if (options.customBufSize != null) {
  util.assert(
      options.customBufSize >= size,
Here I changed the comparison from the buffer size to the shape size, to allow copying a multiple of the original tensor content.
LGTM, thanks.
To see the logs from the Cloud Build CI, please join either our discussion or announcement mailing list.