Some bugs in WebGL and WebGPU #816

Aixile · 2018-04-23T06:22:10Z

The code and the model for reproducing these issues can be found here. I am using webdnn at commit f403a30da36b6741bc857c21c3ca1e65af8fbac9.

For model conversion, please use
python convert_webdnn.py --chainer_model_path SmoothedGenerator_40000.npz --out models/resnet256

Also, there is a web interface in webcode/webdnn.

1. When I try to convert to WebGL with 8-bit compression, I get:

Generator model loaded
Start Convert
Traceback (most recent call last):

  File "convert_webdnn.py", line 44, in <module>
    exec_info = generate_descriptor("webgl", graph)
  File "/Users/aixile/anaconda3/envs/py36/lib/python3.6/site-packages/webdnn-1.2.3-py3.6.egg/webdnn/backend/interface/generator.py", line 107, in generate_descriptor
    return generator(graph, **kwargs)
  File "/Users/aixile/anaconda3/envs/py36/lib/python3.6/site-packages/webdnn-1.2.3-py3.6.egg/webdnn/backend/webgl/generator.py", line 92, in generate
    return WebGLDescriptorGenerator.generate(graph, **kwargs)
  File "/Users/aixile/anaconda3/envs/py36/lib/python3.6/site-packages/webdnn-1.2.3-py3.6.egg/webdnn/backend/webgl/generator.py", line 59, in generate
    constants_bytes = constant_encoder.encode(memory_layout)
  File "/Users/aixile/anaconda3/envs/py36/lib/python3.6/site-packages/webdnn-1.2.3-py3.6.egg/webdnn/encoder/constant_encoder_eightbit.py", line 66, in encode
    all_code += self._single_encode(single_data, alloc)
  File "/Users/aixile/anaconda3/envs/py36/lib/python3.6/site-packages/webdnn-1.2.3-py3.6.egg/webdnn/encoder/constant_encoder_eightbit.py", line 72, in _single_encode
    maxval = np.max(np.abs(single_data))
  File "/Users/aixile/anaconda3/envs/py36/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 2272, in amax
    out=out, **kwargs)
  File "/Users/aixile/anaconda3/envs/py36/lib/python3.6/site-packages/numpy/core/_methods.py", line 26, in _amax
    return umr_maximum(a, axis, None, out, keepdims)
ValueError: zero-size array to reduction operation maximum which has no identity

2. WebGL without 8-bit compression converts successfully.
However, it gives a wrong answer.

Expected:

Got:

3. The WebGPU model can be converted, but it cannot be loaded by the browser.

Model loading failed for webgpu backend. Trying next backend: Range consisting of offset and length are out of bounds

Safari 11.0.3

This repo also contains a speed comparison with tensorflow.js. On my computer, webdnn with WebGL is 1.5~2x faster than tfjs, except that it gives a wrong answer.


milhidaka · 2018-04-27T04:18:09Z

Sorry for the late reply. I will investigate it.

milhidaka · 2018-04-27T09:50:42Z

This bug also occurs in train_mnist_chainer.py with constant_encoder_name="eightbit". The offset of a (non-constant) variable in WebGL is "-1", which causes the error in the encoder. I am continuing to debug.

webdnn/src/graph_transpiler/webdnn/encoder/constant_encoder_eightbit.py

Line 66 in e6ab747

single_data = memory_layout.data[alloc.offset:alloc.offset + alloc.size]
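The failure mode described here can be reproduced without webdnn. The sketch below assumes a flat NumPy buffer standing in for `memory_layout.data` and uses hypothetical `offset` and `size` values. With an offset of -1, the slice starts at the last element and stops before it, so it comes back empty, and `np.max` then raises the same `ValueError` seen in the traceback.

```python
import numpy as np

# Stand-in for memory_layout.data (hypothetical contents, not webdnn's real buffer).
data = np.arange(10, dtype=np.float32)

# A non-constant variable is reported to have offset -1 on the WebGL backend.
# The size here is an arbitrary illustrative value.
offset, size = -1, 4

# data[-1:3] starts at the last element and stops at index 3, so it is empty.
single_data = data[offset:offset + size]
print(single_data.size)

try:
    maxval = np.max(np.abs(single_data))
except ValueError as err:
    # Reducing an empty array with max has no identity value, hence the error.
    error_message = str(err)
    print(error_message)
```

An encoder could guard against this by skipping allocations whose slice is empty, though whether that is the right fix depends on why the offset is -1 in the first place.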

milhidaka · 2018-04-30T09:37:29Z

It broke in commit 56113b2.
From this commit onward, calling generate_descriptor in train_mnist_chainer.py with constant_encoder_name="eightbit" raises an error in the WebGL backend.

milhidaka · 2018-04-30T12:34:08Z

There seem to be three different bugs!
I solved one and found a workaround for another.

Problems:

Weight packing on WebGL backend (solved)
Graph conversion error on WebGL (workaround)
Error on WebGPU (not yet)

The weight packing problem occurs with constant_encoder_name="eightbit".
On WebGL, the size of a texture can differ from the size of the original variable because a texture has to be rectangular. The texture size is calculated as height * width, and both must be integers. The texture size is therefore rounded up, which can make it larger than the original size. However, constant_encoder_eightbit.py does not take this into account.
Also, the classification of constants and variables was wrong.
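The rounding effect can be illustrated with a small calculation. This is a minimal sketch under an assumed layout rule (fill rows up to a maximum texture width, then round the number of rows up); the function names are hypothetical and webdnn's actual allocator may choose texture shapes differently.

```python
import math
from typing import Tuple


def texture_shape(num_elements: int, max_texture_width: int) -> Tuple[int, int]:
    """Return (height, width) of a rectangle large enough for num_elements values."""
    width = min(num_elements, max_texture_width)
    # Height must be an integer, so a partially filled last row is rounded up.
    height = math.ceil(num_elements / width)
    return height, width


def padded_size(num_elements: int, max_texture_width: int) -> int:
    """Number of elements the rectangular texture actually holds."""
    height, width = texture_shape(num_elements, max_texture_width)
    return height * width


original = 10000
height, width = texture_shape(original, 4096)
# The texture holds more elements than the variable it stores.
print(height, width, padded_size(original, 4096) - original)
```

An encoder that slices the weight buffer by the variable's original size, while offsets are laid out by the padded texture size, would read misaligned data for every allocation after the first padded one.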

I pushed a temporary fix to the fix-816 branch (a686df1), so please try it to avoid this problem.

Graph conversion error on WebGL
There is a bug in how the WebGL backend transforms the computation graph for texture sizes 4096 and 8192. Their weight files (weight_webgl_4096.bin and weight_webgl_8192.bin) are unnaturally small.

$ ls -l models/resnet256
total 1206976
-rw-r--r--  1 hidaka  staff      37301  4 30 21:14 graph_webassembly.json
-rw-r--r--  1 hidaka  staff    3476587  4 30 21:14 graph_webgl_16384.json
-rw-r--r--  1 hidaka  staff    6124614  4 30 21:14 graph_webgl_4096.json
-rw-r--r--  1 hidaka  staff    4214513  4 30 21:14 graph_webgl_8192.json
-rw-r--r--  1 hidaka  staff     296498  4 30 21:02 graph_webgpu.json
-rw-r--r--  1 hidaka  wheel     106503  4 30 21:14 kernels_asmjs.js
-rw-r--r--  1 hidaka  staff       9748  4 30 21:14 kernels_asmjs.js.mem
-rw-r--r--  1 hidaka  staff      51407  4 30 21:14 kernels_webassembly.cpp
-rw-r--r--  1 hidaka  wheel      24125  4 30 21:14 kernels_webassembly.js
-rw-r--r--  1 hidaka  staff      56040  4 30 21:14 kernels_webassembly.wasm
-rw-r--r--  1 hidaka  staff      65574  4 30 21:02 kernels_webgpu.metal
-rw-r--r--  1 hidaka  staff  184662028  4 30 21:14 weight_webassembly.bin
-rw-r--r--  1 hidaka  staff  184662028  4 30 21:14 weight_webgl_16384.bin
-rw-r--r--  1 hidaka  staff   14792716  4 30 21:14 weight_webgl_4096.bin
-rw-r--r--  1 hidaka  staff   33667084  4 30 21:14 weight_webgl_8192.bin
-rw-r--r--  1 hidaka  staff  184662028  4 30 21:02 weight_webgpu.bin
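The size mismatch in this listing can be checked mechanically. The sketch below uses the weight file sizes copied from the listing and flags any file that is much smaller than the largest one. The helper name and the 0.5 threshold are assumptions for illustration, and the check only makes sense when the files are expected to store the same uncompressed parameters.

```python
from typing import Dict, List


def find_suspicious_weights(sizes: Dict[str, int], ratio: float = 0.5) -> List[str]:
    """Return names of weight files smaller than ratio * the largest file."""
    largest = max(sizes.values())
    return sorted(name for name, size in sizes.items() if size < largest * ratio)


# Sizes in bytes, taken from the ls output above.
weight_sizes = {
    "weight_webassembly.bin": 184662028,
    "weight_webgl_16384.bin": 184662028,
    "weight_webgl_4096.bin": 14792716,
    "weight_webgl_8192.bin": 33667084,
    "weight_webgpu.bin": 184662028,
}

suspicious = find_suspicious_weights(weight_sizes)
print(suspicious)
```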

I found that the graph descriptor for size 16384 works correctly.
Currently, all devices load size 4096, so the workaround is:

cp weight_webgl_16384.bin weight_webgl_4096.bin
cp graph_webgl_16384.json graph_webgl_4096.json

Of course, this does not work on devices that do not support texture size 16384.

With these two workarounds, I managed to get the WebGL + 8-bit compression model to work on Chrome.

Kiikurage · 2018-04-30T17:58:33Z

I started to track these two problems in #820 and #821.

Kiikurage · 2018-05-06T07:58:12Z

@milhidaka I re-implemented your patch in e06f903, with some extra comments. Please review it.

#816: Fix weight packing procedure in WebGL backend.

milhidaka self-assigned this Apr 27, 2018

milhidaka added a commit that referenced this issue Apr 30, 2018

fix weight packing problem in model of #816

a686df1

This was referenced Apr 30, 2018

Lose some weight parameters in WebGL 4096, 8192 mode. #820

Closed

Failed to load large weight data in WebGPU backend #821

Open

Kiikurage added a commit that referenced this issue May 6, 2018

#816: Fix weight packing procedure in WebGL backend.

e06f903

Kiikurage added a commit that referenced this issue May 6, 2018

Merge pull request #825 from mil-tokyo/dev-816

785085e

#816: Fix weight packing procedure in WebGL backend.

Kiikurage mentioned this issue May 6, 2018

Bump package version to 1.2.4 #827

Merged
