
Some bugs in WebGL and WebGPU #816

Open
Aixile opened this issue Apr 23, 2018 · 6 comments

@Aixile
Contributor

Aixile commented Apr 23, 2018

The code and the model needed to reproduce this can be found here. I am using WebDNN at commit f403a30da36b6741bc857c21c3ca1e65af8fbac9.

For model conversion, please use
python convert_webdnn.py --chainer_model_path SmoothedGenerator_40000.npz --out models/resnet256

Also, there is a web interface in webcode/webdnn.

  1. When I try to convert to WebGL with 8-bit compression, I get:
Generator model loaded
Start Convert
Traceback (most recent call last):

  File "convert_webdnn.py", line 44, in <module>
    exec_info = generate_descriptor("webgl", graph)
  File "/Users/aixile/anaconda3/envs/py36/lib/python3.6/site-packages/webdnn-1.2.3-py3.6.egg/webdnn/backend/interface/generator.py", line 107, in generate_descriptor
    return generator(graph, **kwargs)
  File "/Users/aixile/anaconda3/envs/py36/lib/python3.6/site-packages/webdnn-1.2.3-py3.6.egg/webdnn/backend/webgl/generator.py", line 92, in generate
    return WebGLDescriptorGenerator.generate(graph, **kwargs)
  File "/Users/aixile/anaconda3/envs/py36/lib/python3.6/site-packages/webdnn-1.2.3-py3.6.egg/webdnn/backend/webgl/generator.py", line 59, in generate
    constants_bytes = constant_encoder.encode(memory_layout)
  File "/Users/aixile/anaconda3/envs/py36/lib/python3.6/site-packages/webdnn-1.2.3-py3.6.egg/webdnn/encoder/constant_encoder_eightbit.py", line 66, in encode
    all_code += self._single_encode(single_data, alloc)
  File "/Users/aixile/anaconda3/envs/py36/lib/python3.6/site-packages/webdnn-1.2.3-py3.6.egg/webdnn/encoder/constant_encoder_eightbit.py", line 72, in _single_encode
    maxval = np.max(np.abs(single_data))
  File "/Users/aixile/anaconda3/envs/py36/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 2272, in amax
    out=out, **kwargs)
  File "/Users/aixile/anaconda3/envs/py36/lib/python3.6/site-packages/numpy/core/_methods.py", line 26, in _amax
    return umr_maximum(a, axis, None, out, keepdims)
ValueError: zero-size array to reduction operation maximum which has no identity
  2. WebGL without 8-bit compression converts successfully.
    However, it gives the wrong answer.

Expected:

Got:

3. The WebGPU model can be converted; however, the browser cannot load it.

Model loading failed for webgpu backend. Trying next backend: Range consisting of offset and length are out of bounds

Safari 11.0.3

This repo also contains a speed comparison with TensorFlow.js: on my computer, WebDNN with WebGL is 1.5 to 2x faster than tfjs, except that it gives the wrong answer.

@milhidaka
Member

Sorry for the late reply. I will investigate it.

@milhidaka milhidaka self-assigned this Apr 27, 2018
@milhidaka
Member

This bug also occurs in train_mnist_chainer.py with constant_encoder_name="eightbit". In WebGL, the offset of a (non-constant) variable is "-1", and this causes an error in the encoder. Continuing to debug.

single_data = memory_layout.data[alloc.offset:alloc.offset + alloc.size]
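To illustrate why an offset of -1 breaks the encoder, here is a minimal NumPy sketch that reuses the slicing expression quoted above on a hypothetical 10-element buffer (`data` and `slice_allocation` are illustrative names, not WebDNN APIs). With `offset = -1`, the slice start resolves to the last element while the stop is `size - 1`, so the slice comes out empty and `np.max` raises the same `ValueError` seen in the traceback.

```python
import numpy as np

def slice_allocation(data: np.ndarray, offset: int, size: int) -> np.ndarray:
    # Same expression as in the encoder: data[offset:offset + size]
    return data[offset:offset + size]

# Hypothetical flat memory buffer standing in for memory_layout.data
data = np.arange(10, dtype=np.float32)

ok = slice_allocation(data, offset=2, size=3)    # constant with a valid offset
bad = slice_allocation(data, offset=-1, size=3)  # variable whose offset is -1

# data[-1:2] starts at index 9 and stops before index 2, so it is empty
print("ok:", ok.shape, "bad:", bad.shape)

caught = None
try:
    np.max(np.abs(bad))  # reduction over a zero-size array
except ValueError as err:
    caught = err
    print("ValueError:", err)
```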

@milhidaka
Member

It broke in commit 56113b2.
Starting from this commit, running train_mnist_chainer.py with constant_encoder_name="eightbit" raises an error in generate_descriptor for the WebGL backend.

@milhidaka
Member

There seem to be three different bugs!
I solved one and found a workaround for another.

Problems:

  • Weight packing on WebGL backend (solved)
  • Graph conversion error on WebGL (workaround)
  • Error on WebGPU (not yet)
  1. The weight packing problem occurs with constant_encoder_name="eightbit".
    On WebGL, the texture size differs from the original variable size because a texture has to be a rectangle. The texture size is computed as height * width, and both must be integers, so the texture size is rounded up, which makes it larger than the original size. However, constant_encoder_eightbit.py did not take this into account.
    Also, constants and variables were classified incorrectly.

I pushed a temporary fix to the fix-816 branch (a686df1); please try it to avoid this problem.
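The rounding-up described in item 1 can be sketched as follows. This assumes a hypothetical layout rule (rows of at most `max_texture_size` texels, with the height rounded up); WebDNN's actual texture-shape logic may differ, and `texture_shape` / `pad_to_texture` are illustrative names. The point is that `height * width` can exceed the variable's size, so an encoder has to pad its data to the texture size instead of assuming the two are equal.

```python
from __future__ import annotations

import math
import numpy as np

def texture_shape(size: int, max_texture_size: int) -> tuple[int, int]:
    # Hypothetical rule: fill rows of at most max_texture_size texels.
    assert size > 0, "an empty variable has no texture"
    width = min(size, max_texture_size)
    height = math.ceil(size / width)  # round up so every element fits
    return height, width

def pad_to_texture(data: np.ndarray, max_texture_size: int) -> np.ndarray:
    # Zero-pad the flattened data up to height * width texels.
    height, width = texture_shape(data.size, max_texture_size)
    padded = np.zeros(height * width, dtype=data.dtype)
    padded[:data.size] = data.ravel()
    return padded

data = np.arange(10, dtype=np.float32)
padded = pad_to_texture(data, max_texture_size=4)  # 3 x 4 texture for 10 values
print(texture_shape(data.size, 4), padded.size)
```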

  2. Graph conversion error on WebGL
    There is a bug in the WebGL backend's computation-graph transformation for texture sizes 4096 and 8192. Their weight files (e.g. weight_webgl_4096.bin) are unnaturally small.
$ ls -l models/resnet256
total 1206976
-rw-r--r--  1 hidaka  staff      37301  4 30 21:14 graph_webassembly.json
-rw-r--r--  1 hidaka  staff    3476587  4 30 21:14 graph_webgl_16384.json
-rw-r--r--  1 hidaka  staff    6124614  4 30 21:14 graph_webgl_4096.json
-rw-r--r--  1 hidaka  staff    4214513  4 30 21:14 graph_webgl_8192.json
-rw-r--r--  1 hidaka  staff     296498  4 30 21:02 graph_webgpu.json
-rw-r--r--  1 hidaka  wheel     106503  4 30 21:14 kernels_asmjs.js
-rw-r--r--  1 hidaka  staff       9748  4 30 21:14 kernels_asmjs.js.mem
-rw-r--r--  1 hidaka  staff      51407  4 30 21:14 kernels_webassembly.cpp
-rw-r--r--  1 hidaka  wheel      24125  4 30 21:14 kernels_webassembly.js
-rw-r--r--  1 hidaka  staff      56040  4 30 21:14 kernels_webassembly.wasm
-rw-r--r--  1 hidaka  staff      65574  4 30 21:02 kernels_webgpu.metal
-rw-r--r--  1 hidaka  staff  184662028  4 30 21:14 weight_webassembly.bin
-rw-r--r--  1 hidaka  staff  184662028  4 30 21:14 weight_webgl_16384.bin
-rw-r--r--  1 hidaka  staff   14792716  4 30 21:14 weight_webgl_4096.bin
-rw-r--r--  1 hidaka  staff   33667084  4 30 21:14 weight_webgl_8192.bin
-rw-r--r--  1 hidaka  staff  184662028  4 30 21:02 weight_webgpu.bin
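The "unnaturally small" weight files stand out if each size in the listing is compared against the WebAssembly weight file, assuming every backend should store roughly the same number of weight bytes. The sizes below are copied from the `ls -l` output; `suspicious_weights` is a hypothetical helper and the 1% tolerance is an arbitrary choice for this sketch.

```python
# Sizes in bytes, copied from the `ls -l` listing above
weight_sizes = {
    "weight_webassembly.bin": 184662028,
    "weight_webgl_16384.bin": 184662028,
    "weight_webgl_4096.bin": 14792716,
    "weight_webgl_8192.bin": 33667084,
    "weight_webgpu.bin": 184662028,
}

def suspicious_weights(sizes: dict, reference: str = "weight_webassembly.bin",
                       tolerance: float = 0.01) -> list:
    """Return files noticeably smaller than the reference weight file."""
    ref = sizes[reference]
    return sorted(name for name, n in sizes.items() if n < ref * (1 - tolerance))

for name in suspicious_weights(weight_sizes):
    ratio = weight_sizes["weight_webassembly.bin"] / weight_sizes[name]
    print(f"{name}: {ratio:.1f}x smaller than weight_webassembly.bin")
```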

I found that the graph descriptor for size 16384 works correctly.
Currently, all devices load the size-4096 descriptor, so the workaround is:

cp weight_webgl_16384.bin weight_webgl_4096.bin
cp graph_webgl_16384.json graph_webgl_4096.json 

Of course, this does not work on devices that do not support a texture size of 16384.

With these two workarounds, I managed to get the WebGL + 8-bit compression model working on Chrome.

@Kiikurage
Member

I started to track these two problems in #820 and #821.

@Kiikurage
Member

Kiikurage commented May 6, 2018

@milhidaka I re-implemented your patch in e06f903 with some extra comments. Please review it.

Kiikurage added a commit that referenced this issue May 6, 2018
#816: Fix weight packing procedure in WebGL backend.
3 participants