[perf] improve shader compilation for WebGL with KHR_parallel_shader_compile extension #5205
cc @qjia7
@pyu10055 Will someone be assigned to this issue, or do you need any help from us?
@qjia7 If you have bandwidth, we would love to have your help with the initial investigation. As of today, our shader compilations are performed at per-op execution time. It would be interesting to see how the extension would fit into this scenario.
There are several things we can try:
We'd love to try 1 and 2, but 3 needs your help since it will change the upper framework. What do you think?
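For context, a minimal sketch of the usual KHR_parallel_shader_compile pattern (my own illustration, not TFJS code; `splitReady` and `pollLinkStatus` are hypothetical names): issue all `compileShader`/`linkProgram` calls up front without querying link status, then poll the non-blocking `COMPLETION_STATUS_KHR` so the driver can compile programs in parallel.

```javascript
// Sketch only: names here are illustrative, not TFJS APIs. The key idea is to
// avoid calling getProgramParameter(p, LINK_STATUS) right after linkProgram
// (which blocks until compilation finishes); instead, poll the non-blocking
// COMPLETION_STATUS_KHR query each frame.

// Pure helper: partition programs by a readiness predicate.
function splitReady(programs, isReady) {
  const ready = [];
  const pending = [];
  for (const p of programs) (isReady(p) ? ready : pending).push(p);
  return { ready, pending };
}

// Poll until all programs have finished linking, invoking onReady per program.
function pollLinkStatus(gl, ext, programs, onReady) {
  const { ready, pending } = splitReady(
      programs, p => gl.getProgramParameter(p, ext.COMPLETION_STATUS_KHR));
  ready.forEach(onReady);
  if (pending.length > 0) {
    requestAnimationFrame(() => pollLinkStatus(gl, ext, pending, onReady));
  }
}
```

Issuing all compiles for a model's ops before the first poll is what lets the browser compile them concurrently, which is what the options above are after.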
just my $0.02... first - I LOVE this proposal! This is probably the biggest issue with WebGL nowadays, as slow app startup turns users away. (1) doesn't do much due to how tfjs shader compilation is structured. Enumerating the ops used by a model:

```ts
const model: GraphModel = await tf.loadGraphModel('test/model.json');
const ops: Record<string, Array<string>> = {};
for (const op of Object.values(model.executor.graph.nodes) as Array<{category: string, op: string}>) {
  if (!ops[op.category]) ops[op.category] = [];
  if (!ops[op.category].includes(op.op)) ops[op.category].push(op.op);
}
console.log('ops used by model:', ops);
```

output:

```
ops used by model: {
  graph: [ 'Const', 'Placeholder', 'Identity' ],
  convolution: [ '_FusedConv2D', 'FusedDepthwiseConv2dNative', 'DepthwiseConv2dNative', 'Conv2D', 'MaxPool' ],
  arithmetic: [ 'Mul', 'Add', 'FloorDiv', 'FloorMod', 'Sub' ],
  basic_math: [ 'Relu6', 'Relu', 'Sigmoid' ],
  reduction: [ 'Mean' ],
  image: [ 'ResizeBilinear' ],
  slice_join: [ 'ConcatV2', 'GatherV2', 'StridedSlice' ],
  transformation: [ 'Reshape', 'Cast', 'ExpandDims' ],
  logical: [ 'Equal' ],
  evaluation: [ 'TopKV2' ]
}
```
@qjia7 I agree with @vladmandic that options 2 and 3 look crucial to gaining performance from parallel compilation. Similar to the warm-up run, the graph model could have a compilation step, and the engine should have a compile API, in contrast to the current execution API, to avoid any texture upload.
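Until a dedicated compile-only API exists, the warm-up-run idea mentioned above can be sketched roughly like this (hedged: `dummyShape` and `warmUp` are illustrative names, not TFJS APIs; assumes `tf` from @tensorflow/tfjs and a loaded GraphModel are in scope):

```javascript
// Illustrative sketch, not a TFJS API. A single inference on dummy input
// forces shader compilation and texture allocation before real inputs arrive.

// Pure helper: turn an input signature into a concrete dummy shape,
// replacing dynamic dimensions (null or -1) with 1.
function dummyShape(signatureShape) {
  return signatureShape.map(d => (d == null || d < 0 ? 1 : d));
}

// Assumes `tf` (@tensorflow/tfjs) is loaded and `model` is a GraphModel.
async function warmUp(model) {
  const shape = dummyShape(model.inputs[0].shape);
  const input = tf.zeros(shape);
  const output = model.predict(input); // first run compiles all shaders
  await output.data();                 // wait for execution to finish
  input.dispose();
  output.dispose();
}
```

Note that this still pays for texture upload during the warm-up, which is why the compile-only engine API suggested above would be preferable.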
(Non-technical comment: I write a browser plugin that basically blocks browser functionality for 10 seconds during model loading, so I'm quite happy to hear about performance improvement ideas here and plan to watch the progress eagerly! Thanks!)
Thanks for your input. I will take a look at step 2.
PERF Fix tensorflow#5205 This PR adds shapes uniforms support and enables it for unary/binary ops.
FEATURE webgl: Add shapes uniforms to reduce shader compilation time (PERF, Fix #5205). This PR adds shapes uniforms support and enables it for unary/binary ops.
- fix the bot failure
- Add annotation for the key composition
- address comments
- Disable shapes uniforms by default and enable it in integration test
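To illustrate why shapes uniforms reduce compilation time (my own simplified sketch, not the PR's actual key composition): when shapes are baked into the shader source, every new input shape produces a new cache key and a fresh compile; when shapes are passed as uniforms, one compiled program can serve all shapes with the same rank pattern.

```javascript
// Simplified illustration of shader-cache keying; not the PR's real code.

// Baking shapes into source: the key includes concrete dimensions,
// so every distinct shape compiles a new program.
function keyWithBakedShapes(opName, shapes) {
  return `${opName}|${shapes.map(s => s.join('x')).join(';')}`;
}

// Shapes as uniforms: the key only includes ranks, so one compiled
// program is reused across all shapes of the same rank.
function keyWithShapeUniforms(opName, shapes) {
  return `${opName}|ranks:${shapes.map(s => s.length).join(',')}`;
}
```

So resizing an input from 224x224 to 256x256 triggers a recompile under the first scheme but hits the program cache under the second.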
@qjia7 Thanks for your hard work! I was so excited to give this a try as I saw TF.js 3.8.0 was released! My plugin is still back on 2.7.0, so I did a quick upgrade. What kind of performance numbers were others here seeing from this PR? (also @vladmandic @pyu10055)
@wingman-jr-addon This issue has not been finished; it may have been closed by accident. Currently, using shapes uniforms is disabled by default. You need to set
Thank you for the detailed explanation @qjia7 - if it's hidden behind a flag, I'm guessing that this regression has nothing to do with your recent work. Based on that, let me do some bisecting on versions and see if I can narrow the cause down a bit further and then provide a minimal reproduction either here or in an appropriate issue.
@qjia7 Through bisection I've narrowed it down to a change that occurred between 3.3.0 and 3.4.0. I'll do some more looking, but that means it is definitely not related to this functionality.
I've tested this on my notebook with 3 different models of medium-high complexity.
All-in-all:
As it is, I'll be setting
Note: Chrome does extensive shader caching between sessions, so a simple page reload is not sufficient; a full browser restart is needed between tests.
Thank you @vladmandic for your much more thorough analysis. I'm sure that took quite some time. I'll be watching over on the issue where you cross-posted as we look at this issue specifically.
Bug #5205 Co-authored-by: Na Li <linazhao@google.com>
See #5689 for fully reproducible code and additional performance notes.
The related PR has been merged, so I'm closing this issue. Thank you!
Please make sure that this is a bug. As per our
GitHub Policy,
we only address code/doc bugs, performance issues, feature requests and
build/installation issues on GitHub.
System information
Describe the current behavior
The initial inference on the current TFJS WebGL backend is much slower, which is caused by shader compilation and texture allocation.
Describe the expected behavior
With the KHR_parallel_shader_compile extension, there is a chance to speed up shader compilation and reduce the initial inference time.
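Since KHR_parallel_shader_compile is an optional WebGL extension, a backend would first need to feature-detect it and fall back to synchronous compilation when absent (hedged sketch; `supportsParallelCompile` is an illustrative name, not a TFJS API):

```javascript
// Illustrative feature detection; returns false when the extension
// (or the context itself) is unavailable, so callers can fall back
// to blocking on LINK_STATUS as today.
function supportsParallelCompile(gl) {
  return gl != null &&
         typeof gl.getExtension === 'function' &&
         gl.getExtension('KHR_parallel_shader_compile') != null;
}
```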
Standalone code to reproduce the issue
Provide a reproducible test case that is the bare minimum necessary to generate
the problem. If possible, please share a link to Colab/CodePen/any notebook.
Other info / logs Include any logs or source code that would be helpful to
diagnose the problem. If including tracebacks, please include the full
traceback. Large logs and files should be attached.