WEBGL_PACK_DEPTHWISECONV=true seems to cause significant first inference performance drop #5343
Comments
Hi Ahmed, I think you probably have more knowledge about the webgl backend, so I'm assigning this to you :) Please take a look when you have a chance. Really appreciate it!
(Also, I see that I neglected to link to the PR that led me to this issue - #4909. I don't know that the work done in the bulk of the PR is by any means the cause, but the change of the flag's default option is what caused the regression.)
@wingman-jr-addon If the initialization is larger than before, there are possibly two causes:
Item 2 might be browser-specific. Can you help verify whether this behavior occurs in a regular web page on both Firefox and Chrome? Thanks
Thanks for the idea, @pyu10055. I changed the plugin to run in a web page and then ran it on Firefox 90 and Chrome 92. Here are the raw times in seconds:
I reloaded the last Chrome test and tried it several more times, seeing reload times of 4.5-5.5 seconds. So I'm not sure what to make of the results for Chrome: I don't know whether I should trust the reload results or run a bunch of "first run" tests instead. For Firefox, the results are slower across the board but consistent on reload. Let me know what you think. Thanks!
My guess is that Chrome has better caching of shaders across page reloads.
We have observed a significant reduction in initial loading time.
Thanks for the tip, I am seeing about a 0.5-1.0s reduction in load time on FF 90! But getting back to the matter at hand - any clue why we might be seeing such a performance gap on It could be that it just depends on hardware and mine is not the primary target; I'd just like to make sure there isn't some other issue hanging out.
@wingman-jr-addon @pyu10055 My guess
Cross-posting from #5205: I've tested this on my notebook with 3 different models of medium-high complexity
All-in-all:
As it is, I'll be setting
Note: Chrome does extensive shader caching between sessions, so a simple page reload is not sufficient and a full browser restart is needed between tests.
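For anyone repeating these measurements, a minimal timing sketch follows. It separates the first call (which includes shader compilation) from later calls. `measureWarmup` and `runOnce` are hypothetical names, not part of TF.js; with the WebGL backend, `runOnce` would need to await the output tensor's `data()` so that the GPU work has actually finished before the clock stops.

```javascript
// Sketch only: `runOnce` is a caller-supplied async function (an assumption),
// e.g. one that runs the model on a fixed input and awaits the output's data().
async function measureWarmup(runOnce, steadyRuns = 3) {
  // Fall back to Date.now() where the performance global is unavailable.
  const now = () =>
    (typeof performance !== 'undefined' ? performance.now() : Date.now());

  // First call: includes shader compilation and texture uploads.
  let start = now();
  await runOnce();
  const firstMs = now() - start;

  // Later calls: shaders are already compiled, so this approximates steady state.
  let total = 0;
  for (let i = 0; i < steadyRuns; i++) {
    start = now();
    await runOnce();
    total += now() - start;
  }
  return { firstMs, steadyMs: total / steadyRuns };
}
```

Because of the caching noted above, the `firstMs` figure is only meaningful in Chrome after a full browser restart.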
@rthadur Would you agree the awaiting response label can probably be removed from this issue now?
@ahmedsabie @qjia7 @rthadur @pyu10055 Any updates on this? As you can see, WEBGL_PACK_DEPTHWISECONV=true (which is the default value) has a massive negative performance impact - and it's gotten far worse in newer versions of TFJS. This is a major regression and it has seen very few updates. And yes, using WEBGL_USE_SHAPES_UNIFORMS is much better, but - a) it's not a solution, it's an alternative, b) it's not widely implemented, c) almost nobody knows about it.
See #5689 for fully reproducible code and additional performance notes.
@vladmandic @wingman-jr-addon I think this could be because the packed depthwise conv2d shader can be much larger than the unpacked depthwise conv2d version. This could be related to the filter size, since the packed version unrolls the loop over the filter width into code. Can you share the filter size for depthwise conv2d in your model? The other question: we have a way to make the initial warm-up non-UI-blocking, basically by yielding the JS thread and removing all blocking GL calls (parallel shader compilation). The overall warm-up time might still be similar, though. Would this behavior be helpful for your use cases?
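To illustrate the point about unrolling, here is a small sketch of why generated source grows with filter width when the loop is expanded into code. Both generator functions and the GLSL-like lines they emit are hypothetical; they are not the actual TF.js shader source, only a model of the size effect.

```javascript
// Hypothetical generators (NOT the TF.js shader code): they only show how
// source size scales with filter width when a loop is unrolled.

// Looped form: the source stays the same size regardless of filter width.
function loopedShaderBody(filterWidth) {
  return [
    `for (int wC = 0; wC < ${filterWidth}; wC++) {`,
    '  dotProd += getX(batch, xR, xC + wC, d1) * getW(wR, wC, d1, q);',
    '}',
  ].join('\n');
}

// Unrolled form: one statement is emitted per filter column, so the source
// (and the work the GLSL compiler has to do) grows with filter width.
function unrolledShaderBody(filterWidth) {
  const lines = [];
  for (let wC = 0; wC < filterWidth; wC++) {
    lines.push(
      `dotProd += getX(batch, xR, xC + ${wC}, d1) * getW(wR, ${wC}, d1, q);`
    );
  }
  return lines.join('\n');
}
```

Comparing `unrolledShaderBody(3).length` with `unrolledShaderBody(7).length` shows the growth, while the looped form stays constant for the same widths.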
I have seen the same behavior in almost every off-the-shelf model. What is the intended benefit of the packed conv2d shader? I don't see much benefit of it:
Yes and No :)
We have seen significant performance gains on mobile devices with the packed depthwise conv2d shader, especially on Android devices.
Thanks @pyu10055. Perhaps as a start, you could do conditional
I just tried on Android. Yes, the inference performance difference on Android is visible (unlike on desktop).
Re: UI blocking - true, if there is any GL usage elsewhere. Anyhow, if you can make it non-blocking, that is very much welcome.
And when will WEBGL_USE_SHAPES_UNIFORMS become a default?
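One possible reading of the "conditional" idea, given that the gains were reported mainly on Android, is to choose the flag value per platform on the application side. The sketch below is an assumption about what that could look like; `shouldPackDepthwiseConv` is a hypothetical helper, and nothing in this thread says TF.js does this.

```javascript
// Hypothetical helper: enable packed depthwise conv only where the thread
// reports a visible inference gain (Android). User-agent sniffing is a crude
// heuristic and is used here purely for illustration.
function shouldPackDepthwiseConv(userAgent) {
  return /Android/i.test(userAgent);
}

// Possible usage in a page, before the first inference:
//   tf.env().set('WEBGL_PACK_DEPTHWISECONV',
//                shouldPackDepthwiseConv(navigator.userAgent));
```

This trades some desktop inference speed (reported above as not noticeably different) for a shorter warm-up.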
Closing the loop after testing with today's code in the main branch: warmup is now about 2x faster.
webgl default
webgl with uniforms enabled
@pyu10055 please consider enabling uniforms as default
System information
TensorFlow.js installed from (npm or script link):
https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@3.4.0/dist/tf.min.js
https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-backend-wasm@3.4.0/dist/tf-backend-wasm.js
https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-backend-wasm@3.4.0/dist/tfjs-backend-wasm.wasm
https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-backend-wasm@3.4.0/dist/tfjs-backend-wasm-simd.wasm
https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-backend-wasm@3.4.0/dist/tfjs-backend-wasm-threaded-simd.wasm
TensorFlow.js version (use command below): 3.4.0
Browser version: Firefox 90.0 64 bit
Tensorflow.js Converter Version: Unknown, but probably 2.7.0
Current behavior - Upgrading from 3.3.0 to 3.4.0 caused a major performance drop in load + first inference time. 3.3.0 sees times of about 8.8s; 3.4.0 sees times of about 14.4s. It pains me to report a bug related to WEBGL_PACK as so much work has gone into this feature, but ... It appears that setting WEBGL_PACK_DEPTHWISECONV=false on 3.4.0 returns to the performance found in 3.3.0. The regression with default flags has been found to exist in at least 3.6.0 and 3.8.0 as well. (This was found during a bisection while upgrading from 2.7.0 to 3.8.0 to get the new shader compilation performance improvements started in #5205.)
Expected behavior - 3.4.0 with the flag default WEBGL_PACK_DEPTHWISECONV=true offers similar or better performance to 3.3.0.
Minimal reproduction: wingman-jr-addon/wingman_jr#136
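For readers who want to try the workaround described above, a minimal sketch of overriding the flag follows. It relies on `tf.env().set(name, value)` and `tf.env().get(name)` from TF.js; `applyWebglFlags` itself is a hypothetical helper, and `tf` is passed in as a parameter so the function works with whichever build the page loaded. The override should be applied before the first inference, and `set` is expected to fail for a flag the loaded build has not registered.

```javascript
// Hypothetical helper: apply flag overrides and report the values in effect.
// `tf` is the TF.js namespace (for example the global from tf.min.js).
function applyWebglFlags(tf, overrides) {
  const applied = {};
  for (const [name, value] of Object.entries(overrides)) {
    tf.env().set(name, value);           // override the default for this flag
    applied[name] = tf.env().get(name);  // read back what is now in effect
  }
  return applied;
}

// Possible usage, before loading or running the model:
//   applyWebglFlags(tf, { WEBGL_PACK_DEPTHWISECONV: false });
```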
Note this is a Firefox plugin, but TF.js is loaded via a content tab rather than in the background context, so it should behave much like a normal browsing context.
Attached is output from Firefox's about:support, which includes more detailed graphics issues that may be relevant to the matter at hand.
FF90_about_support.txt