Simplified HybridBlock.forward commit made Sockeye 4% slower #18699
Comments
Can you share the benchmark script and point out where it uses the HybridBlock? How many forward / backward operations are run by the script? We need to identify whether the regression is due to changes in the frontend or in the backend. If it's in the frontend, the overhead might come from supporting both the deprecated, to-be-removed Python APIs and the updated API (i.e. the compatibility / MX1 to MX2 transition mode).
This benchmark was run using a trained Sockeye model. This is a full sequence-to-sequence model with HybridBlocks glued together by Python code. We're running inference (translating a test set with beam search), so there shouldn't be any backward operations. We can share a model with input data and a run script if that will help with debugging. Alternatively, do you have an idea of what good tests would be for determining frontend vs backend speed regressions or multi-API overhead? Are there build/runtime options for MXNet that we can try with our sample model? If we have a good idea of everything that changed in commit 83b5170, we can work through it by process of elimination. |
Do you run multiple inference passes in a single run (multiple forward) or only a single one?
The commit in question contains changes in both the Python frontend and the C++ backend. To verify which part introduces the overhead, I recommend partially applying the commit, discarding all Python changes and applying only the C++ changes. In more detail, for the Python frontend, the following changes in the commit in question may introduce an overhead for your use-case:
Note that these changes do not affect the |
I split commit 83b5170 into just the C++ changes and pushed them to this branch so you can see precisely what it looks like: https://github.com/kpuatamazon/incubator-mxnet/tree/split-hybrid . The C++ changes made it faster, then the Python changes made it slower. Similar experimental setup.
Thank you for splitting up the commit and comparing the frontend to the backend changes! To find out which part of the Python frontend changes is causing the overhead, I'd suggest building three variants, each applying all changes from 83b5170 except those in a) multiarray.py and ndarray.py, b) parameter.py, c) block.py. My assumption is that the extra C API call in parameter.py causes the overhead and that variant b) would resolve it. This works because Sockeye doesn't use |
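For anyone reproducing the elimination, here is a rough sketch that prints the git commands for each variant. It only illustrates the idea: the repository paths are my assumption about where these files live and should be checked against the actual commit, and `variant_commands` is a hypothetical helper, not part of MXNet or Sockeye.

```python
# Hypothetical helper: prints git commands for each elimination variant.
COMMIT = "83b5170"

# Assumed repository paths for the files named above; verify before use.
VARIANTS = {
    "a": ["python/mxnet/numpy/multiarray.py", "python/mxnet/ndarray/ndarray.py"],
    "b": ["python/mxnet/gluon/parameter.py"],
    "c": ["python/mxnet/gluon/block.py"],
}


def variant_commands(commit, paths):
    """Check out `commit`, then restore `paths` to their state in its parent."""
    return [
        f"git checkout {commit}",
        f"git checkout {commit}~1 -- " + " ".join(paths),
    ]


if __name__ == "__main__":
    for name, paths in VARIANTS.items():
        print(f"# variant {name})")
        for cmd in variant_commands(COMMIT, paths):
            print(cmd)
```

Each variant then gets built and benchmarked under the same conditions as the other runs, so that only the excluded files differ.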
Thanks for your help with this, Leonard! If this turns out to be the issue, are there any Sockeye-level changes that would avoid the extra overhead (forward vs hybrid_forward, etc.), or is this something that needs to be addressed in MXNet? |
@mjdenkowski when Sockeye adopts MXNet 2, you will need to rename from For verifying the hypothesis, one can simply revert the changes in |
Three runs each, sorted by increasing time taken. All builds downgrade MKLDNN to cb2cc7a and apply the hack that forces the MKL backend.
Ran these in a loop over the weekend with 136 samples.
Running a 1-tailed t-test assuming the same variance, we can say that the new version of |
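For reference, a one-tailed two-sample t-test with pooled (equal) variance can be sketched as follows. This is a minimal standard-library sketch, not the script used for the numbers in this thread: `pooled_t_test_less` is a hypothetical name, the sample runtimes are placeholders rather than measurements, and the p-value uses a normal approximation to the t distribution, which is only reasonable when the degrees of freedom are large.

```python
import math
from statistics import NormalDist, mean, variance


def pooled_t_test_less(a, b):
    """One-tailed two-sample t-test (H1: mean(a) < mean(b)), equal variances.

    Returns (t, df, p_approx). p_approx comes from a normal approximation
    to the t distribution, so treat it as rough for small samples.
    """
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled variance: sample variances weighted by their degrees of freedom.
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    std_err = math.sqrt(pooled_var * (1 / na + 1 / nb))
    t = (mean(a) - mean(b)) / std_err
    return t, df, NormalDist().cdf(t)


if __name__ == "__main__":
    # Placeholder runtimes in seconds, NOT measurements from this thread.
    baseline = [255.1, 255.9, 256.4, 255.6, 256.0]
    candidate = [260.2, 260.9, 261.3, 260.5, 261.0]
    t, df, p = pooled_t_test_less(baseline, candidate)
    print(f"t = {t:.2f}, df = {df}, approx. one-tailed p = {p:.3g}")
```

If SciPy 1.6 or later is available, `scipy.stats.ttest_ind(a, b, equal_var=True, alternative="less")` performs the same test with an exact p-value.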
Thank you @kpuatamazon. I'll make sure to fix this regression before the stable release. One idea is to speed up the communication with the backend, i.e. by moving away from ctypes. As you mentioned that this is one of many changes causing regressions: if you would like to look at the later changes that result in slowdowns, you should be able to disregard the Python changes of 83b5170 (at least those in parameter.py, multiarray.py and ndarray.py, which shouldn't cause any conflicts given the simplicity of the respective changes). For b), it's still at 260.640 vs. the 255.469 min runtime of the C++-only changes. I think this may be due to the |
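To get a feel for the per-call cost of crossing the ctypes boundary, a micro-benchmark along these lines may help. It is only a sketch under assumptions: it requires CPython (for `ctypes.pythonapi`), it times a trivial C API function (`Py_GetVersion`) rather than any MXNet call, and `per_call_seconds` is a hypothetical helper. The absolute numbers will differ by machine and say nothing directly about MXNet's own C API calls.

```python
import ctypes
import sys
import timeit

# Py_GetVersion is part of CPython's C API; ctypes.pythonapi exposes it.
get_version = ctypes.pythonapi.Py_GetVersion
get_version.restype = ctypes.c_char_p
get_version.argtypes = []


def per_call_seconds(func, number=100_000):
    """Average wall-clock time of one call to `func`."""
    return timeit.timeit(func, number=number) / number


if __name__ == "__main__":
    via_ctypes = per_call_seconds(get_version)
    # Baseline: a plain Python call that returns the same information.
    pure_python = per_call_seconds(lambda: sys.version)
    print(f"ctypes call:      {via_ctypes * 1e9:.0f} ns")
    print(f"pure Python call: {pure_python * 1e9:.0f} ns")
```

Multiplying the difference by the number of frontend-to-backend calls per translated batch gives a rough upper bound on how much an extra C API call per parameter access could cost.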
Hi @leezu thanks for the insight! |
@fhieber Sorry for being unclear. I'm referring to the 5-second overhead that remains after reverting |
I'm experiencing a 4% slowdown in Sockeye due to commit 83b5170 "Add simplified HybridBlock.forward without F (#17530)".
But it's slightly more complicated. At the beginning (f7c432), the build worked with MKLDNN at cb2cc7ac. Then 3667e9a broke the build with an MKLDNN upgrade, a number of commits went in while MKLDNN was broken (so they don't compile), and 08528c5 fixed it by downgrading MKLDNN back to cb2cc7ac.
Hence I wrote this script that downgrades MKLDNN to make stuff build and find the relevant commit:
Test conditions:
OMP_NUM_THREADS=3
export CXXFLAGS="-O3 -march=native -DUSE_MKL -I/opt/intel/mkl/include -pipe"
More broadly, I'm trying to unpick performance differences seen in Sockeye as MXNet has changed since v1.5.x. This image shows commits since master diverged from v1.5.x. v1.5.x is on the left and cbbb864 is on the right.
The first big slowdown is an MKLDNN change on the left but that appears to have been fixed. Then there's a slowdown near the right that doesn't appear to be a single commit but rather a bunch of incremental changes. And this is the first of them I've been able to isolate.