unix-cpu MKL/MKL-DNN Test Time #18244
Comments
Thanks for raising this concern. We will take a look soon. |
@szha Do you have any detailed report where we can find the test time for each test in each test file? |
@TaoLv in the example link I included, you can find the log where the top 50 most time-consuming tests are listed for each run. |
Any update on this? |
* run operator tests with naive engine
* fix take tests
* update skip mark
* fix cuda error reset
* adjust tests
* disable parallel testing and naive engine for mkl/mkldnn #18244
For now I disabled the MKL/MKLDNN parallel testing and the naive engine change due to the excessive testing time these builds take. For testing, revert this commit: 9b7b7e2 |
@szha I've done some testing in Docker for parallel tests and have an overview. For testing I used about 35 long-running tests. I measured the time for an MKLDNN build with OMP_NUM_THREADS=n/4 set (n = number of cores). I'm going to do some testing with MKL and OMP flags to identify the best configuration, and also figure out the 'serial' case. cc: @TaoLv |
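For reference, a minimal sketch of that kind of timing run; the n/4 computation follows the comment above, while the test selection is an illustrative assumption, not the exact script used:

```bash
# Sketch: cap OMP threads at n/4 (n = number of cores) and time a
# long-running test file; the file chosen here is illustrative.
export OMP_NUM_THREADS=$(( $(nproc) / 4 ))
time python3 -m pytest --durations=50 tests/python/unittest/test_operator.py
```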
The cmake build uses MKL by default if it is available; you can check the cmake configuration output. To force using MKL, set the corresponding BLAS option. If MKL BLAS is used, OpenMP will not be built. There may be a bug in these two references, or your build environment isn't set up correctly? |
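As a hedged illustration of that check (the exact CMake option name is an assumption; verify it against the project's CMakeLists.txt):

```bash
# Sketch: force MKL as the BLAS backend and inspect the configuration output.
# USE_BLAS=mkl is an assumed option name; confirm it in CMakeLists.txt.
cmake -S . -B build -DUSE_BLAS=mkl | tee cmake_config.log
grep -iE 'mkl|blas|openmp' cmake_config.log
```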
I discovered that the slowdown is caused by the ENABLE_TESTCOVERAGE=ON flag, not by OpenMP. I measured three configurations:
- With ENABLE_TESTCOVERAGE=1 & unset OMP_NUM_THREADS:
- With ENABLE_TESTCOVERAGE=1 & OMP_NUM_THREADS=1:
- Without ENABLE_TESTCOVERAGE & unset OMP_NUM_THREADS:
Summary:
|
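A minimal sketch of how that A/B comparison can be reproduced, assuming an out-of-tree CMake build; the paths and the chosen test file are illustrative assumptions:

```bash
# Sketch: A/B-compare test time with and without coverage instrumentation,
# using the ENABLE_TESTCOVERAGE flag quoted in the comment above.
cmake -S . -B build -DENABLE_TESTCOVERAGE=ON && cmake --build build -j"$(nproc)"
time python3 -m pytest -q tests/python/unittest/test_operator.py

cmake -S . -B build -DENABLE_TESTCOVERAGE=OFF && cmake --build build -j"$(nproc)"
time python3 -m pytest -q tests/python/unittest/test_operator.py
```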
Thank you for looking into this! Do you have any insights into why this only affects MKL builds? |
This affects the CPU (openBLAS) build as well. I measured:
- With ENABLE_TESTCOVERAGE=1 & unset OMP_NUM_THREADS:
- With ENABLE_TESTCOVERAGE=1 & OMP_NUM_THREADS=1:
- Without ENABLE_TESTCOVERAGE & unset OMP_NUM_THREADS:
- Without ENABLE_TESTCOVERAGE & OMP_NUM_THREADS=12:
The last test case shows that the openBLAS build has some problems utilizing all threads. |
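A quick hedged probe for that thread-utilization question: sweep OMP_NUM_THREADS over a few values and time the same workload; near-identical times would confirm that the extra threads aren't being used. The test file is an arbitrary long-running example.

```bash
# Sketch: check whether wall time scales with the OMP thread count.
for t in 1 4 12; do
    echo "OMP_NUM_THREADS=$t"
    time env OMP_NUM_THREADS="$t" python3 -m pytest -q tests/python/unittest/test_operator.py
done
```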
@szha Can we close this issue now that the root cause was found? |
@bgawrych yes. Thank you for tracking this down. |
Description
In #18146 we introduced parallel testing in CI in the hope of reducing test time. However, during that effort we noticed that the MKL and MKLDNN tests run slower than the setup without MKL or MKLDNN. This issue summarizes the setup and the current time difference.
Setup
The results in this issue come from this CI run, and the time difference is similar in master branch validation.
To show the results, we compare the following test nodes:
- Python 3: CPU (Build)
- Python 3: MKL-CPU (Build)
- Python 3: MKLDNN-CPU (Build)
- Python 3: MKLDNN-MKL-CPU (Build)
Tests
Each of the test nodes runs one of the following two test functions (a simplified sketch of their shape follows the list):
- python3_ut
- python_ut_mkldnn
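Roughly, these are shell functions in the repository's CI scripts that wrap the pytest invocations listed under Test steps below. A minimal sketch, assuming the bodies mirror those steps (the real definitions include additional environment setup):

```bash
# Hedged sketch of the two CI test functions; the real definitions live in
# the repository's CI scripts and include additional environment setup.
python3_ut() {
    # parallel pass over tests not marked 'serial', then a single-process pass
    python3 -m pytest -m 'not serial' -n 4 --durations=50 --verbose tests/python/unittest
    python3 -m pytest -m 'serial' --durations=50 --verbose tests/python/unittest
}

python_ut_mkldnn() {
    # MKLDNN-specific suites run in parallel
    python3 -m pytest -n 4 --durations=50 --verbose tests/python/mkl
}
```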
Test steps
To show fine-grained time results, I break down the steps in the test nodes as follows:

```
pytest -m 'not serial' -n 4 --durations=50 --cov-report xml:tests_unittest.xml --verbose tests/python/unittest
pytest -m 'serial' --durations=50 --cov-report xml:tests_unittest.xml --cov-append --verbose tests/python/unittest
pytest -n 4 --durations=50 --cov-report xml:tests_quantization.xml --verbose tests/python/quantization
pytest -n 4 --durations=50 --cov-report xml:tests_mkl.xml --verbose tests/python/mkl
```
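For context on the flags (all standard pytest, pytest-xdist, and pytest-cov options): -n 4 spawns four xdist worker processes; -m selects or excludes tests by the serial marker, so tests that cannot run concurrently get their own single-process step; --durations=50 prints the 50 slowest tests, which is where the per-test times referenced in the comments come from; and --cov-report/--cov-append come from pytest-cov, the coverage tooling tied to the ENABLE_TESTCOVERAGE discussion above. A sketch for reproducing the first step locally:

```bash
# Assumes pytest-xdist (provides -n) and pytest-cov (provides --cov-*) are installed.
python3 -m pip install pytest pytest-xdist pytest-cov
python3 -m pytest -m 'not serial' -n 4 --durations=50 \
    --cov-report xml:tests_unittest.xml --verbose tests/python/unittest
```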
Results
The unit for the following results is seconds.
The Python 3: CPU results are considered the baseline.