Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add mkldnn related unit-tests #15615

Closed
luotao1 opened this issue Jan 31, 2019 · 28 comments
Closed

Add mkldnn related unit-tests #15615

luotao1 opened this issue Jan 31, 2019 · 28 comments
Assignees
Labels

Comments

@luotao1
Copy link
Contributor

luotao1 commented Jan 31, 2019

To improve the code (line+function) coverage, please add mkldnn related unit-tests:
http://ce.paddlepaddle.org:8080/viewLog.html?buildId=21110&buildTypeId=PaddlePaddleFramework_TestCoverage_TestCoverage&tab=report_project15_C___Coverage

  • framework/ir/mkldnn
    image
  • operators/elementwise/mkldnn
    image
    image
    image
    image
    image
    image
@kbinias
Copy link
Contributor

kbinias commented Feb 7, 2019

We reproduced your results with coverage testing tool.

  • cmake .. -DWITH_GPU=OFF -DWITH_PROFILER=ON -DWITH_STYLE_CHECK=OFF -DWITH_MKLDNN=ON -DWITH_TESTING=ON -DWITH_COVERAGE=ON
  • lcov --capture --directory . --output-file coverage.info
  • genhtml coverage.info --output-directory out

We will add missing tests.

image
image
image

@kbinias
Copy link
Contributor

kbinias commented Feb 15, 2019

@luotao1 Whether it is possible to run more than 1 iteration in a single unit test for forward and backward phases ? We need to cover the part of the code responsible for reuse MKL-DNN primitives in backward (e.g. activation_mkldnn_op.cc::eltwise_grad function line 276-287). We have to run more than one iteration to enter this section of code.

@luotao1
Copy link
Contributor Author

luotao1 commented Feb 15, 2019

Whether it is possible to run more than 1 iteration in a single unit test for forward and backward phases?

I think it's OK to run more than 1 iterations in elementwise_xxx_mkldnn unit-tests. However, if you add many more iterations, does the elasped time of unit-tests become larger?

@kbinias
Copy link
Contributor

kbinias commented Feb 15, 2019

Not many, 2 iterations will be enough. Is there any unit test in PaddlePaddle which more than 1 iteration is started ?

@luotao1
Copy link
Contributor Author

luotao1 commented Feb 18, 2019

@kbinias There is no 2 iterations unitest using OpTest. Maybe you should implement yourselves, and refer to test_batch_norm_op.py.

@kbinias
Copy link
Contributor

kbinias commented Feb 18, 2019

@luotao1 Thanks for you answer. What do you think about creating new test based on test_imperative_resnet.py with use_mkldnn equals to True ? It will cover batch_norm, conv, pool and activations.

@luotao1
Copy link
Contributor Author

luotao1 commented Feb 19, 2019

What do you think about creating a new test based on test_imperative_resnet.py with use_mkldnn equals to True?

@kbinias I think it's OK.

@kbinias
Copy link
Contributor

kbinias commented Mar 7, 2019

@luotao1 What do you think about closing this PR ? The problem with function coverage is IMO related to lcov tool.

image
image

@luotao1
Copy link
Contributor Author

luotao1 commented Mar 7, 2019

The Functions coverage is less than 80% in operators/mkldnn. You can add the unit-tests for activation_mkldnn_op.cc, conv_mkldnn_op.cc and transpose_mkldnn_op.cc when you are convenient. @kbinias

@kbinias
Copy link
Contributor

kbinias commented Mar 13, 2019

@luotao1 Coverage feature in GCC doesn't support virtual destructors correctly. There is no way to cover tests in 100% if test uses Inheritance. You can find out more details below with good explanation:
https://stackoverflow.com/questions/25662174/is-there-a-way-to-call-the-deleting-destructor-of-a-pure-virtual-class.

One of potential workaround might be post-processing and remove all destructors from tests, we could consider that it’s a noise. The destructors should not be a part of final calculation.

@luotao1
Copy link
Contributor Author

luotao1 commented Mar 13, 2019

Got it. Both @tensor-tang and I think destructors are normal.
However, the current coverage by May 12th is http://ce.paddlepaddle.org:8080/viewLog.html?buildId=31464&buildTypeId=PaddlePaddleFramework_TestCoverage_TestCoverage&tab=report_project15_C___Coverage
image
Why the coverage of batch_norm_mkldnn_op and lrn_mkldnn_op cut down that before?

@lidanqing-intel
Copy link
Contributor

lidanqing-intel commented Mar 14, 2019

我们的结果完全不一样,lrn应该是100%的,您可以用下面的编译指令重新生成一下吗 @luotao1 谢谢
⦁        cmake .. -DWITH_GPU=OFF -DWITH_PROFILER=ON -DWITH_STYLE_CHECK=OFF -DWITH_MKLDNN=ON -DWITH_TESTING=ON -DWITH_COVERAGE=ON 
⦁        make -j 12
⦁        ctest -R mkldnn
⦁        lcov --capture --directory . --output-file coverage.info 
⦁        genhtml --demangle-cpp coverage.info --output-directory out

@luotao1
Copy link
Contributor Author

luotao1 commented Mar 14, 2019

我是CI上的结果。

@lidanqing-intel
Copy link
Contributor

Could you please tell how to reproduce it with commands ? It seems we have different procedure... 请问怎么reproduce呢,可以有什么指令吗? 非常感谢!

@lidanqing-intel
Copy link
Contributor

We are checking build log, hope we could find something. Thanks. If you have any further information, please inform us. Thanks

@pawelpiotrowicz
Copy link
Contributor

pawelpiotrowicz commented Mar 19, 2019

Macro REGISTER_OP_KERNEL is not able to set correct handle for all grad functions such TouchOpKernelRegistrar_transpose_grad_MKLDNN_DEFAULT_TYPE().
Due to missed registration, we can’t cover all functions in tests. The issue is globally, moreover practical working example how to register kernel and enforce touchopkernelregistrar_grad correctly doesn’t exist. The open question is - do we really should care about it?

@kbinias
Copy link
Contributor

kbinias commented Mar 19, 2019

@luotao1 What do you think about above problem ?

@panyx0718
Copy link
Contributor

do you mean paddle can't register some kernel or paddle can't select some registered kernels?

can you search REGISTER_OP_KERNEL_WITH_CUSTOM_TYPE and see if it works for you?

@luotao1
Copy link
Contributor Author

luotao1 commented Mar 20, 2019

@shanyi15 Could you help to update How to add new op to introduce so many different REGISTER MACRO?

@DaisyXten
Copy link

@luotao1 About batch_norm_mkldnn and lrn_mkldnn line coverages, I have some idea and made some fix locally. I could build and make with docker[latest-dev], but I don't know what is the building and testing script. Something like ./paddle_build_cov.sh. I also don't know what main cov means in the build log on TeamCity. Could you share build script with me? This will speed up the work. Thank you!

@luotao1
Copy link
Contributor Author

luotao1 commented Mar 20, 2019

@lidanqing-intel The different between paddle_build_cov.sh adn paddle_build.sh is only in function main()

cov)
        cmake_gen ${PYTHON_ABI:-""}
        build
        run_test
        ;;

@kolinwei will update paddle_build.sh in next PR.

@lidanqing-intel
Copy link
Contributor

@lidanqing-intel The different between paddle_build_cov.sh adn paddle_build.sh is only in function main()

cov)
        cmake_gen ${PYTHON_ABI:-""}
        build
        run_test
        ;;

@kolinwei will update paddle_build.sh in next PR.

Thank you!

@luotao1
Copy link
Contributor Author

luotao1 commented Mar 20, 2019

please see #16324

@pawelpiotrowicz
Copy link
Contributor

pawelpiotrowicz commented Mar 20, 2019

@panyx0718 @luotao1 @kbinias

The point it that Register_Op_kernel consist of Register_op_Kernel_with_custom_type.

#define REGISTER_OP_KERNEL(op_type, library_type, place_class, __VA_ARGS__)
REGISTER_OP_KERNEL_WITH_CUSTOM_TYPE( op_type, library_type, place_class, DEFAULT_TYPE, ::paddle::framework::OpKernelType::kDefaultCustomizedTypeValue, __VA_ARGS__)

To better understand what I’m trying to tell is a simple example below.
We have two execution on macro ( for instance transpose_mkldnn_op.cc) .

REGISTER_OP_KERNEL(transpose, MKLDNN, ::paddle::platform::CPUPlace,
                   ops::TransposeMKLDNNOpKernel<float>);
REGISTER_OP_KERNEL(transpose_grad, MKLDNN, ::paddle::platform::CPUPlace,
                   ops::TransposeMKLDNNGradOpKernel<float>);                   

So we got two functions, only first one will be executed – I expect both of them.

int TouchOpKernelRegistrar_transpose_MKLDNN_DEFAULT_TYPE() { __op_kernel_registrar_transpose_MKLDNN_DEFAULT_TYPE__ .Touch(); return 0; }
int TouchOpKernelRegistrar_transpose_grad_MKLDNN_DEFAULT_TYPE() { __op_kernel_registrar_transpose_grad_MKLDNN_DEFAULT_TYPE__ .Touch(); return 0; }

@lidanqing-intel
Copy link
Contributor

Hi, @luotao1 Both batch_norm_mkldnn and lrn_mkldnn failed because of timeout and it passed with good line coverage in our machines, without docker and tests are run in serial.

I just update the progress with you. And similar problems happened before so I put comment here.

I consider to set in cmake/generic.cmake, in line 394 and line 721
set_tests_properties(${TARGET_NAME} PROPERTIES TIMEOUT 1000)

We also found the unit tests are run in parallel, while timeout limit is set 600 strictly. We consider setting mkldnn unit test to run in the serial way as in PR #16233 .

@luotao1
Copy link
Contributor Author

luotao1 commented Mar 20, 2019

@lidanqing-intel GOT it, Thanks very much for your analysis!

it passed with good line coverage in our machines

Could you give the line coverage here?

@lidanqing-intel
Copy link
Contributor

As in krzysztof's comment.
line cov of batch_norm_mkldnn_op.cc 91%
line cov of lrn_mkldnn_op.cc 100%

@luotao1
Copy link
Contributor Author

luotao1 commented Mar 20, 2019

@kbinias I close this issue due to #15615 (comment)
@pawelpiotrowicz For #15615 (comment), how about creating a new issue to discuss it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants