Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[oneDNN][INT8][FP32] INT8/FP32 elementwise_mul and INT8 elementwise_add #24812

Closed
wants to merge 11 commits into from

Conversation

Sand3r-
Copy link
Contributor

@Sand3r- Sand3r- commented May 29, 2020

PR types: New features

PR changes: OPs

Describe:

  1. PR introduces oneDNN forward kernels for:
  • INT8 elementwise_add
  • INT8 elementwise_mul
  • FP32 elementwise_mul
  1. It also fixes 2 bugs:
  • When using ShareBufferWith (inplace operators), the type wasn't assigned to the tensor to which the buffer was shared with. This resulted in hard to detect memory crashes when attempting to test inplace elementwise int8 operators (size was allocated in ints, while fetch_op required size allocated for floats).
  • OpTester has can now succesfully output uint8 results to the console. Previously arrays of uint8 data were treated as encoded byte strings instead of arrays of numbers.
  1. At the same time, the following has been removed:
  • An old elementwise_mul kernel that was crafted for a special case in ResNeXt-50. This special case (and many others) should now be fully covered inside of the new kernel
  • It's unit tests

PRs to be merged before this one:

#24853
#24854
#24855

@paddle-bot-old
Copy link

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@Sand3r- Sand3r- requested review from jczaja and wojtuss May 29, 2020 13:22
@Sand3r-
Copy link
Contributor Author

Sand3r- commented Jun 1, 2020

  • MAC_PR_CI_python35 (Paddle Mac) -> Fails due to missing approval for removal of UT. UT test_elementwise_mul_op_correct_dims was removed because its no longer relevant (old elementwise_mul kernel was removed).
  • PR_CI_CPU_Py2 (Compile) -> Fails due to missing approvals for:
  1. Change in API - I don't know which one is meant here, maybe addition of int8 attributes to elementwise_op?
  2. Change in tensor.h (fixed bug with ShareBufferWith)
  3. Setting dygraph field to false in int8 UTs, just the way it is in original elementwise_mul test, otherwise the oneDNN tests fail with dygraph
  4. Usage of skip_check_grad_ci for int8 inference-only ops.
  • PR_Windows_CI (Paddle Windows)
[14:33:59]W:	 [Step 7/8] CMake Error at cmake/inference_lib.cmake:217 (add_dependencies):
[14:33:59]W:	 [Step 7/8]   The dependency target "cuda_error_proto" of target "fluid_lib_dist" does
[14:33:59]W:	 [Step 7/8]   not exist.
[14:33:59]W:	 [Step 7/8] Call Stack (most recent call first):
[14:33:59]W:	 [Step 7/8]   CMakeLists.txt:211 (include)

No idea what happened, didn't touch anything there.

  • PR-CI-Coverage:
[2020-04-03 21:24:05]  [  3%] Building CXX object paddle/fluid/framework/io/CMakeFiles/shell.dir/shell.cc.o
[2020-04-03 21:24:05]  [  3%] Linking CXX static library libfs.a
[2020-04-03 21:24:05]  In file included from /paddle/paddle/fluid/platform/enforce.h:41:0,
[2020-04-03 21:24:05]                   from /paddle/paddle/fluid/framework/io/shell.cc:16:
[2020-04-03 21:24:05]  /paddle/paddle/fluid/platform/errors.h:23:50: fatal error: paddle/fluid/platform/error_codes.pb.h: No such file or directory
[2020-04-03 21:24:05]  #include "paddle/fluid/platform/error_codes.pb.h"
[2020-04-03 21:24:05]                                                    ^
[2020-04-03 21:24:05]  compilation terminated.

Again, no idea what went wrong there, didn't touch the file.

@luotao1 do you know if there are any ongoing fixes for PR-CI-Coverage and PR_Windows_CI (Paddle Windows)?

@jczaja @wojtuss you can start the review, as no tests currently fail, CIs only fail because of missing approvals, or random PR-unrelated compilation failures.

@luotao1
Copy link
Contributor

luotao1 commented Jun 1, 2020

PR-CI-Coverage and PR_Windows_CI fails

fixed by #24800, just rerun it, and I have already rerun Windows.

Change in API - I don't know which one is meant here, maybe the addition of int8 attributes to elementwise_op?

It seems not the addition of int8 attributes, I have rerun it at first.

@wojtuss wojtuss added the Intel label Jun 1, 2020
@Sand3r- Sand3r- added the int8 label Jun 1, 2020
Copy link
Contributor

@jczaja jczaja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link

@wojtuss wojtuss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@luotao1
Copy link
Contributor

luotao1 commented Jun 2, 2020

Could you split this PR into several ones? The reason is:

  • There are too many approvals in this PR.
  • Change in API - I don't know which one is meant here, maybe addition of int8 attributes to elementwise_op?
    • After rerun, it still has the Change in API. If you can split the PR into smaller ones, it may be easier for us to find the reason.

@Sand3r-
Copy link
Contributor Author

Sand3r- commented Jun 2, 2020

@luotao1

Could you split this PR into several ones? The reason is:

  • There are too many approvals in this PR.

  • Change in API - I don't know which one is meant here, maybe addition of int8 attributes to elementwise_op?

    • After rerun, it still has the Change in API. If you can split the PR into smaller ones, it may be easier for us to find the reason.

Following files are strictly connected to one another and concern addition of elementwise_kernels:

 paddle/fluid/operators/elementwise/CMakeLists.txt 
 paddle/fluid/operators/elementwise/elementwise_mul_op.h 
 paddle/fluid/operators/elementwise/elementwise_op.h 
 paddle/fluid/operators/elementwise/mkldnn/elementwise_add_mkldnn_op.cc 
 paddle/fluid/operators/elementwise/mkldnn/elementwise_mkldnn_op.h
 paddle/fluid/operators/elementwise/mkldnn/elementwise_mul_mkldnn_op.cc
 paddle/fluid/operators/elementwise/test_elementwise_mul_op_dim.cc
 paddle/fluid/platform/mkldnn_reuse.h
 python/paddle/fluid/tests/unittests/mkldnn/test_elementwise_add_mkldnn_op.py 
 python/paddle/fluid/tests/unittests/mkldnn/test_elementwise_mul_mkldnn_op.py 

This is a bugfix required for int8 tests to pass

 python/paddle/fluid/tests/unittests/op_test.py

This is a bugfix required for int8 inplace tests to pass

 paddle/fluid/framework/tensor.h 

Do you want me to split this into 3 such PRs?

@luotao1
Copy link
Contributor

luotao1 commented Jun 2, 2020

It's better to split this into 3 such PRs. Besides, how about remove test_elementwise_mul_op_dim.cc in a separate PR, which is easy for QA review?

@Sand3r-
Copy link
Contributor Author

Sand3r- commented Jun 2, 2020

@luotao1

It's better to split this into 3 such PRs. Besides, how about remove test_elementwise_mul_op_dim.cc in a separate PR, which is easy for QA review?
Alright, will do.

@Sand3r-
Copy link
Contributor Author

Sand3r- commented Jun 2, 2020

@luotao1 Done:
#24853
#24854
And the current one.

Btw. removing test_elementwise_mul_op_dim.cc separately might cause PR_CI_Coverage to fail. Are you sure we should move it to separate PR? Also: Maintaining so many PRs is going to be quite of a hassle.

@luotao1
Copy link
Contributor

luotao1 commented Jun 2, 2020

removing test_elementwise_mul_op_dim.cc separately might cause PR_CI_Coverage to fail.

The fail reason for PR_CI_Coverage is an approval. Is any other fail reason?

Are you sure we should move it to separate PR?

Yes, it's better to move it to separate PR, which is easy for QA review. And your current PR is much more clearer.

Maintaining so many PRs is going to be quite of a hassle.

I see the other two PRs, which are quick to be merged. But if they are in one PR, the approval may take some time.

@Sand3r-
Copy link
Contributor Author

Sand3r- commented Jun 2, 2020

Yes, it's better to move it to separate PR, which is easy for QA review. And your current PR is much more clearer.

Alright, I moved it to #24855

I see the other two PRs, which are quick to be merged. But if they are in one PR, the approval may take some time.

That's assuming no random CI failures 😉

@jczaja
Copy link
Contributor

jczaja commented Jun 9, 2020

Work on this functionality is continued in PR #24984. Closing

@jczaja jczaja closed this Jun 9, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants