This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[RFC] Custom Operator Part 2 #17006

Open
samskalicky opened this issue Dec 7, 2019 · 7 comments
Labels: Feature request, RFC Post requesting for comments

Comments

@samskalicky
Contributor

samskalicky commented Dec 7, 2019

Description

Request for comments on the next PR enhancing custom operator support.

References

@wkcn
Member

wkcn commented Dec 8, 2019

Hi @samskalicky, thank you for the contribution!
I have several suggestions.

  • Custom GPU operators
    1. Provide a CUDA stream in OpResource.
    2. Share the same function on CPU and GPU.
      Users can discriminate the context via MXTensor::dltensor::ctx (see the sketch after this list).
  • Call framework-specific math helpers
    This is important for a custom operator: users may want to call gemm, or even a convolution op, from within a custom op.
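
A minimal sketch of what a shared CPU/GPU forward could look like, assuming the 1.6/1.7-era lib_api.h types (MXTensor, OpResource, MXReturnValue); get_cuda_stream() is only the accessor this suggestion proposes, not an existing API:

```cpp
// Sketch only: one forward function shared by CPU and GPU.
#include <cstring>
#include "lib_api.h"

MXReturnValue myForward(std::map<std::string, std::string> attrs,
                        std::vector<MXTensor> inputs,
                        std::vector<MXTensor> outputs,
                        OpResource res) {
  MXTensor& in  = inputs[0];
  MXTensor& out = outputs[0];
  if (in.dltensor.ctx.device_type == kDLGPU) {
    // GPU path: launch a kernel on the stream the framework would provide.
    // void* stream = res.get_cuda_stream();                // proposed accessor, assumption
    // my_kernel<<<grid, block, 0, (cudaStream_t)stream>>>(/* ... */);
  } else {
    // CPU path: identity copy as a stand-in for the real computation.
    std::memcpy(out.data<float>(), in.data<float>(), in.size() * sizeof(float));
  }
  return MX_SUCCESS;
}
```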

Thanks.

@rondogency
Contributor

We need to include a fix for the test error in #15921 (review).

@szha added the RFC Post requesting for comments label on Dec 15, 2019
@larroy
Contributor

larroy commented Dec 26, 2019

@wkcn could you explain your suggestion? Do you mean calling gemm back into the framework, which then gets dispatched to GPU or CPU?

@samskalicky
Contributor Author

We should create a namespace for the contents of the lib_api.h file, as suggested by @larroy:
https://github.com/apache/incubator-mxnet/pull/15760/files#r311756416
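
A sketch of what the namespacing could look like; the name mxnet::ext is illustrative only, not a decision from this thread:

```cpp
// lib_api.h (illustrative): wrap the custom-op types and registration
// machinery in a dedicated namespace so they do not live in the global scope.
namespace mxnet {
namespace ext {

enum MXReturnValue { MX_FAIL = 0, MX_SUCCESS = 1 };
struct MXTensor   { /* ... */ };
struct OpResource { /* ... */ };
class  CustomOp   { /* ... */ };  // what REGISTER_OP(name) ultimately builds

}  // namespace ext
}  // namespace mxnet

// Custom-op libraries would then opt in explicitly:
// using namespace mxnet::ext;
```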

@wkcn
Member

wkcn commented Dec 29, 2019

@larroy Users may need matrix operators and DNN ops (e.g. ReLU, Conv) when writing a custom op. Although they can implement these with third-party libraries, it is more convenient to use the built-in functions in MXNet.

@ptrendx
Member

ptrendx commented Apr 20, 2020

Custom ops should be able to set the inplace property.
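
A hypothetical sketch of what such a hook could look like; REGISTER_OP and setForward follow the 1.7-era registration style, while setInplaceOption and the index-pair convention are assumptions, not part of the current lib_api.h:

```cpp
// Hypothetical: declare that output 0 may reuse the buffer of input 0.
MXReturnValue inferInplace(std::map<std::string, std::string> attrs,
                           std::vector<std::pair<int, int>>& in_out_pairs) {
  in_out_pairs.push_back({0, 0});  // {input index, output index}
  return MX_SUCCESS;
}

REGISTER_OP(my_relu)
.setForward(myForward, "cpu")        // myForward as in the earlier sketch
.setInplaceOption(inferInplace);     // assumed new registration hook
```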

@kpuatamazon
Contributor

Speed. All those std::string and std::unordered_map objects don't come cheaply.

I compared the integrated fork with the custom-operator version.

https://github.com/kpuatamazon/incubator-mxnet/tree/intgemm integrated version, end-to-end Sockeye performance (based on 1.6.0):

real	2m57.962s
user	7m3.986s
sys	0m6.724s

Custom operator version (based on 1.7.x, since custom operator support requires it):

real	3m16.879s
user	7m43.727s
sys	0m8.273s

Conditions:
unset MXNET_ENGINE_TYPE; export OMP_NUM_THREADS=2; numactl -C 0-7 translate.sh
Both were compiled with the MKL backend hack for the remaining fp32 operations.
