add mish kernel #569
Conversation
WindQAQ commented on Oct 5, 2019 (edited)
- Addresses "please add more activation functions" #437
- Adds a mish kernel (Mish: A Self Regularized Non-Monotonic Neural Activation Function)
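For context, Mish is defined as mish(x) = x * tanh(softplus(x)). Below is a minimal Python/TensorFlow sketch of the op being added; it is illustrative only — the PR itself implements this as a fused C++/Eigen kernel, and the function name here is just a placeholder.

```python
import tensorflow as tf

def mish(x):
    """Mish activation: x * tanh(softplus(x)).

    Naive reference implementation for illustration; the PR registers a
    fused C++/Eigen kernel rather than composing existing TF ops.
    """
    x = tf.convert_to_tensor(x)
    return x * tf.math.tanh(tf.math.softplus(x))

# Example usage
print(mish(tf.constant([-2.0, 0.0, 2.0])).numpy())
```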
@digantamisra98 Hello Diganta, would you mind taking a look at this? Also, I would like to add your name to the maintainer list if you agree. Thanks!
Hey @WindQAQ. Going through it. As of now, everything looks fine. The arXiv identifier in your first comment is incorrect though, just to point out.
Thanks for the review! I've already updated the maintainer list. Please feel free to ping me if the email address or the user ID is wrong 😃
@WindQAQ The ID and email are correct. I would also encourage you to take a look at this repository (based on CUDA): https://github.com/thomasbrandon/mish-cuda.
@digantamisra98 Eigen's GPUDevice is also based on CUDA (and ROCm). My previous experiment on gelu activations shows that Eigen's parallelism has the same performance as a customized CUDA kernel in TensorFlow, as well as in PyTorch.
Maybe next week. Thanks!
My performance test on gelu: TFA (Eigen's parallelism) vs PyTorch. Will do a thorough test of TFA versus PyTorch on activations after all of these get merged :P
Thanks for that notebook link. Now I have better clarity on the same. Keep me posted on the progress of the TFA/Torch tests. Thanks!
As the author of the PyTorch kernel linked by @digantamisra98, I'll just expand a little.
Storing intermediate values is kind of a trade-off, but as you noted, I will investigate it per combination of device/dtype. Regarding stability, I am aware the softplus operation in the forward pass is very likely to underflow/overflow. Will see how you deal with the backward gradients too! Thank you so much for the suggestion!
Yeah, the intermediate stuff was mainly as I gather that's what @digantamisra98 was referring to with "not sure, it is the right approach (Since it still doesn't work pretty well on double precision)".
Hi @digantamisra98 @thomasbrandon, I have already dealt with the precision problem. The softplus part is mainly copied from core TF, and the backward part is from @thomasbrandon's implementation. Would you mind reviewing the PR again? Thank you so much for your time.
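For readers following the stability discussion, here is a rough Python/TensorFlow sketch of the idea. The threshold-based softplus mirrors in spirit how core TF guards against overflow/underflow, and the gradient uses the common closed form for Mish; the PR's actual C++/Eigen code may differ, so treat both functions as illustrative assumptions.

```python
import numpy as np
import tensorflow as tf

def stable_softplus(x):
    """Softplus with overflow/underflow guards, similar in spirit to core TF's op.

    For large x, softplus(x) ~= x (avoids overflowing exp); for very negative x,
    softplus(x) ~= exp(x) (avoids log1p losing precision on tiny values).
    """
    x = tf.convert_to_tensor(x)
    threshold = np.log(np.finfo(x.dtype.as_numpy_dtype).eps) + 2.0
    too_large = x > -threshold
    too_small = x < threshold
    # Zero out the argument of exp in the branch that would overflow,
    # so the discarded value cannot become inf.
    exp_x = tf.exp(tf.where(too_large, tf.zeros_like(x), x))
    return tf.where(too_large, x,
                    tf.where(too_small, exp_x, tf.math.log1p(exp_x)))

def mish_grad(x):
    """d/dx of mish(x) = x * tanh(softplus(x)).

    Closed form: tanh(sp(x)) + x * sigmoid(x) * (1 - tanh(sp(x))^2),
    where sp is softplus. Illustrative only; not the PR's kernel code.
    """
    x = tf.convert_to_tensor(x)
    tsp = tf.math.tanh(stable_softplus(x))
    return tsp + x * tf.math.sigmoid(x) * (1.0 - tf.math.square(tsp))
```

A finite-difference or tf.GradientTape comparison against the naive x * tanh(softplus(x)) forward pass is a cheap way to sanity-check a hand-written gradient like this, including at double precision, where the issue mentioned above was observed.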
@WindQAQ will check tomorrow and let you know. Thanks!
@WindQAQ I did a quick check and was going through the log of the failed Ubuntu GPU build, but couldn't interpret it correctly. Could you clarify it a bit?
It seems to be an upstream issue.
@digantamisra98 would you mind taking a look again? The failure was fixed in upstream TensorFlow.
@seanpmorgan yeah, checked, all good. Thank you for notifying me.
@WindQAQ mind resolving conflicts when you get a chance?
LGTM, thanks for the contribution @WindQAQ, and @digantamisra98 for the review.
To whom it may concern about speed: I benchmarked some activations in tensorflow/tensorflow-addons against PyTorch's implementations. I am not testing XLA and JIT here :-) Because it's hard to build the toolchain on Colab, @thomasbrandon would you mind installing https://colab.research.google.com/drive/1zKuef-upkN_4jFnBRoHLk06xmtIDRemi
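For anyone who wants to reproduce a rough version of such a comparison, here is a minimal timing sketch in eager TensorFlow. The tensor shape, iteration counts, and the assumption that tfa.activations.mish is available (i.e. a build containing this PR) are mine, not the notebook's.

```python
import time
import tensorflow as tf
import tensorflow_addons as tfa  # assumes a build that ships tfa.activations.mish

def bench(fn, x, warmup=10, iters=100):
    """Average wall-clock time of fn(x); .numpy() blocks until the op finishes."""
    for _ in range(warmup):
        fn(x).numpy()          # warm-up and force execution
    start = time.perf_counter()
    for _ in range(iters):
        fn(x).numpy()
    return (time.perf_counter() - start) / iters

x = tf.random.normal([1024, 1024])
print("mish:", bench(tfa.activations.mish, x))
print("relu:", bench(tf.nn.relu, x))
```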
@seanpmorgan is there any scope for moving Mish into TF and Keras, since it is now included in PyTorch 1.9? Reference
@digantamisra98 We don't handle migrations in the TF ecosystem on our side anymore, as you can see in tensorflow/community#241 (comment). Also, as per keras-team/keras#13440, we don't know how the deprecation will be triggered/handled. The process isn't currently documented, and my best bet is to collaboratively expand and improve the
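In the meantime, Mish remains usable from Addons inside a Keras model. A small usage sketch, assuming a tensorflow-addons release that ships tfa.activations.mish (i.e. after this PR):

```python
import tensorflow as tf
import tensorflow_addons as tfa

# Any callable can be passed as a Keras activation, so tfa.activations.mish
# drops in directly without waiting for a core TF/Keras migration.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation=tfa.activations.mish),
    tf.keras.layers.Dense(10),
])
```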