Implement leaky_relu #107
Conversation
I implemented the threshold activation function as well as unit tests to verify its functionality.
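(For context, a minimal PyTorch sketch of the threshold semantics being mirrored here; the tensor values and the 1.0/0.0 arguments are just illustrative:)

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-1.0, 0.5, 2.0])

# threshold keeps x where x > threshold and substitutes `value` elsewhere.
out = F.threshold(x, 1.0, 0.0)
print(out)  # tensor([0., 0., 2.])
```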
For the activation functions, the tests for the backward pass (and also the forward pass, apart from test_activations) would go in tests/mojo/test_mlops and tests/python/test_mlops_torch. I wouldn't say there are specific activation functions to prioritize, but the only ones necessary for now would be leaky_relu, gelu and selu (of the ones mentioned, I think those are the most used). I think (not sure) the non-approximated (non-tanh) version of gelu is complicated to implement, because we may have to think about how to divide some parts of the code base, so some cleanups will be necessary. So I think only those functions would be enough.
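(For context on the two gelu variants mentioned above, a minimal PyTorch sketch; the exact form uses the Gaussian CDF via erf, while the tanh form is the approximation. The random input is just illustrative:)

```python
import torch
import torch.nn.functional as F

x = torch.randn(8)

# Exact GELU: x * Phi(x) = x * 0.5 * (1 + erf(x / sqrt(2)))
exact = F.gelu(x)
# Tanh-based approximation of the same function
approx = F.gelu(x, approximate="tanh")

print((exact - approx).abs().max())  # small, but not exactly zero
```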
Ok. Will work on leaky_relu and selu, and I'll do gelu last.
Also, I was looking through the code and I realized that there are tests in …
There are also tests in tests/python/test_mlops_torch.mojo. Those are the three places: test_activations, test_mlops and test_mlops_torch.
Next up: test_mlops and test_mlops_torch
The torch compatibility tests for threshold and hardtanh show that there are some bugs. Are they important enough to be worth debugging, or should I just move on to gelu and selu and delete all the threshold and hardtanh code?
If there are errors when comparing with the torch version, yeah, they should be fixed. But if you want, you can delete them and work on gelu and selu, or you can fix the hardtanh and threshold.
Ok. I haven't ever heard of them being used anyways. I'll just get rid of that code and work on gelu and selu.
Is there an op for elementwise min or max, i.e. what is denoted mathematically by, say, min(0, x)? It would be useful in a few places and could simplify some implementations.
Yeah, in autograd/ops/basics there is the max op, or in utils/tensorutils there is the reduce op (it only reduces over all elements or over one dimension).
Doesn't the current max op get the max value in the tensor? I need something that gets the max or min between two values.
That is part of Mojo: there is already a max op in Mojo's math module (it gets the max value, and min gets the min value, between two SIMD values).
Thanks! I'll use that then.
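(For reference, a minimal PyTorch sketch of the elementwise min/max formulation discussed above; a Mojo version would use the math module's SIMD max/min mentioned in the previous comment. The `leaky_relu_ref` helper name is just for illustration, and 0.01 is PyTorch's default slope:)

```python
import torch
import torch.nn.functional as F

def leaky_relu_ref(x: torch.Tensor, negative_slope: float = 0.01) -> torch.Tensor:
    # Elementwise comparisons against zero:
    # leaky_relu(x) = max(0, x) + negative_slope * min(0, x)
    zeros = torch.zeros_like(x)
    return torch.maximum(x, zeros) + negative_slope * torch.minimum(x, zeros)

x = torch.randn(16)
assert torch.allclose(leaky_relu_ref(x), F.leaky_relu(x, negative_slope=0.01))
```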
Unfortunately I'm busier now than anticipated, and I won't be able to complete the other activation functions. However, I do have full test coverage for leaky_relu, and I'd like to at least merge that in if possible.
Seems good
And also thank you for the contribution (forgot to say, sorry).
Oh thank you! No worries. Sorry I couldn't finish what I hoped to.
The README mentioned that more activation functions are on the roadmap, so I have gotten started. As of writing, I have only added Threshold, but I plan to go down the list from the [PyTorch Docs](https://pytorch.org/docs/stable/nn.functional.html#non-linear-activation-functions). Since I'm just going down the list, I'll do HardTanh next, unless someone has one they want written first. I am willing to write documentation if that is needed. Otherwise, I am trying to keep everything as close to the PyTorch docs as possible. If anybody thinks I should follow TensorFlow, JAX, or something else's docs instead, I am totally open to that as well. Here's the current TODO list for what I plan to implement first:
Once I get all of those done, I'll do the rest of the ones listed on the PyTorch docs, assuming my schedule allows for it. I am more than happy to switch out anything on the list.
Also, I've noticed that the current AttributeVector system leads to a lot of type conversions, which I feel could hinder performance. I don't plan to look into it right now, although I could if it becomes necessary.
There also seem to be a lot of overloads in the test file. Once I implement a few more activation functions, I'd like to see if I can consolidate them into a few basic categories that can be reused across all the different activation functions.