
Implement leaky_relu #107

Conversation

@FrostyTheSouthernSnowman
Contributor

FrostyTheSouthernSnowman commented May 16, 2024

The README mentioned that more activation functions are on the roadmap, so I have gotten started. As of publishing, I have only added Threshold, but I plan to go down the list from the [PyTorch docs](https://pytorch.org/docs/stable/nn.functional.html#non-linear-activation-functions). Since I'm just going down the list, I'll do HardTanh next, unless someone has one they want written first. I am willing to write documentation if that is needed. Otherwise, I am trying to keep everything as close to the PyTorch docs as possible. If anybody thinks I should follow TensorFlow, JAX, or another framework's docs instead, I am totally up for that as well. Here's the current TODO list for what I plan to implement first:

  • threshold
  • hardtanh
  • hardswish
  • selu
  • celu
  • leaky_relu
  • gelu
  • softsign
  • softplus
  • softmin
  • hardsigmoid
  • mish
  • normalize

Once I get all of those done, I'll do the rest of the ones listed on the PyTorch docs, assuming my schedule allows for it. I am more than happy to switch out anything on the list.

Also, I've noticed that the current AttributeVector system leads to a lot of type conversions, which I feel could hinder performance. I don't plan to look into it right now, although I could if it were necessary.

There also seem to be a lot of overloads in the test file. Once I implement a few more activation functions, I'd like to see if I can consolidate them into just a few basic categories that can be reused across all the different activation functions.

@andresnowak
Collaborator

andresnowak commented May 19, 2024

For the activation functions, the tests for the forward and backward passes (apart from test_activations) would go in tests/mojo/test_mlops and tests/python/test_mlops_torch. I wouldn't say any particular activation function needs to be prioritized, but the only ones really necessary for now would be leaky_relu, gelu, and selu (of the ones mentioned, I think those are the most used). That said, I think (though I'm not sure) the non-approximated (non-tanh) version of gelu is complicated to implement, and we may have to think about how to divide up some parts of the code base, so some cleanup will be necessary. So I think those functions alone would be enough.
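
For reference, here is a small Python sketch of the two gelu variants being discussed: the exact, erf-based form and the tanh approximation. This is only an illustration of the math, not the project's Mojo code, and the function names are made up for the example.

```python
import math

def gelu_exact(x: float) -> float:
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF, written via erf.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x: float) -> float:
    # Tanh approximation (the variant PyTorch exposes as approximate="tanh").
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x**3)))

# The two agree closely for moderate inputs:
for v in (-2.0, -0.5, 0.5, 2.0):
    print(v, gelu_exact(v), gelu_tanh(v))
```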

@FrostyTheSouthernSnowman
Contributor Author

Ok. Will work on leaky_relu and selu, and I'll do gelu last.

@FrostyTheSouthernSnowman
Contributor Author

Also, while looking through the code I realized that there are tests in test_mlops.mojo for the activation functions that I forgot to write. I'll write those for threshold, hard_tanh, and leaky_relu, as well as the tests in test_activations.

@andresnowak
Collaborator

andresnowak commented May 22, 2024

There are also tests in tests/python/test_mlops_torch.mojo. Those are the three places: test_activations, test_mlops, and test_mlops_torch.
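
As a rough illustration of what such a torch-comparison test checks, here is a Python/PyTorch sketch: it is not the actual Mojo test in test_mlops_torch, and the helper name, stand-in implementation, and tolerance are invented for the example. The idea is to compare both the forward output and the gradient against torch.nn.functional.

```python
import torch
import torch.nn.functional as F

def check_leaky_relu_against_torch(values, negative_slope=0.01, atol=1e-6):
    """Compare a reference leaky_relu (a stand-in for the op under test) with PyTorch."""
    x = torch.tensor(values, dtype=torch.float32, requires_grad=True)

    # Stand-in implementation: keep positive inputs, scale negative ones.
    ours = torch.where(x > 0, x, negative_slope * x)
    theirs = F.leaky_relu(x, negative_slope=negative_slope)

    # Forward check.
    assert torch.allclose(ours, theirs, atol=atol)

    # Backward check: gradients of the summed outputs should also match.
    (g_ours,) = torch.autograd.grad(ours.sum(), x, retain_graph=True)
    (g_theirs,) = torch.autograd.grad(theirs.sum(), x)
    assert torch.allclose(g_ours, g_theirs, atol=atol)

check_leaky_relu_against_torch([-2.0, -0.5, 0.25, 1.5])
```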

@FrostyTheSouthernSnowman
Contributor Author

The torch compatibility tests for threshold and hardtanh reveal some bugs. Are those functions important enough to be worth debugging, or should I just move on to gelu and selu and delete all the threshold and hardtanh code?

@andresnowak
Collaborator

If there are errors when comparing with the torch version, then yes, they should be fixed. But if you prefer, you can delete them and work on gelu and selu, or you can fix hardtanh and threshold.

@FrostyTheSouthernSnowman
Contributor Author

Ok. I haven't ever heard of them being used much anyway. I'll just get rid of that code and work on gelu and selu.

@FrostyTheSouthernSnowman
Contributor Author

Is there an op for elementwise min or max, i.e. what is denoted mathematically by, say, min(0, x)? It would be useful in a few places and could simplify some implementations.

@andresnowak
Collaborator

andresnowak commented Jun 7, 2024

Yeah, in autograd/ops/basics there is the max op, and in utils/tensorutils there is the reduce op (it only reduces over the whole tensor or over one dimension).

@FrostyTheSouthernSnowman
Contributor Author

Doesn't the current max op get the max value in the tensor? I need something that gets the max or min between two values.

@andresnowak
Collaborator

andresnowak commented Jun 7, 2024

That is part of Mojo: there is already a max op in Mojo's math module that gets the max (and a min that gets the min) between two SIMD values.

@FrostyTheSouthernSnowman
Contributor Author

Thanks! I'll use that then
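
To make the identity behind this exchange concrete, here is a minimal scalar Python sketch (not the Mojo implementation) of leaky_relu written purely with a two-argument max and min, which is the kind of formulation an elementwise max/min makes possible:

```python
def leaky_relu(x: float, negative_slope: float = 0.01) -> float:
    # leaky_relu(x) = max(x, 0) + negative_slope * min(x, 0):
    # the positive part passes through unchanged, the negative part is scaled.
    return max(x, 0.0) + negative_slope * min(x, 0.0)

assert leaky_relu(2.0) == 2.0
assert leaky_relu(-2.0) == -0.02
```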

@FrostyTheSouthernSnowman changed the title from Implement More Activation Functions to Implement leaky_relu on Jun 28, 2024
@FrostyTheSouthernSnowman
Contributor Author

Unfortunately, I'm busier now than I anticipated and won't be able to complete the other activation functions. I do have full test coverage for leaky_relu, however, and I'd like to at least merge that in if possible.
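
For completeness, here is a scalar Python sketch of the leaky_relu backward pass that such tests exercise in principle; it is illustrative only (the names are made up, and the behaviour at exactly x = 0 is a convention that should match the forward implementation):

```python
def leaky_relu_backward(x: float, upstream_grad: float, negative_slope: float = 0.01) -> float:
    # d/dx leaky_relu(x) is 1 for x > 0 and negative_slope otherwise,
    # so the upstream gradient is either passed through or scaled.
    local_grad = 1.0 if x > 0 else negative_slope
    return upstream_grad * local_grad
```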

@FrostyTheSouthernSnowman marked this pull request as ready for review on June 28, 2024, 13:41
Collaborator

@andresnowak left a comment

Seems good

@andresnowak merged commit cacd8cc into basalt-org:main on Jun 28, 2024
@andresnowak
Collaborator

andresnowak commented Jun 29, 2024

Also, thank you for the contribution (I forgot to say that earlier, sorry).
