Replies: 3 comments 1 reply
- Only if you are using B-splines of order 1.
- It's equivalent to activating the same hidden state with multiple activation functions and then using a wider linear transformation to shrink it back. Somewhat like a Gated Linear Unit, but in reverse: the linear transformation comes after the broadening activation.
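  A minimal NumPy sketch of this equivalence (the activation functions, weights, and shapes here are my own illustrative choices, not anything from the KAN paper): mixing K activations of the same hidden state is the same as concatenating them into a K-times-wider feature vector and applying a block-diagonal linear map.

  ```python
  import numpy as np

  rng = np.random.default_rng(0)
  x = rng.normal(size=(4, 3))  # batch of 4, hidden width 3
  acts = [np.tanh, lambda z: np.maximum(z, 0.0), np.sin]  # K = 3 activations

  # "KAN-style" view: each hidden unit passed through K activations,
  # then mixed with per-(activation, unit) weights
  W = rng.normal(size=(len(acts), 3))
  kan_out = sum(W[k] * f(x) for k, f in enumerate(acts))

  # "Wide MLP" view: broaden to K*3 features, then a linear shrink back to 3
  wide = np.concatenate([f(x) for f in acts], axis=1)  # shape (4, 9)
  shrink = np.zeros((9, 3))
  for k in range(len(acts)):
      shrink[k * 3:(k + 1) * 3] = np.diag(W[k])  # block-diagonal mixing matrix
  mlp_out = wide @ shrink

  assert np.allclose(kan_out, mlp_out)
  ```

  The two computations agree exactly; the only difference is whether the K activation branches are kept separate or stacked into one wider layer.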
- Hi. I also doubt that there is a fundamental difference between KANs and MLPs. Suppose the input dimension equals the output dimension; then a KAN is mathematically equivalent to an MLP with residual connections (see the rightmost picture in the diagram below). Feel free to point out any misunderstanding on my part.
- This note shows the equivalence of KANs to MLPs in the piecewise-linear approximation. I suspect the non-linearity of higher-order splines might help in some cases, but it would be useful to have this as a baseline. Here's the Reddit discussion.
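  One concrete way to see the piecewise-linear case (my own sketch, with uniform knots chosen for illustration): each order-1 B-spline basis function is a hat function, and a hat function is a fixed linear combination of three ReLUs, so a spline layer built from them can be absorbed into a wider ReLU layer followed by a linear map.

  ```python
  import numpy as np

  relu = lambda t: np.maximum(t, 0.0)

  def hat(t, a, h):
      """Order-1 B-spline (hat) with uniform knots a, a+h, a+2h,
      written as a linear combination of three shifted ReLUs."""
      return (relu(t - a) - 2 * relu(t - (a + h)) + relu(t - (a + 2 * h))) / h

  t = np.linspace(-1, 3, 9)
  # Direct formula for the same hat: knots at 0, 1, 2, peak value 1 at t = 1
  ref = np.clip(1 - np.abs(t - 1), 0, None)
  assert np.allclose(hat(t, 0.0, 1.0), ref)
  ```

  Since every basis function on the KAN edges is expressible this way, the whole order-1 spline layer rewrites as ReLUs plus linear maps, which is exactly an MLP computation.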