
Suggestion: Add tt.nnet.softmax to pm.math? #4226

Closed
ricardoV94 opened this issue Nov 13, 2020 · 12 comments · Fixed by #5279

Comments

@ricardoV94
Member

Since the invlogit is already available from pm.math, maybe it would make sense to add the softmax as well, as it is the generalization of the invlogit to more than 2 variables.
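
For illustration, here is a small numpy-only sketch (not PyMC code) showing that the two-class softmax reduces to the inverse logit:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def invlogit(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-5, 5, 11)
# The softmax over the pair (0, x) puts probability invlogit(x) on the second entry.
two_class = softmax(np.stack([np.zeros_like(x), x], axis=-1), axis=-1)
assert np.allclose(two_class[:, 1], invlogit(x))
```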

Personally, it's one of the most common functions I use, and many times the sole reason I have to import theano.tensor.

What do you think?

@AlexAndorra
Contributor

Hi Ricardo,
I think it's a good idea -- I'm also biased because that's a function I often use. Would definitely be helpful to have easier access to it!
Do you feel like doing a PR to implement this?

@ricardoV94
Member Author

ricardoV94 commented Nov 14, 2020

I can do the PR, but should we collect more opinions before doing it, or is it fine to go ahead?

@ericmjl
Member

ericmjl commented Nov 15, 2020

Good call, @ricardoV94 -- I have had an experience before on NetworkX where my original PR of 3 functions was later shaved down to one, even though I originally had buy-in from the lead devs. I haven't gone back to that PR for a while now.

In any case, I vote in favor, let's see what the other devs say here.

@katosh
Contributor

katosh commented Nov 20, 2020

I experienced issues with the vector support of tt.nnet.softmax when I wrote #4129, so I did this instead:
https://github.com/pymc-devs/pymc3/blob/87f603bb4387c62336785dc97702a56637082a9a/pymc3/distributions/transforms.py#L466-L468

If you want to support vectors it may be worth considering for pm.math.
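
For reference, a minimal sketch of the general vector-friendly approach, written directly with theano.tensor (an illustration of the idea, not a copy of the linked lines):

```python
import theano
import theano.tensor as tt

def softmax_1d(x):
    # Shift by the maximum for numerical stability, then normalize to sum to 1.
    e_x = tt.exp(x - x.max())
    return e_x / e_x.sum()

v = tt.vector("v")
f = theano.function([v], softmax_1d(v))
print(f([1.0, 2.0, 3.0]))  # returns a 1-d probability vector, no (1, n) padding
```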

@ricardoV94
Member Author

ricardoV94 commented Nov 20, 2020

@katosh I actually noted this in the discussion in PR #4229, which led to the opening of this issue in the Theano-PyMC repo: aesara-devs/aesara#183

It would be great to get your input since you have worked around these issues already. I am not sure whether it makes sense to:

  1. Modify the Theano implementation
  2. Simply drop the warning in the Theano implementation
  3. Implement code similar to yours in pm.math and simply ignore the Theano function.

@AlexAndorra
Contributor

Hi @katosh and @ricardoV94, and thanks for pointing this out again. From the discussions we had internally and on the PR, it seems preferable to go with option 1 or 2 -- and more probably 1, as it would extend the current Theano-PyMC implementation (@katosh, as you wrote the workaround, feel free to jump in on that point).

That's why we're doing Theano-PyMC in the first place, and why we opened an issue on the Theano-PyMC repo (which is up for grabs BTW 😉 ) to deal with this.

That's also why we don't want to go with option 3: the pm.math module must be a simple wrapper that makes Theano functions more easily accessible to users, but those functions are still implemented in Theano, otherwise that'd be duplicate maintenance for core devs.

Hope this is clear enough 😅 And feel free to ask questions, especially if you wanna fix this on Theano-PyMC side!

@katosh
Contributor

katosh commented Nov 20, 2020

Thank you @ricardoV94 for pointing out aesara-devs/aesara#183, and @AlexAndorra for forwarding the issue with the softmax function! I had not noticed it before.

I agree with going for option 1, and I really like @ricardoV94's idea of adding an axis argument to tt.nnet.softmax. This should resolve any confusion about vectors and padding. Additionally, tt.nnet.softmax seems to behave as if axis=0, so we would hopefully not break anything by changing the Theano function.

@AlexAndorra
Contributor

> I agree with going for option 1, and I really like @ricardoV94's idea of adding an axis argument to tt.nnet.softmax. This should resolve any confusion about vectors and padding. Additionally, tt.nnet.softmax seems to behave as if axis=0, so we would hopefully not break anything by changing the Theano function.

Totally agree, @katosh! Are you up for making a PR to fix this on Theano-PyMC?

@katosh
Contributor

katosh commented Nov 20, 2020

I looked at the implementation and realized that I was wrong: tt.nnet.softmax behaves as if axis=1, and in the case of a vector a (1,) is padded to the left of the shape. Including an axis argument and making 1 the default may seem awkward, and changing the default to 0 would break things. Additionally, understanding and fixing the implementation of the c_code_template seems to be quite a hassle (see https://github.com/pymc-devs/Theano-PyMC/blob/master/theano/tensor/nnet/nnet.py). I may prefer solution 2 after all.
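
In numpy terms, the behaviour described above looks roughly like this (a sketch of the observed behaviour, not the actual Theano implementation):

```python
import numpy as np

def theano_like_softmax(x):
    # Vectors are promoted to a (1, n) row matrix, and the softmax is then
    # taken along axis=1, so each row sums to 1.
    x = np.atleast_2d(x)
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

print(theano_like_softmax(np.array([1.0, 2.0, 3.0])).shape)  # (1, 3)
```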

@ricardoV94
Member Author

ricardoV94 commented Nov 21, 2020

@katosh I had the same impression that it is not so trivial to just adjust the theano function.

That's why I suggested implementing a separate function in pm.math. Most other functions in that module are also implemented directly and not just imported from theano, and the softmax is not particularly complex.

Maybe this would be a bit less efficient than the Theano function (this can be tested), but other than that I don't see a downside. I think it is better than the current state of affairs in the codebase, where people re-implement the softmax on the spot in several places instead of using a common function.

It would also benefit users like me, who always have to think whether the input/output has to be transposed or not because there is no control over the axis of operation.

I would vote for option 3, at least until someone is willing to work on option 1.
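
For concreteness, a hypothetical sketch of what such a pm.math-level helper could look like; the name, signature, and axis default are illustrative only, not the API that was eventually merged in #5279:

```python
import theano.tensor as tt

def softmax(x, axis=-1):
    # Hypothetical helper: numerically stable softmax along a user-chosen axis,
    # so callers don't have to transpose inputs and outputs themselves.
    x = tt.as_tensor_variable(x)
    e_x = tt.exp(x - x.max(axis=axis, keepdims=True))
    return e_x / e_x.sum(axis=axis, keepdims=True)
```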

@katosh
Contributor

katosh commented Nov 21, 2020

It seems an axis argument can only easily be realized via option 3 until someone resolves aesara-devs/aesara#183.

@ricardoV94
Member Author

Closed via #5279
