Refactor interpretability notebook, add comments, and add pure Jacobian formulation #21
Conversation
Check out this pull request on ReviewNB to see visual diffs and provide feedback on Jupyter Notebooks.
Are there any references on interpreting NNs for regression using saliency maps?
I have not carefully looked through the references here, but reading the abstracts they all seem to be for classification problems. Maybe the extension to regression is trivial, but I (being an ML novice) am finding this notebook slightly hard to follow; put another way, I am not sure what to do with the gradient and LRP plots at the end, or what insight is actually being derived there.
I tried to go back and listen to Pierre's talk, and that didn't really help me either. Is there a very simple, intro-level reading that can be attached here?
Good point -- I'll look for one. As elaborated in my comment below, the right way to think about gradients is as the local linear approximation of the model (i.e. the linear model that best approximates the NN at a given point; in our plots we've averaged this model over 200 points). The right way to think about LRP or gradient * input is as that same linear model multiplied by the input, so it gives the actual contributing evidence of each input to the output.
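To make that concrete, here is a minimal sketch (not the notebook's code; `net` and `X` are hypothetical placeholders) of computing input gradients and gradient * input for a regression network with torch.autograd, averaged over a batch of points:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))  # placeholder model
X = torch.randn(200, 8, requires_grad=True)                         # 200 sample points

# Each row of X only affects its own output, so summing before backward()
# yields per-sample input gradients in X.grad.
net(X).sum().backward()

grads = X.grad                                # local linear approximation at each point
avg_linear_model = grads.mean(dim=0)          # the "average" linear model over 200 points
grad_times_input = (grads * X.detach()).mean(dim=0)  # per-feature contributing evidence

print(avg_linear_model)
print(grad_times_input)
```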
The interpretation might actually be easier and more intuitive for regression than for classification, because in classification you have to take one additional step and relate the thing being output by the model (the logits / log-odds) back to the thing you actually care about (the probabilities).
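As a hedged illustration of that extra step (again, not from the notebook; the names here are made up): for a classifier, input gradients of the logit have to be related back to probabilities via the chain rule, whereas in regression the output is already the quantity of interest.

```python
import torch

x = torch.randn(5, requires_grad=True)
w = torch.randn(5)

z = w @ x                      # logit (what a classifier typically outputs)
p = torch.sigmoid(z)           # probability (what we actually care about)
p.backward()

# By the chain rule, dp/dx = p * (1 - p) * dz/dx = p * (1 - p) * w
print(x.grad)
print(p.detach() * (1 - p.detach()) * w)   # same values
```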
I added a bit more explanation to the notebook; let me know if this is OK!
Since this is the first time I am seeing this, I am not 100% sure what equations are being referred to here. Do you mean the L-96 equations?
Hmm, on further reflection, my comment doesn't make sense. What I was remembering, though, was that when I trained a linear model to predict subgrid forcing (and when Anastasia did), it learned forcing prediction coefficients for all X_i that were equal and about -0.8 (or, alternatively, changing the corresponding differential equation weight from -1 to -1.8 in Anastasia's SINDy case).
The fact that a neural network's input gradients are all about -0.8 makes a lot of sense given those results (the average local linear approximation of the NN over many points is approximately the linear model), but I can't relate that back to the original equations, since we don't have any actual equations for the subgrid forcing in terms of the large-scale variables. I need to think about how to describe that succinctly in the notebook (with a reference to something public, since we can't link to m2lines presentations); let me know if you have suggestions.
In lieu of a satisfactory physical explanation, I updated the notebook to actually train and visualize the linear model for comparison!
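Something along these lines (a rough sketch with made-up placeholder data and an assumed already-trained `net`, not the notebook's actual code) shows the comparison: fit a linear model to the same inputs and targets, then compare its coefficients to the NN's averaged input gradients.

```python
import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)
K = 8
X = torch.randn(500, K, requires_grad=True)
y = -0.8 * X.detach().sum(dim=1)              # stand-in subgrid-forcing targets
net = nn.Sequential(nn.Linear(K, 32), nn.ReLU(), nn.Linear(32, 1))  # assume already trained

# Ordinary least-squares linear fit (with an intercept term)
A = np.column_stack([X.detach().numpy(), np.ones(len(X))])
coef, *_ = np.linalg.lstsq(A, y.numpy(), rcond=None)

# Average input gradient of the NN = its average local linear approximation
net(X).sum().backward()
avg_grad = X.grad.mean(dim=0)

print("linear-model coefficients:", coef[:-1])  # ~ -0.8 for each X_i on the real data
print("mean NN input gradients:  ", avg_grad)   # should roughly agree for a trained net
```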
I have reconciled this PR with the main branch. Were the discussions above resolved, or did @asross want to make any more changes in response to @dhruvbalwada's review before we merge?
Updated the PR, and also fixed a few issues that prevented it from running in the restructured repository. I also changed the name since it's not really just about LRP!
Per #12, I've reviewed the notebook on interpretability, and actually made some refactors and improvements:

- Added a pure Jacobian formulation of the input gradients using torch.autograd (comparing it to the previous finite-difference approximation, which also had a small bug with calculating the perturbation).

See the updated notebook for more details.
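For reference, a minimal sketch of the two approaches (with a placeholder `net` and `x`, not the notebook's actual model): the exact Jacobian via `torch.autograd.functional.jacobian` versus a central finite-difference approximation with an explicit perturbation size.

```python
import torch
import torch.nn as nn
from torch.autograd.functional import jacobian

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(8, 32), nn.Tanh(), nn.Linear(32, 8))  # placeholder network
x = torch.randn(8)                                                   # one input sample

# Exact Jacobian d net(x) / d x via autograd, shape [8, 8]
J_autograd = jacobian(net, x)

# Central finite-difference approximation with an explicit perturbation size
eps = 1e-3
J_fd = torch.zeros(8, 8)
with torch.no_grad():
    for i in range(8):
        dx = torch.zeros(8)
        dx[i] = eps
        J_fd[:, i] = (net(x + dx) - net(x - dx)) / (2 * eps)

print((J_autograd - J_fd).abs().max())  # should be very small
```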