Optim-wip: Add model linearization, and expanded weights spatial positions #574
Conversation
* Optionally replace non-linear MaxPool2d layers with their linear AvgPool2d equivalents.
* Added info for how to visualize expanded weights spatial positions in the expanded weights / weight vis tutorial.
(force-pushed eefab69 to 336c62f)
@NarineK I tried to improve the descriptions in the weight vis tutorial, so it should hopefully be easier to understand and follow now. Let me know if any areas still need improvements!
(force-pushed bc0913f to 94fe929)
@NarineK So, I was testing the crop function on expanded weights and I discovered that it has an issue with an output shape of (5,5) and an input shape of (14,14). This is a bit of a problem, as the mixed4a, b, c, d, & e layers have a size of (14,14), which becomes (5,5) once the padding is cropped away. The center crop seems to work with literally every other input and output shape combo I tried, so I'm not sure what to do. And the output of the above code: The issue is evident when visualizing the heatmap of the expanded weights between mixed4 layers and a crop shape of (5,5).
@ProGamerGov, that's interesting! Center cropping can be challenging in terms of which side to include the endpoint on. We could potentially make it an argument to the center crop function - such as centering towards the left or the right.
@NarineK Thanks for figuring out the fix! And yeah, I definitely think that we should make it so that users can choose which side to crop towards! I think that would be as simple as optionally subtracting or adding one to the existing indices? Or were you thinking of a better way of doing it?
Yes, if the cropped sides are not equal. If they are equal and the crop falls evenly onto those equal sides, then we want to make sure that we don't apply an unnecessary +/- 1 to one side or the other.
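The parity issue being discussed can be sketched concretely: cropping (5,5) out of (14,14) removes 9 pixels per axis, which cannot be split evenly, so one side has to give up the extra pixel. A minimal sketch (the function name and `favor_left` flag are illustrative, not Captum's actual API):

```python
import torch

def center_crop(x, crop_size, favor_left=True):
    # Illustrative center crop: when the number of pixels to remove
    # is odd, `favor_left` decides which side keeps the extra pixel.
    h, w = x.shape[-2], x.shape[-1]
    ch, cw = crop_size
    dh, dw = h - ch, w - cw  # total pixels to remove per axis
    top = dh // 2 + (0 if favor_left else dh % 2)
    left = dw // 2 + (0 if favor_left else dw % 2)
    return x[..., top:top + ch, left:left + cw]

x = torch.arange(14 * 14).reshape(1, 14, 14)
# 14 - 5 = 9 is odd: one side crops 4 pixels and the other 5,
# so the two variants select slightly different windows.
a = center_crop(x, (5, 5), favor_left=True)
b = center_crop(x, (5, 5), favor_left=False)
```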
(force-pushed 699fbe2 to 2e86d98)
@NarineK I've added the new center crop parameter for dealing with unequal sides!
@NarineK I added these lines to the tutorial to prevent the PyTorch user warnings from appearing: These lines should make it so that we don't have to remove the warnings every time we update a notebook.
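The notebook's actual lines aren't shown above, but a scoped filter along these lines (a sketch, not the notebook's code) would suppress PyTorch `UserWarning`s without hiding warnings for the rest of the session:

```python
import warnings

def run_quietly(fn):
    # Scoped suppression: UserWarnings raised inside `fn` are ignored,
    # and the previous warning filters are restored on exit.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=UserWarning)
        return fn()

def noisy():
    warnings.warn("deprecated behavior", UserWarning)
    return 42

result = run_quietly(noisy)  # the warning inside noisy() is silenced
```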
I see - it would be good if we don't always hide the warnings when we run the notebook, in case there are important warnings that we need to be aware of. That would make sense to use before we push the code into the codebase.
@NarineK Ah, okay! I've removed the import. I'll re-add and then remove it before uploading notebook updates in the future.
@NarineK The multi-GPU test seems to be currently broken, as @greentfrapp ran into the same errors.
Thank you for working on this PR, @ProGamerGov! Looks great! Here are a couple of suggestions.
- It would be good to add a bit more description at the beginning of the tutorial. The first 1-2 sentences from the `Extracting expanded weights` section could be a good fit there too, in addition to describing a bit about NMF etc.
- `visualize_activations` and `vis_multi` share code that I think could be combined. Also, `vis_multi` is not a very clear name; maybe we can give it a clearer one.
- Some of the args have type hints and some do not. It would be good to be consistent and have them everywhere or remove them everywhere.
- `get_expanded_weights` sounds a bit like a getter function, but that function does more than a get. We could probably call it `extract_expanded_weights` or another name of your choice.
- Here you might also want to describe in the text what y and x are: `[target2 output channels, target1 output channels, y, x]`. You can also describe that circuits are the grads of target2 for the center activation w.r.t. target1, etc.
- `vis_neuron_direction` seems to be a copy of `vis_multi` for a different loss where channel is replaced with vec. We can combine these functions into one `visualize_activations` function, or at least keep the transformation as a sharable definition.
- For the section `Visualizing the spatial positions of expanded weights`, I think you can probably visualize `show(W_3a_3b_hm)` next to the image for easy comparison. Are we getting results identical to what OpenAI has in their implementation?
- Do you have documentation or a citation for `Weight banding`? In terms of weight banding, I see that you access the conv layer but not pooling. Don't we need to call circuits for `W_5b_c5x5` and the pooling layers? The pooling-layer-related circuits are computed in the cell below (`W_p2_3a = optimviz ...`), but they are not used in the cell above.
- I might be missing something here - I think that we can abstract out this piece into a function and reuse it:
I might be missing something here - I think that we can abstract out this piece in a function and reuse it:
A = []
for i in highlow_units:
x_out = vis_multi(model, model.conv3, i)
A.append(x_out.detach())
grid_img = torchvision.utils.make_grid(torch.cat(A), nrow=5)
show(grid_img)
- nit: Can we, please, also describe where we are getting the `highlow_units` and `bw` indices from? For `reducer.components`, are we setting the components in `ChannelReducer`? https://github.com/pytorch/captum/blob/ba076856d14a4c85f5903e371b272ac20b293c07/captum/optim/_utils/reducer.py#L17
- In the section `Multiple related neurons with a small number of factors`, I'd add more documentation describing the components and their semantics, plus a couple more sentences about the highlights of the visualizations.
* Also added a missing type hint & updated citation.
@NarineK If there are no more major issues with the notebook, then you can merge this PR whenever you want!
Thank you for addressing all comments @ProGamerGov!
* Also made improvements to top channel section of the notebook.
Thank you for addressing the comments, @ProGamerGov! Regarding 4, did you try to add a return statement, and it failed? I think that there is currently a bug in the recursion. It might be working now, but if we want to use the function in a more general context we will see issues.
@NarineK I'm not quite sure what I mean? There are the two return statements, and it stops when it finds the target instance type:
@ProGamerGov, I meant a return here: `return check_for_layer_in_model(child, layer, in_model)`
@NarineK Ah, thank you! I made the change and the return causes the test to fail.
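For context, the failure mode being discussed can be sketched like this (illustrative names, not Captum's actual `check_for_layer_in_model`): a bare `return` on the recursive call exits after the first child whether or not it matched, so the child's result has to be propagated conditionally instead:

```python
import torch.nn as nn

def contains_layer(module, layer_type):
    # Illustrative recursive layer search, not the PR's actual function.
    if isinstance(module, layer_type):
        return True
    for child in module.children():
        # A bare `return contains_layer(child, layer_type)` here would
        # stop at the first child even when it does not match; only
        # return early when the recursive call actually finds the layer.
        if contains_layer(child, layer_type):
            return True
    return False

model = nn.Sequential(nn.Conv2d(3, 4, 3), nn.Sequential(nn.MaxPool2d(2)))
```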
@NarineK I added the GPU test fix, and now the GPU test is working again!
@NarineK I tested the
Thank you for testing it @ProGamerGov ! Here: |
* Optionally replace nonlinear MaxPool2d layers with their linear AvgPool2d equivalents. By removing nonlinear operations from models, the expanded weights can be used for direction objectives and other tasks. I also added a few sentences to the tutorial about removing nonlinear operations.
* Optionally replace nonlinear ReLU / RedirectedReLU layers with empty layers that do nothing.
* Added info for how to visualize expanded weights spatial positions in the weight vis tutorial.
* Added info to the tutorial for how to find and visualize top neuron connections by using expanded weights.
* Improved some weight vis tutorial descriptions per Optim wip - General fixes, SharedImage, & Weight Visualization #543 (review)
* I also added an expanded weights test which checks that removing nonlinear layers has the correct effect on expanded weights. This also lets me assert the output tensor values.
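The MaxPool2d-to-AvgPool2d replacement described above can be sketched as a recursive module swap (the helper name is mine, not the PR's actual API):

```python
import torch.nn as nn

def linearize_pooling(model):
    # Illustrative sketch: swap each non-linear MaxPool2d for an
    # AvgPool2d with the same geometry, so pooling becomes a linear op.
    for name, child in model.named_children():
        if isinstance(child, nn.MaxPool2d):
            setattr(model, name, nn.AvgPool2d(
                kernel_size=child.kernel_size,
                stride=child.stride,
                padding=child.padding,
                ceil_mode=child.ceil_mode,
            ))
        else:
            linearize_pooling(child)  # recurse into submodules
    return model

m = linearize_pooling(nn.Sequential(nn.Conv2d(3, 4, 3), nn.MaxPool2d(2)))
```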