
Conversation

@ProGamerGov (Contributor) commented Dec 27, 2020

  • Optionally replace nonlinear MaxPool2d layers with their linear AvgPool2d equivalents. By removing nonlinear operations from models, the expanded weights can be used for direction objectives and other purposes. I also added a few sentences to the tutorial about removing nonlinear operations.

  • Optionally replace nonlinear ReLU / RedirectedReLU layers with empty layers that do nothing (see the layer-swapping sketch after this list).

  • Added info on how to visualize the spatial positions of expanded weights in the weight vis tutorial.

  • Added info to the tutorial on how to find and visualize top neuron connections using expanded weights.

  • Improved some of the weight vis tutorial descriptions per the review in Optim wip - General fixes, SharedImage, & Weight Visualization #543.

  • I also added an expanded weights test which checks that removing nonlinear layers has the correct effect on expanded weights. This also lets me assert the output tensor values.
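As a rough illustration of the layer swapping described above (a minimal sketch only - the replace_layers helper, its signature, and the toy model are hypothetical, not the PR's actual API):

import torch.nn as nn

def replace_layers(model: nn.Module, old_type, new_layer_fn) -> None:
    # Recursively walk child modules, swapping any instance of old_type
    # for the module that new_layer_fn builds from the old instance.
    for name, child in model.named_children():
        if isinstance(child, old_type):
            setattr(model, name, new_layer_fn(child))
        else:
            replace_layers(child, old_type, new_layer_fn)

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.MaxPool2d(2))

# Nonlinear MaxPool2d -> linear AvgPool2d with matching settings.
replace_layers(
    model,
    nn.MaxPool2d,
    lambda m: nn.AvgPool2d(m.kernel_size, m.stride, m.padding, ceil_mode=m.ceil_mode),
)
# Nonlinear ReLU -> a passthrough layer that does nothing.
replace_layers(model, nn.ReLU, lambda m: nn.Identity())
print(model)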

* Optionally replace non-linear MaxPool2d layers with their linear AvgPool2d equivalents.

* Added info for how to visualize expanded weights spatial positions in expanded weights / weight vis tutorial.
@ProGamerGov (Contributor, Author)

@NarineK I tried to improve the descriptions in the weight vis tutorial, so it should hopefully be easier to understand and follow now. Let me know if any areas still need improvements!

@ProGamerGov (Contributor, Author) commented Jan 5, 2021

@NarineK So, I was testing the crop function on expanded weights and I discovered that it has an issue with an output shape of (5, 5) and an input shape of (14, 14). This is a bit of a problem, as the mixed4a, b, c, d, & e layers have a size of (14, 14), which becomes (5, 5) once the padding is cropped away. The center crop seems to work with every other input size and output shape combo I tried, so I'm not sure what to do.

import torch
import torch.nn as nn
import torch.nn.functional as F

# x is a 5x5 tensor of ones, inferred from the padded shapes printed below.
x = torch.ones(5, 5)

def crop_x(x, crop_list, padding_offset=True):  # padding_offset is unused in this snippet
    h, w = x.shape
    h_crop = h - int(round((h - crop_list[0]) / 2.0))
    w_crop = w - int(round((w - crop_list[1]) / 2.0))
    return x[h_crop - crop_list[0] : h_crop, w_crop - crop_list[1] : w_crop]

px = F.pad(x, (3, 4, 3, 4), value=float("-inf"))
print(px.shape)
x_out = crop_x(px, [5, 5])
print(x_out)

px = F.pad(x, (4, 5, 4, 5), value=float("-inf"))
print(px.shape)
x_out = crop_x(px, [5, 5])
print(x_out)

px = F.pad(x, (5, 6, 5, 6), value=float("-inf"))
print(px.shape)
x_out = crop_x(px, [5, 5])
print(x_out)

px = F.pad(x, (11, 12, 11, 12), value=float("-inf"))
print(px.shape)
x_out = crop_x(px, [5, 5])
print(x_out)


# Weights and sizing normally wouldn't match this I think,
# so we could ignore this combo.
px = F.pad(x, (5, 4, 4, 5), value=float("-inf"))
print(px.shape)
x_out = crop_x(px, [5, 5])
print(x_out)

And the output of the above code:

torch.Size([12, 12])
tensor([[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]])
torch.Size([14, 14])
tensor([[1., 1., 1., 1., -inf],
        [1., 1., 1., 1., -inf],
        [1., 1., 1., 1., -inf],
        [1., 1., 1., 1., -inf],
        [-inf, -inf, -inf, -inf, -inf]])
torch.Size([16, 16])
tensor([[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]])
torch.Size([28, 28])
tensor([[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]])
torch.Size([14, 14])
tensor([[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [-inf, -inf, -inf, -inf, -inf]])

The issue is evident when visualizing the heatmap of the expanded weights between mixed4 layers and a crop shape of (5,5).

@NarineK (Contributor) commented Jan 5, 2021

@ProGamerGov, that's interesting!
We can fix the first problem if we use math.ceil instead of round:

    h_crop = h - int(math.ceil((h - crop_list[0]) / 2.0))
    w_crop = w - int(math.ceil((w - crop_list[1]) / 2.0))

Center cropping can be challenging in terms of which side to include the endpoint on. We could potentially make that an argument to the center crop function - such as centering towards the left or the right.
What do you think?
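One possible shape for such an argument (a minimal sketch - the offset_left name and the floor/ceil split are assumptions, not the code that ended up in the PR):

import math
import torch
import torch.nn.functional as F

def center_crop(x, crop_list, offset_left=False):
    # When the total amount to crop away is odd, offset_left picks which
    # side the crop window shifts towards; when it is even, ceil == floor,
    # so no unnecessary +/- 1 is applied to either side.
    h, w = x.shape
    if offset_left:
        h_crop = h - int(math.ceil((h - crop_list[0]) / 2.0))
        w_crop = w - int(math.ceil((w - crop_list[1]) / 2.0))
    else:
        h_crop = h - (h - crop_list[0]) // 2
        w_crop = w - (w - crop_list[1]) // 2
    return x[h_crop - crop_list[0] : h_crop, w_crop - crop_list[1] : w_crop]

x = torch.ones(5, 5)
px = F.pad(x, (4, 5, 4, 5), value=float("-inf"))
print(center_crop(px, [5, 5], offset_left=True))  # recovers the (5, 5) block of ones

With the failing (14, 14) case above, offset_left=True recovers the original block, while offset_left=False reproduces the off-by-one crop into the padding.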

@ProGamerGov (Contributor, Author)

@NarineK Thanks for figuring out the fix! And yeah, I definitely think that we should make it so that users can choose which side to crop to! I think that would be as simple as optionally subtracting or adding one to the existing indices? Or were you thinking of a better way of doing it?

@NarineK (Contributor) commented Jan 6, 2021


Yes, if the cropped sides are not equal. If they are equal and the crop falls evenly across both sides, then we want to make sure that we don't apply an unnecessary +/- 1 to one side or the other.

@ProGamerGov (Contributor, Author)

@NarineK I've added the new center crop parameter for dealing with unequal sides!

@ProGamerGov (Contributor, Author) commented Jan 8, 2021

@NarineK I added these lines to the tutorial to prevent the PyTorch user warnings from appearing:

import warnings
warnings.filterwarnings("ignore", category=UserWarning, module='torch')

Those lines should make it so that we don't have to remove the warnings every time we update a notebook.

@NarineK (Contributor) commented Jan 9, 2021


I see - it would be good not to always hide the warnings when we run the notebook, in case there are important warnings that we need to be aware of. It would make sense to add that just before we push the code into the codebase.

@ProGamerGov (Contributor, Author) commented Jan 9, 2021

@NarineK Ah, okay! I've removed the import. I'll re-add it and then remove it before uploading notebook updates in the future.

@ProGamerGov (Contributor, Author)

@NarineK The multi-GPU test seems to be broken currently, as @greentfrapp ran into the same errors.

@NarineK (Contributor) left a comment

Thank you for working on this PR, @ProGamerGov! Looks great! Here are a couple of suggestions.

  1. It would be good to add a bit more description at the beginning of the tutorial. The first 1-2 sentences from the Extracting expanded weights section could be a good fit there too, in addition to describing a bit about NMF, etc.
  2. visualize_activations and vis_multi share code that I think could be combined. Also, vis_multi is not a very clear name; maybe we can give it a clearer one. Some of the args have type hints and some do not - it would be good to be consistent and have them everywhere or remove them everywhere.
  3. get_expanded_weights sounds a bit like a getter function, but that function does more than a get. We could probably call it extract_expanded_weights or another name of your choice.
  4. Here you might also want to describe in the text what y and x are: [target2 output channels, target1 output channels, y, x]. You can also describe that the circuits are the grads of target2's center activation w.r.t. target1, etc. (see the gradient sketch after this list).
  5. vis_neuron_direction seems to be a copy of vis_multi for a different loss, where channel is replaced with vec. We can combine these functions into one visualize_activations function, or at least keep the transformation as a shared definition.
  6. For the section Visualizing the spatial positions of expanded weights, I think you can probably visualize show(W_3a_3b_hm) next to it for easy comparison. Are we getting results identical to what OpenAI has in their implementation?
  7. Do you have documentation or a citation on weight banding?
  8. In terms of weight banding, I see that you access the conv layer but not pooling. Don't we need to call circuits for W_5b_c5x5 and the pooling layers? You get the pooling-layer-related circuits in the cell below (W_p2_3a = optimviz ...), but they are not used in the cell above. I might be missing something here.
  9. I think that we can abstract this piece out into a function and reuse it:

     A = []
     for i in highlow_units:
         x_out = vis_multi(model, model.conv3, i)
         A.append(x_out.detach())

     grid_img = torchvision.utils.make_grid(torch.cat(A), nrow=5)
     show(grid_img)
  10. nit: Can we, please, also describe where we are getting the highlow_units and bw indices from?
  11. reducer.components - are we setting the components in ChannelReducer? https://github.com/pytorch/captum/blob/ba076856d14a4c85f5903e371b272ac20b293c07/captum/optim/_utils/reducer.py#L17
  12. In the section Multiple related neurons with a small number of factors, I'd add more documentation describing the components and the semantics, plus a couple more sentences about the highlights of the visualizations.
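To make point 4 concrete, here is a toy sketch of the circuits-as-grads idea (an illustration with made-up layers and shapes, not the actual get_expanded_weights implementation):

import torch
import torch.nn as nn

# Two adjacent conv layers standing in for target1 and target2.
target1 = nn.Conv2d(3, 4, 3, padding=1)
target2 = nn.Conv2d(4, 6, 3, padding=1)

x = torch.randn(1, 3, 8, 8)
a1 = target1(x)
a2 = target2(a1)

# The gradient of each target2 channel's center activation w.r.t. target1's
# activations is one slice of the expanded weight tensor.
cy, cx = a2.shape[2] // 2, a2.shape[3] // 2
W = torch.stack([
    torch.autograd.grad(a2[0, ch, cy, cx], a1, retain_graph=True)[0][0]
    for ch in range(a2.shape[1])
])
# Shape: [target2 output channels, target1 output channels, height, width]
print(W.shape)  # torch.Size([6, 4, 8, 8])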

* Also added a missing type hint & updated citation.
@ProGamerGov (Contributor, Author) commented Jan 25, 2021

@NarineK

  1. Currently working on improving the descriptions based on your feedback, though it might be better to do it in a future PR, as I can't think of how to word it at the moment.

  2. visualize_activations (now called vis_neuron_large) only shares two lines of code plus two transforms with vis_multi (now known as vis_neuron), so it could be better to just leave them separate. I've also added type hints!

  3. The Lucid version of get_expanded_weights is also called get_expanded_weights, but I can change the name if you think that a different name would be better.

  4. I've replaced y & x with height & width. Also, can you elaborate more on that explanation of circuits?

  5. I've merged vis_neuron_direction and vis_multi into a new function called vis_neuron.

  6. I've added show(W_3a_3b_hm) to the spatial visualization cell. And the weights should be the same as the Lucid / OpenAI version.

  7. I've added info about the upcoming weight banding paper.

  8. For weight banding we are looking directly at the non-expanded weights. I've added some words to the tutorial that clarify this.

  9. Okay, I've turned it into a function that's reused across the tutorial!

  10. I've added links to where the neurons come from.

  11. The reducer.components variable is the factorization matrix created during decomposition when the channel reducer is run (see the sketch after this list): https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.NMF.html

  12. I've improved the documentation, so hopefully everything makes more sense now!
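For item 11 above, a minimal sketch of what reducer.components corresponds to, using sklearn's NMF directly (the fake activation data, and the assumption that ChannelReducer wraps the decomposition this way, are illustrative only):

import numpy as np
from sklearn.decomposition import NMF

# Fake nonnegative activations: (samples, channels).
activations = np.abs(np.random.randn(100, 32))

nmf = NMF(n_components=6, init="random", random_state=0)
factors = nmf.fit_transform(activations)  # (100, 6) per-sample factors
components = nmf.components_  # (6, 32) factorization matrix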

@ProGamerGov (Contributor, Author) commented Jan 25, 2021

@NarineK If there are no more major issues with the notebook then you can merge this PR whenever you want!

@NarineK (Contributor) commented Jan 26, 2021


Thank you for addressing all the comments, @ProGamerGov!

  1. Regarding 1: Sure, we can do it in a separate PR. I was thinking that you could summarize what you are doing in the notebook at the top of the notebook - something like an abstract.
  2. I understand that you wanted to be consistent with Lucid, but I think that from an engineering perspective extract_expanded_weights or compute_expanded_weights is clearer. Unless you want to keep it identical to Lucid - in that case I'm fine with it too.

@ProGamerGov (Contributor, Author) commented Jan 26, 2021

@NarineK

  1. I've added a basic summary of what we are doing in the notebook!

  2. Okay, I've changed get_expanded_weights to extract_expanded_weights!

  3. I've added the improved output printing for the top channels!

  4. I'm not quite sure, but the current version of the test seems to work? We can change it in a future PR if need be.

@NarineK (Contributor) commented Jan 26, 2021


Thank you for addressing the comments, @ProGamerGov! Regarding 4: did you try adding the return statement, and it failed? I think there is currently a bug in the recursion; it might be working now, but if we want to use the function in a more general context we will see issues.

@ProGamerGov (Contributor, Author)

@NarineK I'm not quite sure what you mean? There are two return statements, and the recursion stops when it finds the target instance type:

def _check_layer_in_model(
    self,
    model: torch.nn.Module,
    layer: Type[torch.nn.Module],
) -> None:
    def check_for_layer_in_model(model, layer) -> bool:
        for name, child in model._modules.items():
            if child is not None:
                if isinstance(child, layer):
                    return True
                check_for_layer_in_model(child, layer)
        return False

    self.assertTrue(check_for_layer_in_model(model, layer))

@NarineK (Contributor) commented Jan 26, 2021


@ProGamerGov, I meant adding a return in front of the recursive call, i.e. return check_for_layer_in_model(child, layer, in_model).

@ProGamerGov (Contributor, Author) commented Jan 26, 2021

@NarineK Ah, thank you! I made the change and the return causes the test to fail.

tests/optim/models/test_models.py:30: in _check_layer_in_model
    self.assertTrue(check_for_layer_in_model(model, layer))
E   AssertionError: False is not true
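For what it's worth, a bare return check_for_layer_in_model(child, layer) ends the loop at the first child whether or not anything matched, which would explain the failure above. A sketch of a recursion that propagates the result while still scanning the remaining children (an illustration, not the code that landed in the PR):

def check_for_layer_in_model(model, layer) -> bool:
    # Return True as soon as any descendant matches; keep scanning
    # sibling modules when a subtree comes back False.
    for child in model._modules.values():
        if child is not None:
            if isinstance(child, layer):
                return True
            if check_for_layer_in_model(child, layer):
                return True
    return False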

@ProGamerGov (Contributor, Author)

@NarineK I added the GPU test fix, and now the GPU test is working again!

@NarineK merged commit 99f872c into meta-pytorch:optim-wip on Jan 26, 2021
@ProGamerGov (Contributor, Author) commented Jan 29, 2021

@NarineK I tested the _check_layer_in_model function, and it does go through every layer in the model in its current form. So it looks like it's able to reach the inner modules.

@NarineK (Contributor) commented Jan 30, 2021


Thank you for testing it, @ProGamerGov!
It fails, for example, in this use case:

class SubModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.r = nn.ReLU()

    def forward(self, x):
        return self.r(x)

class MyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.l1 = nn.Linear(3, 4)
        self.sub = SubModule()

    def forward(self, x):
        return self.l1(self.sub(x))

Here, _check_layer_in_model(self, MyModel(), nn.ReLU) fails.
This will happen whenever we have custom submodules.
