
Pooling layers #357

Merged · merged 5 commits into ml-explore:main from pooling-layers on Feb 13, 2024
Conversation

@gboduljak (Contributor) commented on Jan 4, 2024

Proposed changes

Added MaxPool and AvgPool layers, which were requested in this issue. In this PR, I propose implementing the requested pooling operations by first computing sliding windows and then reducing them. More precisely, we can use as_strided to compute the pooling sliding windows, and then reduce over the appropriate axes to implement the desired pooling operation.
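
For illustration, here is a minimal sketch of this approach for the 1D max-pooling case (my simplifying assumptions: an (N, L, C) input, no padding, and illustrative names; the implementation in this PR is more general):

import mlx.core as mx

def max_pool1d(x, kernel_size, stride):
    N, L, C = x.shape
    L_out = (L - kernel_size) // stride + 1
    # Shape and strides are given in elements, as if x were row contiguous:
    # advancing one window along L skips stride * C elements.
    windows = mx.as_strided(
        x,
        shape=(N, L_out, kernel_size, C),
        strides=(L * C, stride * C, C, 1),
    )
    # Reduce over the window axis to pool.
    return mx.max(windows, axis=2)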

Concerns: My only concern is performance. Ideally, we should call optimized backend kernels for pooling operations. There are MPSCNNPoolingAverageNode and MPSCNNPoolingMaxNode. Similarly, there is BNNSFilterCreateLayerPooling in Accelerate. Alternatively, we could implement a kernel for window reduction.

Closes #25.

Checklist

Put an x in the boxes that apply.

  • I have read the CONTRIBUTING document
  • I have run pre-commit run --all-files to format my code / installed pre-commit prior to committing changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the necessary documentation (if needed)

@gboduljak (Contributor, Author)

@awni @angeloskath Could you take a look at this draft implementation?

@gboduljak force-pushed the pooling-layers branch 2 times, most recently from d27260f to a6a2012, on January 6, 2024
@gboduljak marked this pull request as ready for review on January 6, 2024
@gboduljak changed the title from "A draft implementation of pooling layers" to "An initial implementation of pooling layers" on Jan 6, 2024
@angeloskath (Member) left a comment

Thanks, this looks great! I left a couple of comments regarding the use of strides in MLX arrays; basically, they shouldn't be used unless you are implementing a Primitive.

As an additional comment, I think it is nice that the implementation is general, but in this case it hurts readability quite a lot. I would rather see a simple Pooling1d, or even MaxPooling1d, and then refactor things out to a base class when it makes sense, rather than starting from a very general implementation and then possibly adding aliases.

One example where the generality hurts readability is that it pushes all argument normalization into the logic, because it depends on len(feature_sizes). It also combines all the docs into one big docstring that is harder to read.

python/src/array.cpp (outdated, resolved)
+ feature_strides
+ [channels_stride]
)
windows = mx.as_strided(a, windows_shape, windows_strides)
Member

In MLX's as_strided, the shape, strides, and offset should be provided as if the original array were row contiguous. This is also why we don't need access to the original strides.
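
For example (a small illustration of this convention, not code from the PR):

import mlx.core as mx

a = mx.arange(4)                                    # row contiguous by convention
w = mx.as_strided(a, shape=(2, 3), strides=(1, 1))  # two overlapping windows
# w == [[0, 1, 2],
#       [1, 2, 3]]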

@gboduljak (Contributor, Author) commented on Jan 7, 2024

Assuming we cannot expose array.strides(), how would you access them to compute windows_strides?
Given the convention that strides should be provided as if the original array is row contiguous, we could manually compute the strides from the shape (e.g., as a reverse cumulative product of the shape). Is there a better way?
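
For reference, that computation could look like this (a hypothetical helper, not code from the PR):

def contiguous_strides(shape):
    # Row-contiguous strides, in elements: a reverse cumulative product of the shape.
    strides = []
    acc = 1
    for dim in reversed(shape):
        strides.append(acc)
        acc *= dim
    return tuple(reversed(strides))

# contiguous_strides((2, 3, 4)) == (12, 4, 1)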

python/mlx/nn/layers/pooling.py (outdated, resolved)
python/mlx/nn/layers/pooling.py (outdated, resolved)
@gboduljak (Contributor, Author)

@angeloskath, do you have any performance concerns regarding the pooling implementation using as_strided?

@gboduljak (Contributor, Author) commented on Jan 7, 2024

@angeloskath I implemented the requested changes. In addition, I separated the generic implementation into MaxPooling1d, MaxPooling2d, AvgPooling1d and AvgPooling2d and updated the docs accordingly. However, I am not sure about my inheritance hierarchy.

I would also appreciate it if you could take a detailed look at some of the tests. The expected results were obtained by computing the outputs of the torch.nn pooling layer implementations and transposing the outputs to match our channels_last convention. There could be some mistakes.
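
For example, a single reference output could be produced along these lines (a sketch; the kernel size, shapes, and random input here are illustrative):

import numpy as np
import torch

x = np.random.uniform(size=(2, 16, 4)).astype(np.float32)  # (N, L, C), channels last
# torch.nn pooling expects (N, C, L), so transpose in and transpose back.
expected = (
    torch.nn.MaxPool1d(kernel_size=2, stride=2)(torch.from_numpy(x).permute(0, 2, 1))
    .permute(0, 2, 1)
    .numpy()
)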

Please take another look :)

@gboduljak changed the title from "An initial implementation of pooling layers" to "Pooling layers" on Jan 7, 2024
@gboduljak requested a review from angeloskath on January 7, 2024
docs/src/python/nn/layers.rst (outdated, resolved)
@gboduljak (Contributor, Author)

@angeloskath Could you take a look at the changes addressing your comments?

@RahulBhalley commented on Feb 1, 2024

Hi guys! Any updates on this? Looks like Transformers get all the Attention (pun intended). 😅

Nearly no CNN can be implemented without pooling layers (even AlexNet; I'm trying to write VGG19, but pooling layers are needed). Can we prioritise this, please?

P.S. I cannot use MLX at all until this gets merged. I am mostly into CNNs.

CC: @angeloskath @awni @gboduljak

@gboduljak (Contributor, Author)

@RahulBhalley I would like to complete this PR :) I am waiting for the review.

@RahulBhalley

I highly appreciate your efforts @gboduljak. I hope this gets reviewed ASAP.

@angeloskath (Member)

@gboduljak I am coming back to this PR (sorry for taking so long). Could you perhaps put it back on top of main? Since a rebase is pretty hard given the many merge commits, I propose applying the diff on top of main and force-pushing to the same branch, using something like the following:

base=$(git merge-base main pooling-layers)        # common ancestor of the two branches
git checkout main
git checkout -b new-pooling-layers                # fresh branch off main
git diff $base..pooling-layers >/tmp/pooling-layers.patch
git apply --reject /tmp/pooling-layers.patch      # conflicting hunks land in .rej files
rm docs/src/python/nn/layers.rst.rej
git add <whatever is modified or uncommitted>
git commit -m 'Add pooling layers'
git branch -m pooling-layers old-pooling-layers   # keep the old branch around
git branch -m pooling-layers                      # rename current branch to pooling-layers
git push origin pooling-layers -f

I can do the above, but then the authorship of the commit will be wrong (me instead of you).

@gboduljak (Contributor, Author)

@angeloskath Thank you for informing me. I will attempt the rebase today.

@gboduljak (Contributor, Author)

@angeloskath I rebased according to your instructions; many thanks for the detailed steps. Please take a look :)

@gboduljak mentioned this pull request on Feb 9, 2024
@angeloskath (Member) left a comment

I refactored the sliding-window logic out of the pooling layers and added a bunch of error reporting. Until we add bespoke kernels for pooling, I think this is good enough to merge.

The PR now consists mostly of comments, tests, and error reporting, which is a good thing imho. I will wait for comments and then I'll merge it.

@@ -8,8 +8,9 @@ Layers
.. autosummary::
:toctree: _autosummary
:template: nn-module-template.rst

Member

FYI this line needs to be here.

Comment on lines 142 to 143
\text{out}(N_i, k, C_j) = \max_{m=0, \ldots, k - 1}
\text{input}(N_i, \text{stride} \times k + m, C_j),
Member

You are using k to represent both the kernel_size and the index into the output; there should be a separate variable for indexing into the output.

Could you change that here and in the other docstrings?
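
One way to fix it, using a separate variable t for the output index (my suggested form, not text from the PR), would be:

.. math::

   \text{out}(N_i, t, C_j) = \max_{m=0, \ldots, k - 1}
   \text{input}(N_i, \text{stride} \times t + m, C_j)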

- a single ``int`` -- in which case the same value is used for both the
height and width axis;
- a ``tuple`` of two ``int`` s -- in which case, the first ``int`` is
used for the height axis, the second `int` for the width axis.
Member

Suggested change:
-     used for the height axis, the second `int` for the width axis.
+     used for the height axis, the second ``int`` for the width axis.

\text{out}(N_i, h, w, C_j) = & \max_{m=0, \ldots, k_H-1} \max_{n=0, \ldots, k_W-1} \\
& \text{input}(N_i, \text{stride[0]} \times h + m,
\text{stride[1]} \times w + n, C_j),
\end{aligned}
Member

Add an empty line after the math block, otherwise the docs give an error; here and in the other docstrings.

Assuming an input of shape :math:`(N, L, C)` and ``kernel_size`` is
:math:`k`, the output is a tensor of shape :math:`(N, L_{out}, C)`, given
by:
.. math::
Member

New line before the math block, here and in the other docstrings where it is missing, otherwise the docs complain.

where :math:`H_{out} = \left\lfloor\frac{H + 2 * \text{padding[0]} - \text{kernel_size[0]}}{\text{stride[0]}}\right\rfloor + 1`,
:math:`W_{out} = \left\lfloor\frac{W + 2 * \text{padding[1]} - \text{kernel_size[1]}}{\text{stride[1]}}\right\rfloor + 1`.

The parameters ``kernel_size``, ``stride``, ``padding``, can either be:
Member

New line after this, otherwise it shows up in bold.

where :math:`H_{out} = \left\lfloor\frac{H + 2 * \text{padding[0]} - \text{kernel_size[0]}}{\text{stride[0]}}\right\rfloor + 1`,
:math:`W_{out} = \left\lfloor\frac{W + 2 * \text{padding[1]} - \text{kernel_size[1]}}{\text{stride[1]}}\right\rfloor + 1`.

The parameters ``kernel_size``, ``stride``, ``padding``, can either be:
Member

New line after this, otherwise it shows up in bold.

@awni (Member) commented on Feb 13, 2024

Looks great! A few issues in the docs; please fix them and then I think we can merge.

@awni (Member) commented on Feb 13, 2024

FYI, there are some instructions on building the docs here. It's useful to build and look at the docs for big changes; otherwise you can break them or produce strange output.

@ligaz commented on Feb 13, 2024

@awni It would be great if there were a preview environment where you could browse your changes to the docs. This would eliminate the need to do this manually.

@awni (Member) commented on Feb 13, 2024

It would be great if there were a preview environment where you could browse your changes to the docs. This would eliminate the need to do this manually.

Interesting idea... maybe something we can set up with CircleCI 🤔

@awni (Member) commented on Feb 13, 2024

@gboduljak actually I just pushed the docs changes myself, since I already made them locally.

LGTM, @angeloskath feel free to merge when you are good with it.

Thanks for the contribution!

@angeloskath merged commit e54cbb7 into ml-explore:main on Feb 13, 2024
@gboduljak (Contributor, Author)

@angeloskath Thank you for the refactor.
@awni Thank you for fixing the docs nits. I was about to push my changes, but the PR was already merged :)

Linked issue: Support for pooling layers for CNNs, e.g. MaxPool2d (#25)