Pooling layers #357
Conversation
@awni @angeloskath Could you take a look at this draft implementation?
(force-pushed from d27260f to a6a2012)

(force-pushed from dbff94a to 16eef36)
Thanks, this looks great! I left a couple of comments regarding using strides in MLX arrays; basically, they shouldn't be used unless implementing a `Primitive`.

As an additional comment, I think it is nice that the implementation is general, but in this case it hinders readability quite a lot. I would rather see a simple `Pooling1d`, or even `MaxPooling1d`, and then refactor things out to a base class when it makes sense, rather than starting with a very general implementation and then possibly aliases.

One example where the generality hurts readability is that it pushes all argument normalization into the logic, because it depends on `len(feature_sizes)`. It also combines all the docs into one big docstring that is harder to read.
python/mlx/nn/layers/pooling.py (outdated)

```python
            + feature_strides
            + [channels_stride]
        )
        windows = mx.as_strided(a, windows_shape, windows_strides)
```
In MLX's `as_strided`, the shape, strides, and offset should be provided as if the original array were row-contiguous. This is also why we don't need access to the original strides.
Assuming we cannot expose `array.strides()`, how would you access them to compute `windows_strides`?

Given the convention that strides should be provided as if the original array is row-contiguous, we could manually compute the strides based on the shape (e.g. a reverse cumulative product of the shape). Is there a better way?
@angeloskath, do you have any performance concerns regarding the pooling implementation using `as_strided`?
@angeloskath I implemented the requested changes. In addition, I separated the generic implementation into […]. I would also appreciate it if you could take a detailed look at some of the tests; the expected results were obtained by computing outputs of […]. Please take another look :)
@angeloskath Could you take a look at the changes addressing your comments?
Hi guys! Any updates on this? Looks like Transformers get all the Attention (pun intended). 😅 Nearly no CNN can be implemented without pooling layers (even AlexNet; I'm trying to write VGG19, but pooling layers are needed). Can we prioritise this please???

P.S. I cannot use MLX at all until this gets merged. I am mostly into CNNs.
@RahulBhalley I would like to complete this PR :) I am waiting for the review.

I highly appreciate your efforts @gboduljak. I hope this gets reviewed ASAP.
@gboduljak I am coming back to this PR (sorry for taking so long). Could you perhaps put it back on top of main? Since a rebase is pretty hard given the many merge commits, I propose applying the diff on top of main and force-pushing to the same branch, using something like the following:

```bash
base=$(git merge-base main pooling-layers)
git checkout main
git checkout -b new-pooling-layers
git diff $base..pooling-layers >/tmp/pooling-layers.patch
git apply --reject /tmp/pooling-layers.patch
rm docs/src/python/nn/layers.rst.rej
git add <whatever is modified or uncommitted>
git commit -m 'Add pooling layers'
git branch -m pooling-layers old-pooling-layers
git branch -m pooling-layers
git push origin pooling-layers -f
```

I can do the above, but then the authorship of the commit will be wrong (me instead of you).
@angeloskath Thank you for informing me. I will attempt the rebase today.
(force-pushed from ca708af to 648cc3b)
@angeloskath I did the rebase according to your instructions. Many thanks for the detailed instructions. Please take a look :)
I refactored the sliding-window logic outside of the pooling layers and added a bunch of error reporting. Until we add bespoke kernels for pooling, I think this is good enough to merge.

The PR now consists mostly of comments, tests, and error reporting, which is a good thing imho. I will wait for comments and then I'll merge it.
```
@@ -8,8 +8,9 @@ Layers
.. autosummary::
   :toctree: _autosummary
   :template: nn-module-template.rst

```
FYI this line needs to be here.
python/mlx/nn/layers/pooling.py (outdated)

```latex
\text{out}(N_i, k, C_j) = \max_{m=0, \ldots, k - 1}
    \text{input}(N_i, \text{stride} \times k + m, C_j),
```
You are using `k` to represent both the `kernel_size` and the index into the output; there should be a separate variable for indexing into the output. Could you change that here and in the other docstrings?
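For illustration, one way the fixed formula might read, using a hypothetical output index `t` in place of the second `k` (the variable name is an assumption, not from the PR):

```latex
\text{out}(N_i, t, C_j) = \max_{m=0, \ldots, k - 1}
    \text{input}(N_i, \text{stride} \times t + m, C_j)
```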
python/mlx/nn/layers/pooling.py (outdated)

```
- a single ``int`` -- in which case the same value is used for both the
  height and width axis;
- a ``tuple`` of two ``int`` s -- in which case, the first ``int`` is
  used for the height axis, the second `int` for the width axis.
```
Suggested change:

```diff
-  used for the height axis, the second `int` for the width axis.
+  used for the height axis, the second ``int`` for the width axis.
```
```latex
\text{out}(N_i, h, w, C_j) = & \max_{m=0, \ldots, k_H-1} \max_{n=0, \ldots, k_W-1} \\
    & \text{input}(N_i, \text{stride[0]} \times h + m,
                   \text{stride[1]} \times w + n, C_j),
\end{aligned}
```
Add an empty line after the math block, otherwise the docs give an error; here and in the other docstrings.
```
Assuming an input of shape :math:`(N, L, C)` and ``kernel_size`` is
:math:`k`, the output is a tensor of shape :math:`(N, L_{out}, C)`, given
by:
.. math::
```
Add a new line before the math block here, and in the other docstrings where it is missing; otherwise the docs complain.
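As a sketch of the fix (an illustrative docstring fragment, not the PR's actual text), Sphinx wants blank lines on both sides of the directive:

```
the output is a tensor of shape :math:`(N, L_{out}, C)`, given by:

.. math::
   \text{out}(N_i, t, C_j) = \max_{m=0, \ldots, k - 1}
   \text{input}(N_i, \text{stride} \times t + m, C_j)

where :math:`L_{out} = \left\lfloor\frac{L - k}{\text{stride}}\right\rfloor + 1`.
```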
```
where :math:`H_{out} = \left\lfloor\frac{H + 2 * \text{padding[0]} - \text{kernel_size[0]}}{\text{stride[0]}}\right\rfloor + 1`,
:math:`W_{out} = \left\lfloor\frac{W + 2 * \text{padding[1]} - \text{kernel_size[1]}}{\text{stride[1]}}\right\rfloor + 1`.
The parameters ``kernel_size``, ``stride``, ``padding``, can either be:
```
Add a new line after this, otherwise it shows up in bold.
```
where :math:`H_{out} = \left\lfloor\frac{H + 2 * \text{padding[0]} - \text{kernel_size[0]}}{\text{stride[0]}}\right\rfloor + 1`,
:math:`W_{out} = \left\lfloor\frac{W + 2 * \text{padding[1]} - \text{kernel_size[1]}}{\text{stride[1]}}\right\rfloor + 1`.
The parameters ``kernel_size``, ``stride``, ``padding``, can either be:
```
Add a new line after this as well, otherwise it shows up in bold.
Looks great! A few issues in the docs; please fix, and then I think we can merge.
FYI, there are some instructions on building the docs here. It's useful to build and look at them for big changes to the docs; otherwise you can break them or produce strange output.
@awni It would be great if there were a preview environment where you could browse your changes to the docs. This would eliminate the need to do this manually.
Interesting idea... maybe something we can set up with CircleCI 🤔
@gboduljak actually I just pushed the docs changes myself since I already made them locally. LGTM, @angeloskath feel free to merge when you are good with it. Thanks for the contribution!

@angeloskath Thank you for the refactor.
Proposed changes

Added `MaxPool` and `AvgPool` layers, which were requested in this issue. In this PR, I propose implementing the requested pooling operations by first computing sliding windows and subsequently reducing them. More precisely, we can use `as_strided` to compute the pooling sliding windows; then, we can simply reduce over the appropriate axes to implement the desired pooling operation (see the sketch below).

Concerns: My only concern is performance. Ideally, we should call optimized backend kernels for the pooling operations. There are `MPSCNNPoolingAverageNode` and `MPSCNNPoolingMaxNode`. Similarly, there is `BNNSFilterCreateLayerPooling` in Accelerate. Alternatively, we could implement a kernel for window reduction.

Closes #25.
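To make the window-then-reduce approach concrete, here is a minimal sketch of 1D max pooling in that spirit; `max_pool_1d` is a hypothetical helper written for illustration (no padding, strides given in elements as if the input were row-contiguous), not the PR's actual code:

```python
import mlx.core as mx

def max_pool_1d(x, kernel_size, stride):
    # x: (N, L, C). Build a strided view of shape (N, L_out, k, C),
    # then reduce with max over the window axis.
    N, L, C = x.shape
    L_out = (L - kernel_size) // stride + 1
    windows = mx.as_strided(
        x,
        shape=(N, L_out, kernel_size, C),
        strides=(L * C, stride * C, C, 1),
    )
    return windows.max(axis=2)

x = mx.arange(2 * 8 * 3).reshape(2, 8, 3)
print(max_pool_1d(x, kernel_size=2, stride=2).shape)  # (2, 4, 3)
```

Average pooling follows the same pattern with a mean reduction instead of a max.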
Checklist

Put an `x` in the boxes that apply.

- I have run `pre-commit run --all-files` to format my code / installed pre-commit prior to committing changes