
Pooling layers #357

Merged · merged 5 commits into ml-explore:main from pooling-layers on Feb 13, 2024
Conversation

@gboduljak (Contributor) commented on Jan 4, 2024

Proposed changes

Added MaxPool and AvgPool layers, which were requested in this issue. In this PR, I propose implementing the requested pooling operations by first computing sliding windows and then reducing them. More precisely, we can use as_strided to compute the pooling sliding windows, and then reduce over the appropriate axes to implement the desired pooling operation.
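
For illustration, here is a minimal sketch of this approach for the 1D max-pooling case (my simplifying assumptions: an (N, L, C) input, no padding, and illustrative names; the implementation in this PR is more general):

import mlx.core as mx

def max_pool1d(x, kernel_size, stride):
    N, L, C = x.shape
    L_out = (L - kernel_size) // stride + 1
    # Shape and strides are given in elements, as if x were row contiguous:
    # advancing one window along L skips stride * C elements.
    windows = mx.as_strided(
        x,
        shape=(N, L_out, kernel_size, C),
        strides=(L * C, stride * C, C, 1),
    )
    # Reduce over the window axis to pool.
    return mx.max(windows, axis=2)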

Concerns: My only concern is performance. Ideally, we should call optimized backend kernels for pooling operations. There are MPSCNNPoolingAverageNode and MPSCNNPoolingMaxNode. Similarly, there is BNNSFilterCreateLayerPooling in Accelerate. Alternatively, we could implement a kernel for window reduction.

Closes #25.

Checklist

Put an x in the boxes that apply.

  • I have read the CONTRIBUTING document
  • I have run pre-commit run --all-files to format my code / installed pre-commit prior to committing changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the necessary documentation (if needed)

@gboduljak (Contributor, Author)

@awni @angeloskath Could you take a look at this draft implementation?

@gboduljak force-pushed the pooling-layers branch 2 times, most recently from d27260f to a6a2012, on January 6, 2024
@gboduljak marked this pull request as ready for review on January 6, 2024
@gboduljak changed the title from "A draft implementation of pooling layers" to "An initial implementation of pooling layers" on Jan 6, 2024
@angeloskath (Member) left a comment

Thanks, this looks great! I left a couple of comments regarding the use of strides in MLX arrays; basically, they shouldn't be used unless you are implementing a Primitive.

As an additional comment, I think it is nice that the implementation is general, but in this case it hurts readability quite a lot. I would rather see a simple Pooling1d, or even MaxPooling1d, and then refactor things out to a base class when it makes sense, rather than starting from a very general implementation and then possibly adding aliases.

One example where the generality hurts readability is that it pushes all argument normalization into the logic, because it depends on len(feature_sizes). It also combines all the docs into one big docstring that is harder to read.

python/src/array.cpp (outdated, resolved)
+ feature_strides
+ [channels_stride]
)
windows = mx.as_strided(a, windows_shape, windows_strides)
Member

In MLX's as_strided, the shape, strides, and offset should be provided as if the original array were row contiguous. This is also why we don't need access to the original strides.
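
For example (a small illustration of this convention, not code from the PR):

import mlx.core as mx

a = mx.arange(4)                                    # row contiguous by convention
w = mx.as_strided(a, shape=(2, 3), strides=(1, 1))  # two overlapping windows
# w == [[0, 1, 2],
#       [1, 2, 3]]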

@gboduljak (Contributor, Author) commented on Jan 7, 2024

Assuming we cannot expose array.strides(), how would you access them to compute windows_strides?
Given the convention that strides should be provided as if the original array is row contiguous, we could manually compute the strides from the shape (e.g., as a reverse cumulative product of the shape). Is there a better way?
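
For reference, that computation could look like this (a hypothetical helper, not code from the PR):

def contiguous_strides(shape):
    # Row-contiguous strides, in elements: a reverse cumulative product of the shape.
    strides = []
    acc = 1
    for dim in reversed(shape):
        strides.append(acc)
        acc *= dim
    return tuple(reversed(strides))

# contiguous_strides((2, 3, 4)) == (12, 4, 1)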

python/mlx/nn/layers/pooling.py (outdated, resolved)
python/mlx/nn/layers/pooling.py (outdated, resolved)
@gboduljak (Contributor, Author)

@angeloskath, do you have any performance concerns regarding the pooling implementation using as_strided?

@gboduljak (Contributor, Author) commented on Jan 7, 2024

@angeloskath I implemented the requested changes. In addition, I separated the generic implementation into MaxPooling1d, MaxPooling2d, AvgPooling1d and AvgPooling2d and updated the docs accordingly. However, I am not sure about my inheritance hierarchy.

I would also appreciate it if you could take a detailed look at some of the tests. The expected results were obtained by computing the outputs of the torch.nn pooling layer implementations and transposing the outputs to match our channels_last convention. There could be some mistakes.
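
For example, a single reference output could be produced along these lines (a sketch; the kernel size, shapes, and random input here are illustrative):

import numpy as np
import torch

x = np.random.uniform(size=(2, 16, 4)).astype(np.float32)  # (N, L, C), channels last
# torch.nn pooling expects (N, C, L), so transpose in and transpose back.
expected = (
    torch.nn.MaxPool1d(kernel_size=2, stride=2)(torch.from_numpy(x).permute(0, 2, 1))
    .permute(0, 2, 1)
    .numpy()
)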

Please take another look :)

@gboduljak changed the title from "An initial implementation of pooling layers" to "Pooling layers" on Jan 7, 2024
@gboduljak requested a review from angeloskath on January 7, 2024
docs/src/python/nn/layers.rst (outdated, resolved)
@gboduljak (Contributor, Author)

@angeloskath Could you take a look at the changes addressing your comments?

@RahulBhalley commented on Feb 1, 2024

Hi guys! Any updates on this? Looks like Transformers get all the Attention (pun intended). 😅

Nearly no CNN can be implemented without pooling layers (even AlexNet; I'm trying to write VGG19, but pooling layers are needed). Can we prioritise this, please?

P.S. I cannot use MLX at all until this gets merged. I am mostly into CNNs.

CC: @angeloskath @awni @gboduljak

@gboduljak (Contributor, Author)

@RahulBhalley I would like to complete this PR :) I am waiting for the review.

@RahulBhalley

I highly appreciate your efforts @gboduljak. I hope this gets reviewed ASAP.

@angeloskath (Member)

@gboduljak I am coming back to this PR (sorry for taking so long). Could you perhaps put it back on top of main? Since a rebase is pretty hard given the many merge commits, I propose applying the diff on top of main and force-pushing to the same branch, using something like the following:

base=$(git merge-base main pooling-layers)        # common ancestor of the two branches
git checkout main
git checkout -b new-pooling-layers                # fresh branch off main
git diff $base..pooling-layers >/tmp/pooling-layers.patch
git apply --reject /tmp/pooling-layers.patch      # conflicting hunks land in .rej files
rm docs/src/python/nn/layers.rst.rej
git add <whatever is modified or uncommitted>
git commit -m 'Add pooling layers'
git branch -m pooling-layers old-pooling-layers   # keep the old branch around
git branch -m pooling-layers                      # rename current branch to pooling-layers
git push origin pooling-layers -f

I can do the above, but then the authorship of the commit will be wrong (me instead of you).

@gboduljak (Contributor, Author)

@angeloskath Thank you for informing me. I will attempt the rebase today.

@gboduljak (Contributor, Author)

@angeloskath I rebased according to your instructions; many thanks for the detailed steps. Please take a look :)

@gboduljak mentioned this pull request on Feb 9, 2024
@angeloskath (Member) left a comment

I refactored the sliding-window logic out of the pooling layers and added a bunch of error reporting. Until we add bespoke kernels for pooling, I think this is good enough to merge.

The PR now consists mostly of comments, tests, and error reporting, which is a good thing imho. I will wait for comments and then I'll merge it.

@@ -8,8 +8,9 @@ Layers
.. autosummary::
:toctree: _autosummary
:template: nn-module-template.rst

Member

FYI this line needs to be here.

Comment on lines 142 to 143
\text{out}(N_i, k, C_j) = \max_{m=0, \ldots, k - 1}
\text{input}(N_i, \text{stride} \times k + m, C_j),
Member

You are using k to represent both the kernel_size and the index into the output; there should be a separate variable for indexing into the output.

Could you change that here and in the other docstrings?
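
One way to fix it, using a separate variable t for the output index (my suggested form, not text from the PR), would be:

.. math::

   \text{out}(N_i, t, C_j) = \max_{m=0, \ldots, k - 1}
   \text{input}(N_i, \text{stride} \times t + m, C_j)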

- a single ``int`` -- in which case the same value is used for both the
height and width axis;
- a ``tuple`` of two ``int`` s -- in which case, the first ``int`` is
used for the height axis, the second `int` for the width axis.
Member

Suggested change:
-     used for the height axis, the second `int` for the width axis.
+     used for the height axis, the second ``int`` for the width axis.

\text{out}(N_i, h, w, C_j) = & \max_{m=0, \ldots, k_H-1} \max_{n=0, \ldots, k_W-1} \\
& \text{input}(N_i, \text{stride[0]} \times h + m,
\text{stride[1]} \times w + n, C_j),
\end{aligned}
Member

Add an empty line after the math block, otherwise the docs give an error; here and in the other docstrings.

Assuming an input of shape :math:`(N, L, C)` and ``kernel_size`` is
:math:`k`, the output is a tensor of shape :math:`(N, L_{out}, C)`, given
by:
.. math::
Member

New line before the math block, here and in the other docstrings where it is missing, otherwise the docs complain.

where :math:`H_{out} = \left\lfloor\frac{H + 2 * \text{padding[0]} - \text{kernel_size[0]}}{\text{stride[0]}}\right\rfloor + 1`,
:math:`W_{out} = \left\lfloor\frac{W + 2 * \text{padding[1]} - \text{kernel_size[1]}}{\text{stride[1]}}\right\rfloor + 1`.

The parameters ``kernel_size``, ``stride``, ``padding``, can either be:
Member

New line after this, otherwise it shows up in bold.

where :math:`H_{out} = \left\lfloor\frac{H + 2 * \text{padding[0]} - \text{kernel_size[0]}}{\text{stride[0]}}\right\rfloor + 1`,
:math:`W_{out} = \left\lfloor\frac{W + 2 * \text{padding[1]} - \text{kernel_size[1]}}{\text{stride[1]}}\right\rfloor + 1`.

The parameters ``kernel_size``, ``stride``, ``padding``, can either be:
Member

New line after this, otherwise it shows up in bold.

@awni (Member) commented on Feb 13, 2024

Looks great! A few issues in the docs; please fix them and then I think we can merge.

@awni (Member) commented on Feb 13, 2024

FYI, there are some instructions on building the docs here. It's useful to build and look at the docs for big changes; otherwise you can break them or produce strange output.

@ligaz commented on Feb 13, 2024

@awni It would be great if there were a preview environment where you could browse your changes to the docs. This would eliminate the need to do this manually.

@awni (Member) commented on Feb 13, 2024

It would be great if there were a preview environment where you could browse your changes to the docs. This would eliminate the need to do this manually.

Interesting idea... maybe something we can set up with CircleCI 🤔

@awni (Member) commented on Feb 13, 2024

@gboduljak actually I just pushed the docs changes myself, since I already made them locally.

LGTM, @angeloskath feel free to merge when you are good with it.

Thanks for the contribution!

@angeloskath merged commit e54cbb7 into ml-explore:main on Feb 13, 2024
@gboduljak (Contributor, Author)

@angeloskath Thank you for the refactor.
@awni Thank you for fixing the docs nits. I was about to push my changes, but the PR was already merged :)

Linked issue: Support for pooling layers for CNNs, e.g. MaxPool2d (#25)