Fixing gradient in sincos positional encoding in monai/networks/blocks/patchembedding.py #7564

Merged: 3 commits merged into Project-MONAI:dev on Mar 27, 2024

Conversation

@Lucas-rbnt (Contributor) commented Mar 21, 2024

Description

When you pass the argument `pos_embed_type='sincos'` to the `PatchEmbeddingBlock` class, the positional encoding is still created as a learnable parameter.

To reproduce:

```python
from monai.networks.blocks import PatchEmbeddingBlock

patcher = PatchEmbeddingBlock(
    in_channels=1,
    img_size=(32, 32, 32),
    patch_size=(8, 8, 8),
    hidden_size=96,
    num_heads=8,
    pos_embed_type="sincos",
    dropout_rate=0.5,
)
print(patcher.position_embeddings.requires_grad)
>>> True
```

In the literature, positional information is usually added either with a fixed, non-trainable sincos positional encoding, as in the original Attention Is All You Need paper (https://arxiv.org/abs/1706.03762), or with a learnable positional embedding, as in the ViT paper (https://arxiv.org/abs/2010.11929).
If you choose sincos, it should therefore be fixed, which is not the case here.
I'm not completely sure of the intended behaviour in MONAI, since a learnable option already exists, but if we choose sincos we would expect gradient-free parameters. However, the documentation of `build_sincos_position_embedding` in the `pos_embed_utils.py` file states: "The sin-cos position embedding as a learnable parameter", which is a bit confusing, especially as the construction function itself appears to set `requires_grad` to `False` (see below):

```python
pos_embed = nn.Parameter(pos_emb)
pos_embed.requires_grad = False

return pos_embed
```

However, this setting is not preserved by torch's `copy_` function, which copies values but not the `requires_grad` attribute (see the C++ code: https://github.com/pytorch/pytorch/blob/148a8de6397be6e4b4ca1508b03b82d117bfb03c/torch/csrc/lazy/ts_backend/tensor_aten_ops.cpp#L51).
This `copy_` is used in the `PatchEmbeddingBlock` class to instantiate the positional embedding.
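
For clarity, here is a minimal standalone sketch of the mechanism in plain PyTorch (not MONAI's actual code; shapes are illustrative): the destination parameter is created learnable, and `copy_` transfers only the values, leaving its `requires_grad` flag untouched.

```python
import torch
import torch.nn as nn

# Destination parameter, learnable by default (requires_grad=True),
# mimicking how the block allocates its position embeddings.
dst = nn.Parameter(torch.zeros(1, 64, 96))

# Source embedding built elsewhere and explicitly frozen.
src = nn.Parameter(torch.randn(1, 64, 96))
src.requires_grad = False

# copy_ transfers values only; it does not carry over requires_grad.
with torch.no_grad():
    dst.copy_(src)

print(dst.requires_grad)  # True: the frozen state of src is lost
```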

I propose a small fix to overcome this problem, as well as test cases to ensure that the positional embedding behaves correctly.
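
As an illustrative sketch only (not necessarily the exact change made in this PR), one way to restore the intended behaviour is to explicitly re-freeze the parameter after its values have been copied in:

```python
import torch
import torch.nn as nn

# Illustrative sketch: after populating the parameter via copy_, freeze it
# explicitly so the sincos table is excluded from gradient updates.
position_embeddings = nn.Parameter(torch.zeros(1, 64, 96))  # learnable by default
sincos_table = torch.randn(1, 64, 96)                       # stand-in for the real sincos values

with torch.no_grad():
    position_embeddings.copy_(sincos_table)
position_embeddings.requires_grad_(False)

print(position_embeddings.requires_grad)  # False
```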

Types of changes

- [x] Non-breaking change (fix or new feature that would not break existing functionality).
- [ ] Breaking change (fix or new feature that would cause existing functionality to change).
- [x] New tests added to cover the changes.
- [x] Integration tests passed locally by running `./runtests.sh -f -u --net --coverage`.
- [ ] Quick tests passed locally by running `./runtests.sh --quick --unittests --disttests`.
- [x] In-line docstrings updated.
- [ ] Documentation updated, tested `make html` command in the `docs/` folder.

…/blocks/patchembedding.py and associated tests

Signed-off-by: Lucas Robinet <robinet.lucas@iuct-oncopole.fr>
@marksgraham (Contributor) commented:

LGTM, I agree the sincos embeddings should be fixed; it doesn't make sense to have them learnable.

@KumoLiu (Contributor) commented Mar 22, 2024

/build

@KumoLiu (Contributor) commented Mar 23, 2024

/build

@KumoLiu (Contributor) commented Mar 23, 2024

/build

@KumoLiu (Contributor) commented Mar 27, 2024

/build

@KumoLiu (Contributor) commented Mar 27, 2024

Merged now, feel free to create another PR for #7564 (comment).

@KumoLiu KumoLiu merged commit c9fed96 into Project-MONAI:dev Mar 27, 2024
28 checks passed
KumoLiu added a commit that referenced this pull request Apr 4, 2024
…#7605)

Fixes #7564 .

### Description

As discussed, a small simplification of the creation of the sincos
positional encoding: we no longer need to use the `torch.no_grad()`
context or to copy the tensor with torch's `copy_`, which does not
preserve the `requires_grad` attribute here.

The changes are simple and linked to the corresponding comment in
#7564; the output is already in float32, so the conversion applied
previously does not seem necessary.
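
As a rough sketch of the pattern described above (the helper name and values are illustrative, not the actual MONAI code): build the table, wrap it once as a frozen `nn.Parameter`, and assign it directly, so no `torch.no_grad()` block, `copy_` call, or dtype conversion is needed.

```python
import torch
import torch.nn as nn

# Illustrative sketch only: create the positional table and wrap it directly
# as a frozen Parameter, instead of allocating a learnable one and copying into it.
def make_position_embeddings(n_tokens: int, hidden_size: int) -> nn.Parameter:
    table = torch.randn(1, n_tokens, hidden_size)  # stand-in for the real sincos computation
    return nn.Parameter(table, requires_grad=False)

pe = make_position_embeddings(64, 96)
print(pe.dtype, pe.requires_grad)  # torch.float32 False
```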

### Types of changes

- [x] Non-breaking change (fix or new feature that would not break existing functionality).
- [ ] Breaking change (fix or new feature that would cause existing functionality to change).
- [ ] New tests added to cover the changes.
- [x] Integration tests passed locally by running `./runtests.sh -f -u --net --coverage`.
- [x] Quick tests passed locally by running `./runtests.sh --quick --unittests --disttests`.
- [ ] In-line docstrings updated.
- [ ] Documentation updated, tested `make html` command in the `docs/` folder.

Signed-off-by: Lucas Robinet <robinet.lucas@iuct-oncopole.fr>
Co-authored-by: YunLiu <55491388+KumoLiu@users.noreply.github.com>
Yu0610 pushed a commit to Yu0610/MONAI that referenced this pull request Apr 11, 2024
…s/patchembedding.py (Project-MONAI#7564)

Signed-off-by: Lucas Robinet <robinet.lucas@iuct-oncopole.fr>
Co-authored-by: YunLiu <55491388+KumoLiu@users.noreply.github.com>
Signed-off-by: Yu0610 <612410030@alum.ccu.edu.tw>