Fixing gradient in sincos positional encoding in monai/networks/blocks/patchembedding.py #7564

Merged: 3 commits merged into Project-MONAI:dev on Mar 27, 2024

Conversation

@Lucas-rbnt (Contributor) commented Mar 21, 2024

Description

When you pass the argument `pos_embed_type='sincos'` to the `PatchEmbeddingBlock` class, the positional encoding is still created as a learnable parameter.

To reproduce:

```python
from monai.networks.blocks import PatchEmbeddingBlock

patcher = PatchEmbeddingBlock(
    in_channels=1,
    img_size=(32, 32, 32),
    patch_size=(8, 8, 8),
    hidden_size=96,
    num_heads=8,
    pos_embed_type="sincos",
    dropout_rate=0.5,
)
print(patcher.position_embeddings.requires_grad)
>>> True
```

In the literature, positional information is usually added either with a fixed, non-trainable sincos positional encoding, as in the original Attention Is All You Need paper (https://arxiv.org/abs/1706.03762), or with a learnable positional embedding, as in the ViT paper (https://arxiv.org/abs/2010.11929).
If you choose sincos, it should therefore be fixed, which is not the case here.
I'm not completely sure of the intended behaviour in MONAI, since a learnable option already exists, but if we choose sincos we would expect gradient-free parameters. However, the documentation of `build_sincos_position_embedding` in the `pos_embed_utils.py` file states: "The sin-cos position embedding as a learnable parameter", which is a bit confusing, especially as the construction function itself appears to set `requires_grad` to `False` (see below):

```python
pos_embed = nn.Parameter(pos_emb)
pos_embed.requires_grad = False

return pos_embed
```

However, this setting is not preserved by torch's `copy_` function, which copies values but not the `requires_grad` attribute (see the C++ code: https://github.com/pytorch/pytorch/blob/148a8de6397be6e4b4ca1508b03b82d117bfb03c/torch/csrc/lazy/ts_backend/tensor_aten_ops.cpp#L51).
This `copy_` is used in the `PatchEmbeddingBlock` class to instantiate the positional embedding.
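
For clarity, here is a minimal standalone sketch of the mechanism in plain PyTorch (not MONAI's actual code; shapes are illustrative): the destination parameter is created learnable, and `copy_` transfers only the values, leaving its `requires_grad` flag untouched.

```python
import torch
import torch.nn as nn

# Destination parameter, learnable by default (requires_grad=True),
# mimicking how the block allocates its position embeddings.
dst = nn.Parameter(torch.zeros(1, 64, 96))

# Source embedding built elsewhere and explicitly frozen.
src = nn.Parameter(torch.randn(1, 64, 96))
src.requires_grad = False

# copy_ transfers values only; it does not carry over requires_grad.
with torch.no_grad():
    dst.copy_(src)

print(dst.requires_grad)  # True: the frozen state of src is lost
```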

I propose a small fix to overcome this problem, as well as test cases to ensure that the positional embedding behaves correctly.
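
As an illustrative sketch only (not necessarily the exact change made in this PR), one way to restore the intended behaviour is to explicitly re-freeze the parameter after its values have been copied in:

```python
import torch
import torch.nn as nn

# Illustrative sketch: after populating the parameter via copy_, freeze it
# explicitly so the sincos table is excluded from gradient updates.
position_embeddings = nn.Parameter(torch.zeros(1, 64, 96))  # learnable by default
sincos_table = torch.randn(1, 64, 96)                       # stand-in for the real sincos values

with torch.no_grad():
    position_embeddings.copy_(sincos_table)
position_embeddings.requires_grad_(False)

print(position_embeddings.requires_grad)  # False
```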

Types of changes

- [x] Non-breaking change (fix or new feature that would not break existing functionality).
- [ ] Breaking change (fix or new feature that would cause existing functionality to change).
- [x] New tests added to cover the changes.
- [x] Integration tests passed locally by running `./runtests.sh -f -u --net --coverage`.
- [ ] Quick tests passed locally by running `./runtests.sh --quick --unittests --disttests`.
- [x] In-line docstrings updated.
- [ ] Documentation updated, tested `make html` command in the `docs/` folder.

…/blocks/patchembedding.py and associated tests

Signed-off-by: Lucas Robinet <robinet.lucas@iuct-oncopole.fr>
@marksgraham (Contributor) commented:

LGTM, I agree the sincos embeddings should be fixed; it doesn't make sense to have them learnable.

@KumoLiu (Contributor) commented Mar 22, 2024

/build

@KumoLiu (Contributor) commented Mar 23, 2024

/build

@KumoLiu (Contributor) commented Mar 23, 2024

/build

@KumoLiu (Contributor) commented Mar 27, 2024

/build

@KumoLiu (Contributor) commented Mar 27, 2024

Merged now, feel free to create another PR for #7564 (comment).

@KumoLiu KumoLiu merged commit c9fed96 into Project-MONAI:dev Mar 27, 2024
28 checks passed
KumoLiu added a commit that referenced this pull request Apr 4, 2024
…#7605)

Fixes #7564 .

### Description

As discussed, a small simplification of the creation of the sincos
positional encoding: we no longer need to use the `torch.no_grad()`
context or to copy the tensor with torch's `copy_`, which does not
preserve the `requires_grad` attribute here.

The changes are simple and linked to the corresponding comment in
#7564; the output is already in float32, so the conversion applied
previously does not seem necessary.
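
As a rough sketch of the pattern described above (the helper name and values are illustrative, not the actual MONAI code): build the table, wrap it once as a frozen `nn.Parameter`, and assign it directly, so no `torch.no_grad()` block, `copy_` call, or dtype conversion is needed.

```python
import torch
import torch.nn as nn

# Illustrative sketch only: create the positional table and wrap it directly
# as a frozen Parameter, instead of allocating a learnable one and copying into it.
def make_position_embeddings(n_tokens: int, hidden_size: int) -> nn.Parameter:
    table = torch.randn(1, n_tokens, hidden_size)  # stand-in for the real sincos computation
    return nn.Parameter(table, requires_grad=False)

pe = make_position_embeddings(64, 96)
print(pe.dtype, pe.requires_grad)  # torch.float32 False
```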

### Types of changes

- [x] Non-breaking change (fix or new feature that would not break existing functionality).
- [ ] Breaking change (fix or new feature that would cause existing functionality to change).
- [ ] New tests added to cover the changes.
- [x] Integration tests passed locally by running `./runtests.sh -f -u --net --coverage`.
- [x] Quick tests passed locally by running `./runtests.sh --quick --unittests --disttests`.
- [ ] In-line docstrings updated.
- [ ] Documentation updated, tested `make html` command in the `docs/` folder.

Signed-off-by: Lucas Robinet <robinet.lucas@iuct-oncopole.fr>
Co-authored-by: YunLiu <55491388+KumoLiu@users.noreply.github.com>
Yu0610 pushed a commit to Yu0610/MONAI that referenced this pull request Apr 11, 2024
…s/patchembedding.py (Project-MONAI#7564)

Signed-off-by: Lucas Robinet <robinet.lucas@iuct-oncopole.fr>
Co-authored-by: YunLiu <55491388+KumoLiu@users.noreply.github.com>
Signed-off-by: Yu0610 <612410030@alum.ccu.edu.tw>