
Add dimensionality of heads argument to SABlock #7664

Merged
13 commits merged into Project-MONAI:dev on May 8, 2024

Conversation

@NabJa (Contributor) commented Apr 18, 2024

Fixes #7661 (SABlock parameters when using more heads).

Description

The change adds a parameter (dim_head) to set the output dimension of each head in the self-attention block (SABlock). Currently the combined output dimension is fixed to hidden_size, and when the number of heads is increased this dimension is split equally among all heads.

Example

The original implementation automatically determines equally_distributed_head_dim:
(qkv * num_heads * equally_distributed_head_dim = 3 * hidden_size; in this example 3 * 8 * 16 = 384)

block = SABlock(hidden_size=128, num_heads=8)
x = torch.zeros(1, 256, 128)
x = block.qkv(x)
print(x.shape)
x = block.input_rearrange(x)
print(x.shape)

> torch.Size([1, 256, 384])
> torch.Size([3, 1, 8, 256, 16]) # <- This corresponds to (qkv batch num_heads sequence_length equally_distributed_head_dim)

The proposed implementation fixes this by adding the new argument dim_head:

block_new = SABlock(hidden_size=128, num_heads=8, dim_head=32)
x = torch.zeros(1, 256, 128)
x = block_new.qkv(x)
print(x.shape)
x = block_new.input_rearrange(x)
print(x.shape)

> torch.Size([1, 256, 768])
> torch.Size([3, 1, 8, 256, 32]) # <- This corresponds to (qkv batch num_heads sequence_length dim_head)
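For reference, a minimal sketch of what this means for trainable parameters, assuming a MONAI version that includes the new argument and that block.qkv is a plain nn.Linear (the import path and the qkv_params helper below are illustrative, not part of the PR):

import torch.nn as nn
from monai.networks.blocks.selfattention import SABlock  # import path assumed

def qkv_params(block: SABlock) -> int:
    # Count only the parameters of the combined q/k/v projection.
    return sum(p.numel() for p in block.qkv.parameters())

default_block = SABlock(hidden_size=128, num_heads=8)            # head dim = 128 // 8 = 16
wide_block = SABlock(hidden_size=128, num_heads=8, dim_head=32)  # head dim fixed to 32

# The default block projects hidden_size -> 3 * hidden_size (384 here), while the
# dim_head=32 block projects hidden_size -> 3 * num_heads * dim_head (768 here),
# so the second block carries roughly twice as many qkv weights.
print(qkv_params(default_block))
print(qkv_params(wide_block))

This is the motivation for the change: adding heads no longer has to shrink the width available to each head.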

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Integration tests passed locally by running ./runtests.sh -f -u --net --coverage.
  • Quick tests passed locally by running ./runtests.sh --quick --unittests --disttests.
  • In-line docstrings updated.
  • Documentation updated, tested make html command in the docs/ folder.

@ericspod (Member)

Hi @NabJa, thanks for the contribution. You'll have to fix some of the issues you're seeing, but others may be related to our testing system. Please follow the DCO instructions to make a remediation commit, and please run ./runtests.sh --autofix to fix the other issues. @marksgraham, is there anything here that would affect merging the generative code?

@KumoLiu (Contributor) commented Apr 24, 2024

Hi @NabJa, I believe this is expected behavior.

Increasing num_heads in the self-attention block (SABlock) does not increase the number of trainable parameters: in a multi-head attention mechanism, the input embeddings are divided into smaller chunks across the specified number of attention heads, and each head then independently performs attention on its allocated chunk of the data.
References:
https://github.com/pytorch/pytorch/blob/81740fd1f6fcd70c6ba4812c1289fe7efcc82908/torch/nn/modules/activation.py#L1010
https://discuss.huggingface.co/t/what-does-increasing-number-of-heads-do-in-the-multi-head-attention/1847
https://www.mathworks.com/matlabcentral/answers/2068031-attention-layer-number-of-parameters-doesn-t-change-when-changing-number-of-heads
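A minimal sketch (using torch.nn.MultiheadAttention rather than the MONAI SABlock) illustrating that point, with the parameter count staying the same for every head count:

import torch.nn as nn

def count_params(module: nn.Module) -> int:
    return sum(p.numel() for p in module.parameters())

# embed_dim is split across heads, so the projections keep the same shapes
# and the total number of trainable parameters does not depend on num_heads.
for num_heads in (1, 4, 8):
    mha = nn.MultiheadAttention(embed_dim=128, num_heads=num_heads)
    print(num_heads, count_params(mha))

> 1 66048
> 4 66048
> 8 66048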

But @ericspod and @marksgraham might have more expertise on this. What are your thoughts?
If we indeed want to make this change, perhaps we need to include the original implementation.
Thanks.

@NabJa (Contributor, Author) commented Apr 24, 2024

Hi @KumoLiu,

thank you for the references! Indeed, the official PyTorch implementation splits the embeddings across all heads, resulting in a head dimension of embedding dimension // number of heads.
However, I still think having the option to set this manually makes a lot of sense, because I might want to increase the number of heads without losing representational power in each attention map. Other frequently used attention implementations also set the head dimension explicitly (e.g. lucidrains).
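For illustration, a minimal sketch of that lucidrains-style pattern in plain PyTorch (names here are hypothetical, not the SABlock API), where the per-head width is chosen independently of the model width:

import torch
import torch.nn as nn

hidden_size, num_heads, dim_head, seq_len = 128, 8, 32, 256
inner_dim = num_heads * dim_head  # 256, not tied to hidden_size

# Project to q, k, v with a width of num_heads * dim_head instead of hidden_size.
to_qkv = nn.Linear(hidden_size, inner_dim * 3, bias=False)

x = torch.zeros(1, seq_len, hidden_size)
q, k, v = to_qkv(x).chunk(3, dim=-1)  # each (1, 256, 256)
q = q.view(1, seq_len, num_heads, dim_head).transpose(1, 2)
print(q.shape)

> torch.Size([1, 8, 256, 32])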
I will make another commit so that the default behaviour stays as it is, while adding the option to set the head dimension manually.
Looking forward to your opinions / reviews.

@marksgraham (Contributor)

We need to be careful not to break backwards compatibility here, so the default behaviour should stay the same. If that's the case, I don't see anything that would affect the generative code merge.

Signed-off-by: NabJa <nabil.jabareen@gmail.com>

DCO Remediation Commit for NabJa <nabil.jabareen@gmail.com>

I, NabJa <nabil.jabareen@gmail.com>, hereby add my Signed-off-by to this commit: 139182e

Signed-off-by: NabJa <nabil.jabareen@gmail.com>

@NabJa (Contributor, Author) commented Apr 24, 2024

@marksgraham complete backward compatibility should be guaranteed with 1ccb5de.
@ericspod DCO is updated and linting passes the checks.

@marksgraham (Contributor)

Would it be possible to hold off merging this for a couple of days? I'm working on some changes to self-attention/cross-attention for the MONAI Generative merge, and it would be good to get a little further into them so I can work out whether I need to make any further changes to accommodate this PR.

@ericspod (Member)

> Would it be possible to hold off merging this for a couple of days? I'm working on some changes to self-attention/cross-attention for the MONAI Generative merge, and it would be good to get a little further into them so I can work out whether I need to make any further changes to accommodate this PR.

I think we're good to delay until you're set, thanks.

@KumoLiu (Contributor) commented May 8, 2024

/build

@KumoLiu merged commit d83fa56 into Project-MONAI:dev on May 8, 2024. 28 checks passed.