MultiheadAttention module resource collection #2

Closed · 14 tasks
ccssu opened this issue Apr 4, 2023 · 0 comments
ccssu (Collaborator) commented Apr 4, 2023

Summary

AttributeError: module 'oneflow.nn' has no attribute 'MultiheadAttention'

The MultiheadAttention module currently needs quite a few supporting ops to be implemented, so the plan is to work around it on the Python side first.
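As a stop-gap, the attention computation can be composed from existing OneFlow primitives. Below is a minimal sketch, assuming oneflow.nn.Linear, oneflow.matmul and oneflow.softmax behave like their PyTorch counterparts; the class name NaiveMultiheadAttention is ours for illustration and not part of any API.

```python
import math
import oneflow as flow
import oneflow.nn as nn

class NaiveMultiheadAttention(nn.Module):
    """Stop-gap multi-head self-attention built from existing OneFlow ops
    (batch_first layout), until oneflow.nn.MultiheadAttention lands."""

    def __init__(self, embed_dim, num_heads):
        super().__init__()
        assert embed_dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        self.qkv_proj = nn.Linear(embed_dim, 3 * embed_dim)  # packed q/k/v projection
        self.out_proj = nn.Linear(embed_dim, embed_dim)

    def forward(self, x):
        # x: (batch_size, seq_len, embed_dim)
        b, s, e = x.shape
        qkv = self.qkv_proj(x)
        q, k, v = qkv[:, :, :e], qkv[:, :, e:2 * e], qkv[:, :, 2 * e:]
        # split into heads: (batch_size, num_heads, seq_len, head_dim)
        q = q.reshape(b, s, self.num_heads, self.head_dim).permute(0, 2, 1, 3)
        k = k.reshape(b, s, self.num_heads, self.head_dim).permute(0, 2, 1, 3)
        v = v.reshape(b, s, self.num_heads, self.head_dim).permute(0, 2, 1, 3)
        # scaled dot-product attention per head
        scores = flow.matmul(q, k.permute(0, 1, 3, 2)) / math.sqrt(self.head_dim)
        attn = flow.softmax(scores, dim=-1)
        out = flow.matmul(attn, v)                       # (b, h, s, head_dim)
        out = out.permute(0, 2, 1, 3).reshape(b, s, e)   # merge heads
        return self.out_proj(out)
```

Quick shape check: `NaiveMultiheadAttention(256, 8)(flow.randn(4, 10, 256)).shape` should give `(4, 10, 256)`.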

Introduction to MultiheadAttention

MultiheadAttention is a PyTorch module that implements the multi-head attention mechanism used in transformer architectures¹. With batch_first=True it takes inputs of shape (batch_size, seq_len, hidden_dim) and returns an output tensor of the same shape; the default layout is (seq_len, batch_size, hidden_dim).

The multi-head attention mechanism computes attention scores between different positions of the input sequence. It does this by running several attention heads in parallel, each with its own set of projection parameters¹.

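For reference, a short example of the PyTorch module that the OneFlow version should mirror; note that the (batch_size, seq_len, embed_dim) layout quoted above requires batch_first=True.

```python
import torch
import torch.nn as nn

# batch_first=True gives the (batch_size, seq_len, embed_dim) layout described above;
# with the default batch_first=False the expected layout is (seq_len, batch_size, embed_dim).
mha = nn.MultiheadAttention(embed_dim=256, num_heads=8, batch_first=True)

x = torch.randn(4, 10, 256)        # (batch_size, seq_len, embed_dim)
out, attn_weights = mha(x, x, x)   # self-attention: query = key = value
print(out.shape)                   # torch.Size([4, 10, 256])
print(attn_weights.shape)          # torch.Size([4, 10, 10]), averaged over heads
```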

Source: conversation with Bing, 2023/4/4.
(1) MultiHeadAttention实现详解 - 知乎. https://zhuanlan.zhihu.com/p/358206572 (accessed 2023/4/4).
(2) MultiHeadAttention实现详解 | Finisky Garden. https://finisky.github.io/2020/05/25/multiheadattention/ (accessed 2023/4/4).
(3) マルチヘッドアテンション (Multi-head Attention) [Transformerの部品]. https://cvml-expertguide.net/terms/dl/seq2seq-translation/transformer/multi-head-attention/ (accessed 2023/4/4).
(4) MultiheadAttention — PyTorch 2.0 documentation. https://pytorch.org/docs/stable/generated/torch.nn.MultiheadAttention.html (accessed 2023/4/4).
(5) tf.keras.layers.MultiHeadAttention | TensorFlow v2.12.0. https://www.tensorflow.org/api_docs/python/tf/keras/layers/MultiHeadAttention (accessed 2023/4/4).
(6) MultiHeadAttention layer - Keras. https://keras.io/api/layers/attention_layers/multi_head_attention/ (accessed 2023/4/4).

PyTorch

  • _native_multi_head_attention op (the op called inside the MultiheadAttention module)
  • scaled_dot_product_attention (public entry point; see the usage sketch after this list)
  • _scaled_dot_product_attention
  • _scaled_dot_product_attention_math
  • _scaled_dot_product_flash_attention
  • _scaled_dot_product_flash_attention_backward
  • _scaled_dot_product_efficient_attention
  • _scaled_dot_product_efficient_attention_backward
  • _flash_attention_forward
  • _flash_attention_backward
  • _efficient_attention_forward
  • _efficient_attention_backward
  • multi_head_attention_forward
  • ....
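Most of the names above are private kernels behind the public torch.nn.functional.scaled_dot_product_attention entry point, which dispatches to the flash, memory-efficient, or math implementations. A small usage sketch (PyTorch ≥ 2.0):

```python
import torch
import torch.nn.functional as F

# q / k / v in the (batch, num_heads, seq_len, head_dim) layout the kernel expects
q = torch.randn(2, 8, 16, 64)
k = torch.randn(2, 8, 16, 64)
v = torch.randn(2, 8, 16, 64)

# PyTorch picks one of the private backends listed above (flash, memory-efficient,
# or the math fallback) based on device, dtype and input shapes.
out = F.scaled_dot_product_attention(q, k, v, attn_mask=None, dropout_p=0.0, is_causal=False)
print(out.shape)  # torch.Size([2, 8, 16, 64])
```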
_native_multi_head_attention

Declaration

# aten/src/ATen/native/native_functions.yaml

- func: _native_multi_head_attention(Tensor query, Tensor key, Tensor value, int embed_dim, int num_head, Tensor qkv_weight, Tensor qkv_bias, Tensor proj_weight, Tensor proj_bias, Tensor? mask=None, bool need_weights=True, bool average_attn_weights=True, int? mask_type=None) -> (Tensor, Tensor)
  variants: function
  dispatch:
    CPU, NestedTensorCPU: native_multi_head_attention_cpu
    CUDA, NestedTensorCUDA: native_multi_head_attention_cuda
  autogen: _native_multi_head_attention.out
  • CPU implementation: aten/src/ATen/native/transformers/attention.cpp
  • CUDA implementation: aten/src/ATen/native/transformers/cuda/attention.cu
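For illustration only, the private op can be called from Python roughly the way the nn.MultiheadAttention fast path does. This is a hedged sketch mirroring the yaml signature above; availability and argument order may differ between PyTorch versions.

```python
import torch

# Random weights just to exercise the op; nn.MultiheadAttention passes its
# packed in-projection weights and out-projection weights here.
embed_dim, num_heads, bsz, seq_len = 256, 8, 4, 10
query = key = value = torch.randn(bsz, seq_len, embed_dim)
qkv_weight = torch.randn(3 * embed_dim, embed_dim)   # packed q/k/v projection
qkv_bias = torch.zeros(3 * embed_dim)
proj_weight = torch.randn(embed_dim, embed_dim)      # output projection
proj_bias = torch.zeros(embed_dim)

out, attn_weights = torch._native_multi_head_attention(
    query, key, value, embed_dim, num_heads,
    qkv_weight, qkv_bias, proj_weight, proj_bias,
    None,    # mask
    False,   # need_weights
    True,    # average_attn_weights
    None,    # mask_type
)
print(out.shape)  # torch.Size([4, 10, 256])
```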

References

  • torch.nn.MultiheadAttention: link
  • has_torch_function: link
  • handle_torch_function: link
  • torch._C._nn.scaled_dot_product_attention: link
ccssu closed this as completed Apr 10, 2023