sliding window self-attention cell #1395
base: master
Conversation
Waiting for apache/mxnet#19387 to be merged.
The documentation website for preview: http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR1395/sw_atten_cell/index.html
Is it possible for us to revise the interface to be similar to https://www.deepspeed.ai/tutorials/sparse-attention/?
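For concreteness, a config-driven interface in the spirit of the DeepSpeed sparse-attention tutorial could look like the sketch below. Every name in it is invented for illustration; it is neither the API proposed in this PR nor DeepSpeed's own.

```python
# Hypothetical config-style interface, modeled loosely on DeepSpeed's
# sparsity-config approach. All names here are illustrative assumptions.
class SlidingWindowSparsityConfig:
    """Describes the sliding-window pattern instead of hard-coding it."""
    def __init__(self, num_heads, window, dilation_per_head=None, causal=False):
        self.num_heads = num_heads
        self.window = window
        # A per-head dilation list would cover Longformer's multi-headed dilation.
        self.dilation_per_head = dilation_per_head or [1] * num_heads
        self.causal = causal

# Usage idea: the attention cell would be built from the config alone, e.g.
# cell = SlidingWindowSelfAttentionCell(
#     SlidingWindowSparsityConfig(num_heads=12, window=128, causal=True))
```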
Benchmark script
Is there any update on this PR?
@sxjscience it seems the error
Yes, we can merge master so that the tests will be retriggered.
Is there any update on this? @ZiyueHuang, would you have time to rebase the code?
Description
This PR adds an AttentionCell for sliding-window self-attention, with support for multi-headed dilation and a causal attention mode, as described in Longformer: The Long-Document Transformer (see the sketch below).
cc @sxjscience @szhengac
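For readers of the thread, here is a minimal NumPy sketch of the attention pattern described above. It is illustrative only, not the PR's implementation; the function name `sliding_window_mask` and its parameters are assumptions.

```python
import numpy as np

def sliding_window_mask(seq_len, window, dilation=1, causal=False):
    """Boolean mask: mask[i, j] is True iff query i may attend to key j.

    Each query sees keys at most `window` dilated steps away; with
    `causal=True` it only sees keys at or before its own position.
    Multi-headed dilation amounts to calling this with a different
    `dilation` per head.
    """
    idx = np.arange(seq_len)
    rel = idx[None, :] - idx[:, None]      # rel[i, j] = j - i
    on_grid = (rel % dilation == 0)        # keep only the dilated positions
    if causal:
        in_window = (rel <= 0) & (rel >= -window * dilation)
    else:
        in_window = np.abs(rel) <= window * dilation
    return on_grid & in_window

# Length-8 sequence, one-sided window of 2, causal attention:
print(sliding_window_mask(8, window=2, causal=True).astype(int))
```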
Checklist
Essentials
Changes
Comments
cc @dmlc/gluon-nlp-team