This document contains the models and steps needed to reproduce the results of *MATE: Multi-view Attention for Table Transformer Efficiency*, published at EMNLP 2021.
Based on the intuition that attention between tokens in different columns and rows is not needed, MATE uses two types of attention heads: column heads, which only attend within the same column, and row heads, which only attend within the same row.
MATE can be (approximately) implemented in linear time by adapting an idea from Reformer (Kitaev et al., 2020): column heads sort the input according to a column order, and row heads according to the row order. The sorted input is then bucketed, and attention is restricted to adjacent buckets.
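To make the bucketing idea concrete, below is a minimal NumPy sketch of how such a per-head mask could be built. The function `bucketed_head_mask`, its arguments, and the toy data are illustrative assumptions, not the repository's actual API.

```python
import numpy as np

def bucketed_head_mask(order, bucket_size, header_size):
    """Illustrative sketch (not the repository API): build a boolean
    attention mask for one head. Tokens are sorted by `order` (column
    index for column heads, row index for row heads), grouped into
    uniform buckets, and attention is allowed only within the same or
    adjacent buckets. The first `header_size` tokens in sorted order
    attend to/from everything."""
    perm = np.argsort(order, kind="stable")   # token indices in sorted order
    rank = np.empty_like(perm)
    rank[perm] = np.arange(len(perm))         # sorted position of each token
    bucket = rank // bucket_size              # bucket id of each token
    adjacent = np.abs(bucket[:, None] - bucket[None, :]) <= 1
    header = rank < header_size               # global "header" section
    return adjacent | header[:, None] | header[None, :]

# Toy example: 8 tokens labeled with column ids. After sorting by
# column, tokens from the same column land in the same or nearby
# buckets, so a column head can attend between them.
column_ids = np.array([0, 1, 2, 0, 1, 2, 0, 1])
print(bucketed_head_mask(column_ids, bucket_size=2, header_size=2).astype(int))
```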
Pre-training or fine-tuning a model with MATE can be accomplished through the following configuration flags in `tapas_classifier_experiment.py`:
- `--restrict_attention_mode=same_colum_or_row`: Attention from tokens in different columns and rows is masked out.
- `--restrict_attention_mode=headwise_same_colum_or_row`: Row heads mask attention between different rows, and column heads between different columns. The `bucket_size` and `header_size` arguments defined below can optionally be applied to mimic the efficient implementation.
- `--restrict_attention_mode=headwise_efficient`: Similar to `headwise_same_colum_or_row`, but uses a log-linear implementation that sorts the input tokens by column or row order depending on the type of attention head.
- `--restrict_attention_bucket_size=<int>`: For sparse attention modes, further restricts attention to consecutive buckets of uniform size. Two tokens may only attend to each other if they fall in the same or consecutive buckets of this size. Only required for `restrict_attention_mode=headwise_efficient`.
- `--restrict_attention_header_size=<int>`: For sparse attention modes, the size of the first section of the input that attends to/from everything else. Only required for `restrict_attention_mode=headwise_efficient`.
- `--restrict_attention_row_heads_ratio=<float>`: For sparse attention modes, the proportion of heads that focus on rows vs. columns. Defaults to 0.5.
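As a usage sketch, a run with the efficient mode might combine these flags as follows. The chosen values are illustrative only, and the remaining flags required by `tapas_classifier_experiment.py` (inputs, outputs, training setup) are omitted:

```bash
python tapas_classifier_experiment.py \
  --restrict_attention_mode=headwise_efficient \
  --restrict_attention_bucket_size=32 \
  --restrict_attention_header_size=64 \
  --restrict_attention_row_heads_ratio=0.5
  # ...plus the usual input/output and training flags (omitted here)
```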
This code and data are licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported License.
See also the Wikipedia Copyrights page.
You can cite the paper published at EMNLP 2021.