This reading list is the paper-reading syllabus of the Intelligent Knowledge Management group at East China Normal University (PI: Prof. Xiaoling Wang) for its seminar in the second half of 2022.
The seminar centers on the Transformer, a model that is currently hugely popular, and studies the latest research progress on its architecture design, pre-training, and fine-tuning. The reading list is still being refined and will be updated continuously. Pull requests are very welcome! If you have suggestions, feel free to contact wzhu@stu.ecnu.edu.cn.
- Lin, Tianyang, Yuxin Wang, Xiangyang Liu and Xipeng Qiu. A Survey of Transformers.
- Qiu, Xipeng, Tianxiang Sun, Yige Xu, Yunfan Shao, Ning Dai and Xuanjing Huang. Pre-trained Models for Natural Language Processing: A Survey.
- Khan, Salman Hameed, Muzammal Naseer, Munawar Hayat, Syed Waqas Zamir, Fahad Shahbaz Khan and Mubarak Shah. “Transformers in Vision: A Survey.” ACM Computing Surveys (CSUR) (2022)
- Yu, Junliang, Hongzhi Yin, Xin Xia, Tong Chen, Jundong Li and Zi Huang. “Self-Supervised Learning for Recommender Systems: A Survey.”
- Han, Xu, Zhengyan Zhang, Ning Ding, Yuxian Gu, Xiao Liu, Yuqi Huo, Jiezhong Qiu, Liang Zhang, Wentao Han, Minlie Huang, Qin Jin, Yanyan Lan, Yang Liu, Zhiyuan Liu, Zhiwu Lu, Xipeng Qiu, Ruihua Song, Jie Tang, Ji-rong Wen, Jinhui Yuan, Wayne Xin Zhao and Jun Zhu. “Pre-Trained Models: Past, Present and Future.” AI Open 2 (2021): 225-250.
- Du, Yifan, Zikang Liu, Junyi Li and Wayne Xin Zhao. “A Survey of Vision-Language Pre-Trained Models.” IJCAI (2022).
- Liu, Pengfei, Weizhe Yuan, Jinlan Fu, Zhengbao Jiang, Hiroaki Hayashi and Graham Neubig. “Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing.”
- Transformers in Time Series: A Survey
- Vision-Language Pretraining: Current Trends and the Future. ACL-2022.
- Tutorial on MultiModal Machine Learning. CVPR-2022.
- Beyond Convolutional Neural Networks. CVPR-2022.
- Denoising Diffusion-based Generative Modeling: Foundations and Applications. CVPR-2022.
- Pre-training Methods for Neural Machine Translation. ACL-2021.
- Contrastive Data and Learning for Natural Language Processing. ACL-2022
- Robust Time Series Analysis and Applications: An Industrial Perspective. KDD-2022
- Time Series in Healthcare: Challenges and Solutions. AAAI-2022
- What Dense Graph Do You Need for Self-Attention
- Flowformer: Linearizing Transformers with Conservation Flows
- cosFormer: Rethinking Softmax in Attention
- Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention
- Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting
- Directed Acyclic Transformer for Non-Autoregressive Machine Translation (ICML2022)
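Several of the architecture papers above (cosFormer, Nyströmformer, Flowformer, Autoformer) rethink or approximate the standard O(n²) softmax attention. As a shared reference point, here is a minimal sketch of vanilla scaled dot-product attention in PyTorch; it is illustrative only and not the mechanism of any particular paper above.

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """Vanilla softmax attention: O(n^2) in sequence length.

    q, k, v: (batch, heads, seq_len, head_dim)
    mask:    optional boolean tensor broadcastable to (batch, heads, seq_len, seq_len),
             where True marks positions to ignore.
    """
    d = q.size(-1)
    # (batch, heads, seq_len, seq_len) attention logits
    scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(d)
    if mask is not None:
        scores = scores.masked_fill(mask, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return torch.matmul(weights, v)

# Example: batch of 2, 4 heads, sequence length 128, head dimension 64
q = k = v = torch.randn(2, 4, 128, 64)
out = scaled_dot_product_attention(q, k, v)   # (2, 4, 128, 64)
```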
- MPViT: Multi-Path Vision Transformer for Dense Prediction
- Mobile-Former: Bridging MobileNet and Transformer
- MetaFormer is Actually What You Need for Vision
- Shunted Self-Attention via Multi-Scale Token Aggregation
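Among the vision backbones above, MetaFormer argues that the overall block structure (token mixer plus channel MLP, each with normalization and a residual connection) matters more than the specific attention-based mixer. Below is a rough PyTorch sketch of that abstraction, using average pooling as the token mixer in the spirit of the paper's PoolFormer instantiation; it is a simplified illustration, not the official implementation.

```python
import torch
from torch import nn

class MetaFormerBlock(nn.Module):
    """Generic block: norm -> token mixer -> residual, then norm -> channel MLP -> residual.

    The token mixer can be self-attention, pooling, an MLP over tokens, etc.
    Operates on token sequences of shape (batch, num_tokens, dim).
    """
    def __init__(self, dim, token_mixer, mlp_ratio=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.token_mixer = token_mixer
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio),
            nn.GELU(),
            nn.Linear(dim * mlp_ratio, dim),
        )

    def forward(self, x):
        x = x + self.token_mixer(self.norm1(x))
        x = x + self.mlp(self.norm2(x))
        return x

class PoolingMixer(nn.Module):
    """PoolFormer-style mixer: average over neighbouring tokens, minus identity."""
    def __init__(self, kernel_size=3):
        super().__init__()
        self.pool = nn.AvgPool1d(kernel_size, stride=1, padding=kernel_size // 2,
                                 count_include_pad=False)

    def forward(self, x):                      # x: (batch, tokens, dim)
        pooled = self.pool(x.transpose(1, 2)).transpose(1, 2)
        return pooled - x                      # subtract the identity as in PoolFormer

block = MetaFormerBlock(dim=64, token_mixer=PoolingMixer())
tokens = torch.randn(2, 196, 64)
print(block(tokens).shape)                     # torch.Size([2, 196, 64])
```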
- Wu, Zhanghao, Paras Jain, Matthew A. Wright, Azalia Mirhoseini, Joseph Gonzalez and Ion Stoica. “Representing Long-Range Context for Graph Neural Networks with Global Attention.” NeurIPS (2021).
- Hussain, Md Shamim, Mohammed J. Zaki and D. Subramanian. “Edge-augmented Graph Transformers: Global Self-attention is Enough for Graphs.”
- Zhao, Jianan, Chaozhuo Li, Qian Wen, Yiqi Wang, Yuming Liu, Hao Sun, Xing Xie and Yanfang Ye. “Gophormer: Ego-Graph Transformer for Node Classification.”
- Rethinking Graph Transformers with Spectral Attention
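A recurring idea in the graph-transformer papers above is to run full self-attention over nodes while injecting graph structure as an additive bias on the attention logits (the exact mechanisms differ per paper, e.g. edge channels in EGT or spectral encodings in SAN). The sketch below shows only that general pattern; the distance-bucket bias table is an illustrative assumption, not the formulation of any listed paper.

```python
import math
import torch
from torch import nn

class BiasedGraphAttention(nn.Module):
    """Full self-attention over nodes with an additive structural bias.

    `dist` holds a bucketed shortest-path distance per node pair; each bucket gets a
    learned scalar bias that is added to the attention logits.
    """
    def __init__(self, dim, num_dist_buckets=8):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.dist_bias = nn.Embedding(num_dist_buckets, 1)

    def forward(self, x, dist):                # x: (nodes, dim), dist: (nodes, nodes) long
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        scores = q @ k.t() / math.sqrt(q.size(-1))
        scores = scores + self.dist_bias(dist).squeeze(-1)   # add structural bias
        return torch.softmax(scores, dim=-1) @ v

x = torch.randn(5, 32)                         # 5 nodes, 32-dim features
dist = torch.randint(0, 8, (5, 5))             # bucketed pairwise distances
out = BiasedGraphAttention(32)(x, dist)        # (5, 32)
```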
- He, Pengcheng, Jianfeng Gao and Weizhu Chen. “DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing.”
- mT6: Multilingual Pretrained Text-to-Text Transformer with Translation Pairs
- Shuming Ma, Li Dong, Shaohan Huang, Dongdong Zhang, Alexandre Muzio, Saksham Singhal, Hany Hassan Awadalla, Xia Song, Furu Wei. DeltaLM: Encoder-Decoder Pre-training for Language Generation and Translation by Augmenting Pretrained Multilingual Encoders.
- Qin, Yujia, Jiajie Zhang, Yankai Lin, Zhiyuan Liu, Peng Li, Maosong Sun and Jie Zhou. “ELLE: Efficient Lifelong Pre-training for Emerging Data.”
- Unified Structure Generation for Universal Information Extraction
- Efficient Self-supervised Vision Pretraining with Local Masked Reconstruction
- Bootstrapped Masked Autoencoders for Vision BERT Pretraining
- BEiT: BERT Pre-Training of Image Transformers
- VLMo: Unified vision-language pre-training
- VL-BEiT: Generative Vision-Language Pre-training
- Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks (BEiT-3)
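Several of the vision pretraining papers above (BEiT, Bootstrapped MAE, local masked reconstruction) build on masked image modeling: hide a subset of patch tokens and train the model to reconstruct them. The sketch below shows a minimal SimMIM/BEiT-flavoured version of that objective with pixel reconstruction targets; it is illustrative only, since the real methods differ in their targets (BEiT predicts discrete visual tokens) and in whether masked patches are fed to the encoder at all (MAE drops them).

```python
import torch
from torch import nn

class MaskedPatchModeling(nn.Module):
    """Masked image modeling on flattened patches.

    Masked patch embeddings are replaced by a learned [MASK] token, the backbone sees the
    full sequence, and a linear head reconstructs the original patches at masked positions.
    """
    def __init__(self, patch_dim, dim, backbone):
        super().__init__()
        self.embed = nn.Linear(patch_dim, dim)
        self.mask_token = nn.Parameter(torch.zeros(dim))
        self.backbone = backbone                     # any (b, n, dim) -> (b, n, dim) module
        self.head = nn.Linear(dim, patch_dim)

    def forward(self, patches, mask_ratio=0.4):
        b, n, _ = patches.shape
        mask = torch.rand(b, n, device=patches.device) < mask_ratio   # True = masked
        tokens = self.embed(patches)
        tokens = torch.where(mask.unsqueeze(-1), self.mask_token, tokens)
        pred = self.head(self.backbone(tokens))
        # Reconstruction loss only on the masked positions.
        return nn.functional.mse_loss(pred[mask], patches[mask])

backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True), num_layers=2)
model = MaskedPatchModeling(patch_dim=48, dim=64, backbone=backbone)
loss = model(torch.randn(2, 196, 48))          # 2 images, 196 patches of 48 values each
```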
- SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing
- WavLM: Large-Scale Self-Supervised Pre-training for Full Stack Speech Processing
- Unified Speech-Text Pre-training for Speech Translation and Recognition
- Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages
- LayoutLM: Pre-training of Text and Layout for Document Image Understanding
- LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding
- LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking
- LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding
- MarkupLM: markup language model pre-training for visually-rich document understanding
- DiT: Self-supervised Document Image Transformer.
- TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models
- LayoutReader: Pre-training of Text and Layout for Reading Order Detection
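A core ingredient shared by the LayoutLM family above is enriching token embeddings with 2-D layout embeddings derived from OCR bounding boxes. The sketch below illustrates only that embedding idea; the class name and details are assumptions for illustration, and the actual models add much more (width/height terms, image features, dedicated pretraining objectives).

```python
import torch
from torch import nn

class LayoutEmbedding(nn.Module):
    """Word embedding plus 2-D layout embeddings from OCR bounding boxes.

    Boxes are (x0, y0, x1, y1) coordinates normalised to integers in [0, 1000); each
    coordinate is looked up in an embedding table and summed with the word embedding.
    """
    def __init__(self, vocab_size, dim, max_coord=1024):
        super().__init__()
        self.word = nn.Embedding(vocab_size, dim)
        self.x_embed = nn.Embedding(max_coord, dim)
        self.y_embed = nn.Embedding(max_coord, dim)

    def forward(self, token_ids, boxes):        # token_ids: (b, n), boxes: (b, n, 4) long
        x0, y0, x1, y1 = boxes.unbind(dim=-1)
        return (self.word(token_ids)
                + self.x_embed(x0) + self.y_embed(y0)
                + self.x_embed(x1) + self.y_embed(y1))

emb = LayoutEmbedding(vocab_size=30522, dim=128)
ids = torch.randint(0, 30522, (2, 16))
boxes = torch.randint(0, 1000, (2, 16, 4))
print(emb(ids, boxes).shape)                    # torch.Size([2, 16, 128])
```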
- Pre-training Enhanced Spatial-temporal Graph Neural Network for Multivariate Time Series Forecasting. KDD-2022
- Utilizing Expert Features for Contrastive Learning of Time-Series Representations. PMLR-2022
- Self-supervised Contrastive Representation Learning for Semi-supervised Time-Series Classification. TPAMI (under review)
- TARNet: Task-Aware Reconstruction for Time-Series Transformer. KDD-2022
- Self-Supervised Time Series Representation Learning with Temporal-Instance Similarity Distillation. ICML-2022 Pre-training Workshop
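Most of the self-supervised time-series papers above rely on some form of contrastive objective between two views of the same series (augmentations, expert-derived features, teacher/student embeddings). The snippet below is a minimal InfoNCE sketch of that common ingredient; each paper uses its own view construction and additional terms.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(z1, z2, temperature=0.1):
    """InfoNCE between two views of the same batch.

    z1, z2: (batch, dim) encoder outputs for two views of the same samples;
    row i of z1 should match row i of z2 and repel every other row.
    """
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature          # (batch, batch) cosine similarities
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)

# Example: contrast a series with, say, a jittered or cropped version of itself.
z1, z2 = torch.randn(32, 128), torch.randn(32, 128)
loss = info_nce_loss(z1, z2)
```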
- Towards Universal Sequence Representation Learning for Recommender Systems, KDD 2022
- Recommendation as Language Processing (RLP): A Unified Pretrain, Personalized Prompt & Predict Paradigm (P5)
- CrossCBR: Cross-view Contrastive Learning for Bundle Recommendation
- XSimGCL: Towards Extremely Simple Graph Contrastive Learning for Recommendation
- R-Drop: Regularized Dropout for Neural Networks
- Document-Level Relation Extraction with Adaptive Focal Loss and Knowledge Distillation
- Circle Loss: A Unified Perspective of Pair Similarity Optimization
- Do We Need Zero Training Loss After Achieving Zero Training Error?
- Dissecting Supervised Contrastive Learning
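Among the training-objective papers above, R-Drop is simple enough to summarise in a few lines: run every example through the network twice with dropout active and regularise the two output distributions towards each other. The function below is a hedged sketch of that objective; `model` and `alpha` are placeholders, and the paper tunes the weighting per task.

```python
import torch
import torch.nn.functional as F

def r_drop_loss(model, x, labels, alpha=1.0):
    """R-Drop-style objective: cross-entropy on two stochastic forward passes plus a
    symmetric KL term pulling the two predictive distributions together.

    Assumes model.train() so that dropout makes the two passes differ.
    """
    logits1, logits2 = model(x), model(x)
    ce = F.cross_entropy(logits1, labels) + F.cross_entropy(logits2, labels)
    p1, p2 = F.log_softmax(logits1, dim=-1), F.log_softmax(logits2, dim=-1)
    kl = (F.kl_div(p1, p2, log_target=True, reduction="batchmean")
          + F.kl_div(p2, p1, log_target=True, reduction="batchmean")) / 2
    return ce + alpha * kl
```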
- A Word is Worth A Thousand Dollars: Adversarial Attack on Tweets Fools Stock Prediction
- Adversarial Training for Improving Model Robustness? Look at Both Prediction and Interpretation
- AugLy: Data Augmentations for Robustness
- Compacter: Efficient Low-Rank Hypercomplex Adapter Layers
- VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks
- LoRA: Low-Rank Adaptation of Large Language Models
- BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models
- Delta Tuning: A Comprehensive Study of Parameter Efficient Methods for Pre-trained Language Models
- Empowering parameter-efficient transfer learning by recognizing the kernel structure in self-attention
- Towards a Unified View of Parameter-Efficient Transfer Learning
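The adapter, LoRA, and BitFit papers above share one recipe: freeze the pretrained weights and train only a small number of extra (or selected) parameters. As a concrete example, here is a minimal LoRA-style wrapper around a linear layer; it is a sketch of the idea, not the official implementation (in practice one would use a library such as HuggingFace PEFT).

```python
import torch
from torch import nn

class LoRALinear(nn.Module):
    """A frozen pretrained linear layer plus a trainable low-rank update.

    Output is W x + (alpha / r) * B A x, where only A and B (rank r) are trained.
    """
    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)           # freeze the pretrained weight
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * (x @ self.lora_a.t() @ self.lora_b.t())

layer = LoRALinear(nn.Linear(768, 768))
out = layer(torch.randn(4, 768))                         # only lora_a / lora_b receive gradients
```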
- HyperPrompt: Prompt-based Task-Conditioning of Transformers
- Personalized Prompt Learning for Explainable Recommendation
- Multitask Prompted Training Enables Zero-Shot Task Generalization
- Chain of Thought Prompting Elicits Reasoning in Large Language Models
- Can Prompt Probe Pretrained Language Models? Understanding the Invisible Risks from a Causal View
- The Unreliability of Explanations in Few-Shot In-Context Learning
- Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?
- GrIPS: Gradient-free, Edit-based Instruction Search for Prompting Large Language Models
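The prompting and in-context-learning papers above mostly revolve around how a prompt is assembled: the instruction, the demonstrations, and whether rationales (chains of thought) are included. The toy prompt builder below just makes those moving parts concrete; the formatting choices and function name are illustrative assumptions, not a recipe from any listed paper.

```python
def build_prompt(instruction, demonstrations, query, with_rationale=True):
    """Assemble a few-shot prompt for an instruction-following language model.

    demonstrations: list of (question, rationale, answer) triples; when `with_rationale`
    is True the rationale is kept, giving a chain-of-thought-style prompt.
    """
    parts = [instruction.strip(), ""]
    for question, rationale, answer in demonstrations:
        parts.append(f"Q: {question}")
        if with_rationale:
            parts.append(f"A: {rationale} The answer is {answer}.")
        else:
            parts.append(f"A: The answer is {answer}.")
        parts.append("")
    parts.append(f"Q: {query}")
    parts.append("A:")
    return "\n".join(parts)

demos = [("Roger has 5 balls and buys 2 more. How many balls does he have?",
          "5 plus 2 is 7.", "7")]
print(build_prompt("Answer the math word problems.", demos,
                   "A bag holds 3 apples and 4 pears. How many fruits are in the bag?"))
```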
- Don't Discard All the Biased Instances: Investigating a Core Assumption in Dataset Bias Mitigation Techniques
- Discover and Mitigate Unknown Biases with Debiasing Alternate Networks
- Language-biased image classification: evaluation based on semantic representations
- A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach
- Barlow constrained optimization for Visual Question Answering
- Debiasing Methods in Natural Language Understanding Make Bias More Accessible
- How Gender Debiasing Affects Internal Model Representations, and Why It Matters
- Bias Mitigation in Machine Translation Quality Estimation
- Structured Pruning Learns Compact and Accurate Models
- Train Flat, Then Compress: Sharpness-Aware Minimization Learns More Compressible Models
- PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance
- Adapt-and-Distill: Developing Small, Fast and Effective Pretrained Language Models for Domains
- Exploring Extreme Parameter Compression for Pre-trained Language Models
- MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation
- EdgeFormer: A Parameter-Efficient Transformer for On-Device Seq2seq Generation
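The compression papers listed above (Structured Pruning, PLATON, MoEBERT, Adapt-and-Distill, etc.) prune, factorise, or distil large Transformers with carefully designed importance estimates and schedules. As a baseline for what pruning means mechanically, here is a deliberately naive per-layer magnitude-pruning sketch; none of the listed methods is this simple, and the function is illustrative only.

```python
import torch
from torch import nn

def magnitude_prune_(model, sparsity=0.5):
    """Zero out the smallest-magnitude weights of every linear layer, in place."""
    with torch.no_grad():
        for module in model.modules():
            if isinstance(module, nn.Linear):
                w = module.weight
                k = int(w.numel() * sparsity)
                if k == 0:
                    continue
                threshold = w.abs().flatten().kthvalue(k).values
                w.mul_((w.abs() > threshold).to(w.dtype))   # keep only the large weights

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
magnitude_prune_(model, sparsity=0.5)
```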