This reading list is the paper-reading syllabus of the Intelligent Knowledge Management group at East China Normal University (PI: Prof. Xiaoling Wang) for its seminar in the second half of 2022.
The seminar centers on the Transformer, a model that is currently hugely popular, and studies the latest research progress on its architecture design, pre-training, and fine-tuning. The reading list is still being refined and will be updated continuously. Pull requests are very welcome! If you have suggestions, feel free to contact wzhu@stu.ecnu.edu.cn.
- Lin, Tianyang, Yuxin Wang, Xiangyang Liu and Xipeng Qiu. A Survey of Transformers.
- Qiu, Xipeng, Tianxiang Sun, Yige Xu, Yunfan Shao, Ning Dai and Xuanjing Huang. Pre-trained Models for Natural Language Processing: A Survey.
- Khan, Salman Hameed, Muzammal Naseer, Munawar Hayat, Syed Waqas Zamir, Fahad Shahbaz Khan and Mubarak Shah. “Transformers in Vision: A Survey.” ACM Computing Surveys (CSUR) (2022)
- Yu, Junliang, Hongzhi Yin, Xin Xia, Tong Chen, Jundong Li and Zi Huang. “Self-Supervised Learning for Recommender Systems: A Survey.”
- Han, Xu, Zhengyan Zhang, Ning Ding, Yuxian Gu, Xiao Liu, Yuqi Huo, Jiezhong Qiu, Liang Zhang, Wentao Han, Minlie Huang, Qin Jin, Yanyan Lan, Yang Liu, Zhiyuan Liu, Zhiwu Lu, Xipeng Qiu, Ruihua Song, Jie Tang, Ji-rong Wen, Jinhui Yuan, Wayne Xin Zhao and Jun Zhu. “Pre-Trained Models: Past, Present and Future.” AI Open 2 (2021): 225-250.
- Du, Yifan, Zikang Liu, Junyi Li and Wayne Xin Zhao. “A Survey of Vision-Language Pre-Trained Models.” IJCAI (2022).
- Liu, Pengfei, Weizhe Yuan, Jinlan Fu, Zhengbao Jiang, Hiroaki Hayashi and Graham Neubig. “Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing.”
- Transformers in Time Series: A Survey
- Vision-Language Pretraining: Current Trends and the Future. ACL-2022.
- Tutorial on MultiModal Machine Learning. CVPR-2022.
- Beyond Convolutional Neural Networks. CVPR-2022.
- Denoising Diffusion-based Generative Modeling: Foundations and Applications. CVPR-2022.
- Pre-training Methods for Neural Machine Translation. ACL-2021.
- Contrastive Data and Learning for Natural Language Processing. ACL-2022
- Robust Time Series Analysis and Applications: An Industrial Perspective. KDD-2022
- Time Series in Healthcare: Challenges and Solutions. AAAI-2022
- What Dense Graph Do You Need for Self-Attention
- Flowformer: Linearizing Transformers with Conservation Flows
- cosFormer: Rethinking Softmax in Attention
- Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention
- Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting
- Directed Acyclic Transformer for Non-Autoregressive Machine Translation (ICML2022)
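Several of the architecture papers above (cosFormer, Nyströmformer, Flowformer, Autoformer) rethink or approximate the standard O(n²) softmax attention. As a shared reference point, here is a minimal sketch of vanilla scaled dot-product attention in PyTorch; it is illustrative only and not the mechanism of any particular paper above.

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """Vanilla softmax attention: O(n^2) in sequence length.

    q, k, v: (batch, heads, seq_len, head_dim)
    mask:    optional boolean tensor broadcastable to (batch, heads, seq_len, seq_len),
             where True marks positions to ignore.
    """
    d = q.size(-1)
    # (batch, heads, seq_len, seq_len) attention logits
    scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(d)
    if mask is not None:
        scores = scores.masked_fill(mask, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return torch.matmul(weights, v)

# Example: batch of 2, 4 heads, sequence length 128, head dimension 64
q = k = v = torch.randn(2, 4, 128, 64)
out = scaled_dot_product_attention(q, k, v)   # (2, 4, 128, 64)
```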
- MPViT: Multi-Path Vision Transformer for Dense Prediction
- Mobile-Former: Bridging MobileNet and Transformer
- MetaFormer is Actually What You Need for Vision
- Shunted Self-Attention via Multi-Scale Token Aggregation
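Among the vision backbones above, MetaFormer argues that the overall block structure (token mixer plus channel MLP, each with normalization and a residual connection) matters more than the specific attention-based mixer. Below is a rough PyTorch sketch of that abstraction, using average pooling as the token mixer in the spirit of the paper's PoolFormer instantiation; it is a simplified illustration, not the official implementation.

```python
import torch
from torch import nn

class MetaFormerBlock(nn.Module):
    """Generic block: norm -> token mixer -> residual, then norm -> channel MLP -> residual.

    The token mixer can be self-attention, pooling, an MLP over tokens, etc.
    Operates on token sequences of shape (batch, num_tokens, dim).
    """
    def __init__(self, dim, token_mixer, mlp_ratio=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.token_mixer = token_mixer
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio),
            nn.GELU(),
            nn.Linear(dim * mlp_ratio, dim),
        )

    def forward(self, x):
        x = x + self.token_mixer(self.norm1(x))
        x = x + self.mlp(self.norm2(x))
        return x

class PoolingMixer(nn.Module):
    """PoolFormer-style mixer: average over neighbouring tokens, minus identity."""
    def __init__(self, kernel_size=3):
        super().__init__()
        self.pool = nn.AvgPool1d(kernel_size, stride=1, padding=kernel_size // 2,
                                 count_include_pad=False)

    def forward(self, x):                      # x: (batch, tokens, dim)
        pooled = self.pool(x.transpose(1, 2)).transpose(1, 2)
        return pooled - x                      # subtract the identity as in PoolFormer

block = MetaFormerBlock(dim=64, token_mixer=PoolingMixer())
tokens = torch.randn(2, 196, 64)
print(block(tokens).shape)                     # torch.Size([2, 196, 64])
```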
- Wu, Zhanghao, Paras Jain, Matthew A. Wright, Azalia Mirhoseini, Joseph Gonzalez and Ion Stoica. “Representing Long-Range Context for Graph Neural Networks with Global Attention.” NeurIPS (2021).
- Hussain, Md Shamim, Mohammed J. Zaki and D. Subramanian. “Edge-augmented Graph Transformers: Global Self-attention is Enough for Graphs.”
- Zhao, Jianan, Chaozhuo Li, Qian Wen, Yiqi Wang, Yuming Liu, Hao Sun, Xing Xie and Yanfang Ye. “Gophormer: Ego-Graph Transformer for Node Classification.”
- Rethinking Graph Transformers with Spectral Attention
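A recurring idea in the graph-transformer papers above is to run full self-attention over nodes while injecting graph structure as an additive bias on the attention logits (the exact mechanisms differ per paper, e.g. edge channels in EGT or spectral encodings in SAN). The sketch below shows only that general pattern; the distance-bucket bias table is an illustrative assumption, not the formulation of any listed paper.

```python
import math
import torch
from torch import nn

class BiasedGraphAttention(nn.Module):
    """Full self-attention over nodes with an additive structural bias.

    `dist` holds a bucketed shortest-path distance per node pair; each bucket gets a
    learned scalar bias that is added to the attention logits.
    """
    def __init__(self, dim, num_dist_buckets=8):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.dist_bias = nn.Embedding(num_dist_buckets, 1)

    def forward(self, x, dist):                # x: (nodes, dim), dist: (nodes, nodes) long
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        scores = q @ k.t() / math.sqrt(q.size(-1))
        scores = scores + self.dist_bias(dist).squeeze(-1)   # add structural bias
        return torch.softmax(scores, dim=-1) @ v

x = torch.randn(5, 32)                         # 5 nodes, 32-dim features
dist = torch.randint(0, 8, (5, 5))             # bucketed pairwise distances
out = BiasedGraphAttention(32)(x, dist)        # (5, 32)
```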
- He, Pengcheng, Jianfeng Gao and Weizhu Chen. “DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing.”
- mT6: Multilingual Pretrained Text-to-Text Transformer with Translation Pairs
- Shuming Ma, Li Dong, Shaohan Huang, Dongdong Zhang, Alexandre Muzio, Saksham Singhal, Hany Hassan Awadalla, Xia Song, Furu Wei. DeltaLM: Encoder-Decoder Pre-training for Language Generation and Translation by Augmenting Pretrained Multilingual Encoders.
- Qin, Yujia, Jiajie Zhang, Yankai Lin, Zhiyuan Liu, Peng Li, Maosong Sun and Jie Zhou. “ELLE: Efficient Lifelong Pre-training for Emerging Data.”
- Unified Structure Generation for Universal Information Extraction
- Efficient Self-supervised Vision Pretraining with Local Masked Reconstruction
- Bootstrapped Masked Autoencoders for Vision BERT Pretraining
- BEiT: BERT Pre-Training of Image Transformers
- VLMo: Unified vision-language pre-training
- VL-BEiT: Generative Vision-Language Pre-training
- Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks (BEiT-3)
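Several of the vision pretraining papers above (BEiT, Bootstrapped MAE, local masked reconstruction) build on masked image modeling: hide a subset of patch tokens and train the model to reconstruct them. The sketch below shows a minimal SimMIM/BEiT-flavoured version of that objective with pixel reconstruction targets; it is illustrative only, since the real methods differ in their targets (BEiT predicts discrete visual tokens) and in whether masked patches are fed to the encoder at all (MAE drops them).

```python
import torch
from torch import nn

class MaskedPatchModeling(nn.Module):
    """Masked image modeling on flattened patches.

    Masked patch embeddings are replaced by a learned [MASK] token, the backbone sees the
    full sequence, and a linear head reconstructs the original patches at masked positions.
    """
    def __init__(self, patch_dim, dim, backbone):
        super().__init__()
        self.embed = nn.Linear(patch_dim, dim)
        self.mask_token = nn.Parameter(torch.zeros(dim))
        self.backbone = backbone                     # any (b, n, dim) -> (b, n, dim) module
        self.head = nn.Linear(dim, patch_dim)

    def forward(self, patches, mask_ratio=0.4):
        b, n, _ = patches.shape
        mask = torch.rand(b, n, device=patches.device) < mask_ratio   # True = masked
        tokens = self.embed(patches)
        tokens = torch.where(mask.unsqueeze(-1), self.mask_token, tokens)
        pred = self.head(self.backbone(tokens))
        # Reconstruction loss only on the masked positions.
        return nn.functional.mse_loss(pred[mask], patches[mask])

backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True), num_layers=2)
model = MaskedPatchModeling(patch_dim=48, dim=64, backbone=backbone)
loss = model(torch.randn(2, 196, 48))          # 2 images, 196 patches of 48 values each
```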
- SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing
- WavLM: Large-Scale Self-Supervised Pre-training for Full Stack Speech Processing
- Unified Speech-Text Pre-training for Speech Translation and Recognition
- Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages
- LayoutLM: Pre-training of Text and Layout for Document Image Understanding
- LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding
- LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking
- LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding
- MarkupLM: markup language model pre-training for visually-rich document understanding
- DiT: Self-supervised Document Image Transformer.
- TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models
- LayoutReader: Pre-training of Text and Layout for Reading Order Detection
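A core ingredient shared by the LayoutLM family above is enriching token embeddings with 2-D layout embeddings derived from OCR bounding boxes. The sketch below illustrates only that embedding idea; the class name and details are assumptions for illustration, and the actual models add much more (width/height terms, image features, dedicated pretraining objectives).

```python
import torch
from torch import nn

class LayoutEmbedding(nn.Module):
    """Word embedding plus 2-D layout embeddings from OCR bounding boxes.

    Boxes are (x0, y0, x1, y1) coordinates normalised to integers in [0, 1000); each
    coordinate is looked up in an embedding table and summed with the word embedding.
    """
    def __init__(self, vocab_size, dim, max_coord=1024):
        super().__init__()
        self.word = nn.Embedding(vocab_size, dim)
        self.x_embed = nn.Embedding(max_coord, dim)
        self.y_embed = nn.Embedding(max_coord, dim)

    def forward(self, token_ids, boxes):        # token_ids: (b, n), boxes: (b, n, 4) long
        x0, y0, x1, y1 = boxes.unbind(dim=-1)
        return (self.word(token_ids)
                + self.x_embed(x0) + self.y_embed(y0)
                + self.x_embed(x1) + self.y_embed(y1))

emb = LayoutEmbedding(vocab_size=30522, dim=128)
ids = torch.randint(0, 30522, (2, 16))
boxes = torch.randint(0, 1000, (2, 16, 4))
print(emb(ids, boxes).shape)                    # torch.Size([2, 16, 128])
```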
- Pre-training Enhanced Spatial-temporal Graph Neural Network for Multivariate Time Series Forecasting. KDD-2022
- Utilizing Expert Features for Contrastive Learning of Time-Series Representations. PMLR-2022
- Self-supervised Contrastive Representation Learning for Semi-supervised Time-Series Classification. TPAMI (under review)
- TARNet: Task-Aware Reconstruction for Time-Series Transformer. KDD-2022
- Self-Supervised Time Series Representation Learning with Temporal-Instance Similarity Distillation. ICML-2022 Pre-training Workshop
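Most of the self-supervised time-series papers above rely on some form of contrastive objective between two views of the same series (augmentations, expert-derived features, teacher/student embeddings). The snippet below is a minimal InfoNCE sketch of that common ingredient; each paper uses its own view construction and additional terms.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(z1, z2, temperature=0.1):
    """InfoNCE between two views of the same batch.

    z1, z2: (batch, dim) encoder outputs for two views of the same samples;
    row i of z1 should match row i of z2 and repel every other row.
    """
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature          # (batch, batch) cosine similarities
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)

# Example: contrast a series with, say, a jittered or cropped version of itself.
z1, z2 = torch.randn(32, 128), torch.randn(32, 128)
loss = info_nce_loss(z1, z2)
```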
- Towards Universal Sequence Representation Learning for Recommender Systems, KDD 2022
- Recommendation as Language Processing (RLP): A Unified Pretrain, Personalized Prompt & Predict Paradigm (P5)
- CrossCBR: Cross-view Contrastive Learning for Bundle Recommendation
- XSimGCL: Towards Extremely Simple Graph Contrastive Learning for Recommendation
- R-Drop: Regularized Dropout for Neural Networks
- Document-Level Relation Extraction with Adaptive Focal Loss and Knowledge Distillation
- Circle Loss: A Unified Perspective of Pair Similarity Optimization
- Do We Need Zero Training Loss After Achieving Zero Training Error?
- Dissecting Supervised Contrastive Learning
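Among the training-objective papers above, R-Drop is simple enough to summarise in a few lines: run every example through the network twice with dropout active and regularise the two output distributions towards each other. The function below is a hedged sketch of that objective; `model` and `alpha` are placeholders, and the paper tunes the weighting per task.

```python
import torch
import torch.nn.functional as F

def r_drop_loss(model, x, labels, alpha=1.0):
    """R-Drop-style objective: cross-entropy on two stochastic forward passes plus a
    symmetric KL term pulling the two predictive distributions together.

    Assumes model.train() so that dropout makes the two passes differ.
    """
    logits1, logits2 = model(x), model(x)
    ce = F.cross_entropy(logits1, labels) + F.cross_entropy(logits2, labels)
    p1, p2 = F.log_softmax(logits1, dim=-1), F.log_softmax(logits2, dim=-1)
    kl = (F.kl_div(p1, p2, log_target=True, reduction="batchmean")
          + F.kl_div(p2, p1, log_target=True, reduction="batchmean")) / 2
    return ce + alpha * kl
```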
- A Word is Worth A Thousand Dollars: Adversarial Attack on Tweets Fools Stock Prediction
- Adversarial Training for Improving Model Robustness? Look at Both Prediction and Interpretation
- AugLy: Data Augmentations for Robustness
- Compacter: Efficient Low-Rank Hypercomplex Adapter Layers
- VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks
- LoRA: Low-Rank Adaptation of Large Language Models
- BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models
- Delta Tuning: A Comprehensive Study of Parameter Efficient Methods for Pre-trained Language Models
- Empowering parameter-efficient transfer learning by recognizing the kernel structure in self-attention
- Towards a Unified View of Parameter-Efficient Transfer Learning
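The adapter, LoRA, and BitFit papers above share one recipe: freeze the pretrained weights and train only a small number of extra (or selected) parameters. As a concrete example, here is a minimal LoRA-style wrapper around a linear layer; it is a sketch of the idea, not the official implementation (in practice one would use a library such as HuggingFace PEFT).

```python
import torch
from torch import nn

class LoRALinear(nn.Module):
    """A frozen pretrained linear layer plus a trainable low-rank update.

    Output is W x + (alpha / r) * B A x, where only A and B (rank r) are trained.
    """
    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)           # freeze the pretrained weight
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * (x @ self.lora_a.t() @ self.lora_b.t())

layer = LoRALinear(nn.Linear(768, 768))
out = layer(torch.randn(4, 768))                         # only lora_a / lora_b receive gradients
```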
- HyperPrompt: Prompt-based Task-Conditioning of Transformers
- Personalized Prompt Learning for Explainable Recommendation
- Multitask Prompted Training Enables Zero-Shot Task Generalization
- Chain of Thought Prompting Elicits Reasoning in Large Language Models
- Can Prompt Probe Pretrained Language Models? Understanding the Invisible Risks from a Causal View
- The Unreliability of Explanations in Few-Shot In-Context Learning
- Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?
- GrIPS: Gradient-free, Edit-based Instruction Search for Prompting Large Language Models
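The prompting and in-context-learning papers above mostly revolve around how a prompt is assembled: the instruction, the demonstrations, and whether rationales (chains of thought) are included. The toy prompt builder below just makes those moving parts concrete; the formatting choices and function name are illustrative assumptions, not a recipe from any listed paper.

```python
def build_prompt(instruction, demonstrations, query, with_rationale=True):
    """Assemble a few-shot prompt for an instruction-following language model.

    demonstrations: list of (question, rationale, answer) triples; when `with_rationale`
    is True the rationale is kept, giving a chain-of-thought-style prompt.
    """
    parts = [instruction.strip(), ""]
    for question, rationale, answer in demonstrations:
        parts.append(f"Q: {question}")
        if with_rationale:
            parts.append(f"A: {rationale} The answer is {answer}.")
        else:
            parts.append(f"A: The answer is {answer}.")
        parts.append("")
    parts.append(f"Q: {query}")
    parts.append("A:")
    return "\n".join(parts)

demos = [("Roger has 5 balls and buys 2 more. How many balls does he have?",
          "5 plus 2 is 7.", "7")]
print(build_prompt("Answer the math word problems.", demos,
                   "A bag holds 3 apples and 4 pears. How many fruits are in the bag?"))
```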
- Don't Discard All the Biased Instances: Investigating a Core Assumption in Dataset Bias Mitigation Techniques
- Discover and Mitigate Unknown Biases with Debiasing Alternate Networks
- Language-biased image classification: evaluation based on semantic representations
- A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach
- Barlow constrained optimization for Visual Question Answering
- Debiasing Methods in Natural Language Understanding Make Bias More Accessible
- How Gender Debiasing Affects Internal Model Representations, and Why It Matters
- Bias Mitigation in Machine Translation Quality Estimation
- Structured Pruning Learns Compact and Accurate Models
- Train Flat, Then Compress: Sharpness-Aware Minimization Learns More Compressible Models
- PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance
- Adapt-and-Distill: Developing Small, Fast and Effective Pretrained Language Models for Domains
- Exploring Extreme Parameter Compression for Pre-trained Language Models
- MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation
- EdgeFormer: A Parameter-Efficient Transformer for On-Device Seq2seq Generation
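The compression papers listed above (Structured Pruning, PLATON, MoEBERT, Adapt-and-Distill, etc.) prune, factorise, or distil large Transformers with carefully designed importance estimates and schedules. As a baseline for what pruning means mechanically, here is a deliberately naive per-layer magnitude-pruning sketch; none of the listed methods is this simple, and the function is illustrative only.

```python
import torch
from torch import nn

def magnitude_prune_(model, sparsity=0.5):
    """Zero out the smallest-magnitude weights of every linear layer, in place."""
    with torch.no_grad():
        for module in model.modules():
            if isinstance(module, nn.Linear):
                w = module.weight
                k = int(w.numel() * sparsity)
                if k == 0:
                    continue
                threshold = w.abs().flatten().kthvalue(k).values
                w.mul_((w.abs() > threshold).to(w.dtype))   # keep only the large weights

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
magnitude_prune_(model, sparsity=0.5)
```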