Welcome to SYNLP!

The following are some of our representative research papers.

Note: the Language column in the tables below lists the languages on which each model is evaluated in its paper; it does not mean the model will not work on other languages.

Word Embedding and Pre-trained LM

| Name | Paper | Code | Language |
| --- | --- | --- | --- |
| DSG | Directional Skip-Gram: Explicitly Distinguishing Left and Right Context for Word Embeddings | link | Chinese |
| ZEN 1.0 | ZEN: Pre-training Chinese Text Encoder Enhanced by N-gram Representations | link | Chinese |
| ZEN 2.0 | ZEN 2.0: Continue Training and Adaption for N-gram Enhanced Text Encoders | link | Arabic, Chinese |
| T-DNA | Taming Pre-trained Language Models with N-gram Representations for Low-Resource Domain Adaptation | link | English |
| 🔥 ChiMed-GPT | ChiMed-GPT: A Chinese Medical Large Language Model with Full Training Regime and Better Alignment to Human Preferences | link | Chinese, English |

Model Recommendation: DSG provides 200-dimensional word embeddings for around 8M Chinese words. ZEN 2.0 provides large pre-trained language models for Arabic and Chinese (the large version uses 24 self-attention layers with 1024-dimensional hidden vectors); the models are trained on large corpora and enhance text modeling with n-grams. ChiMed-GPT is a Chinese medical large language model (LLM) built by continually training Ziya-v2 on Chinese medical data, covering the full training regime of pre-training, supervised fine-tuning (SFT), and reinforcement learning from human feedback (RLHF).
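
As a rough illustration of how a released checkpoint like ChiMed-GPT can be used, the sketch below loads a causal-LM checkpoint with Hugging Face `transformers` and generates a reply. The model identifier `SYNLP/ChiMed-GPT-1.0` and the bare prompt format are assumptions for illustration only; please check the ChiMed-GPT repository for the actual checkpoint name and the recommended prompt template.

```python
# Minimal sketch: loading a released causal-LM checkpoint with Hugging Face
# transformers. The model id "SYNLP/ChiMed-GPT-1.0" and the bare prompt format
# are assumptions for illustration; consult the ChiMed-GPT repository for the
# actual checkpoint name and prompt template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SYNLP/ChiMed-GPT-1.0"  # assumed identifier; see the repository README
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to fit a single GPU
    device_map="auto",
)

prompt = "高血压患者在饮食上应该注意什么？"  # "What should hypertensive patients watch in their diet?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```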

Chinese Word Segmentation and POS Tagging

| Name | Paper | Code | Language |
| --- | --- | --- | --- |
| WMSeg | Improving Chinese Word Segmentation with Wordhood Memory Networks | link | Chinese |
| TwASP | Joint Chinese Word Segmentation and Part-of-speech Tagging via Two-way Attentions of Auto-analyzed Knowledge | link | Chinese |
| McASP | Joint Chinese Word Segmentation and Part-of-speech Tagging via Multi-channel Attention of Character N-grams | link | Chinese |
| GCASeg | Federated Chinese Word Segmentation with Global Character Associations | link | Chinese |

Model Recommendation: WMSeg and McASP contain easy-to-use models for CWS and joint CWS and POS tagging built on BERT and ZEN. Models trained on different datasets are available for download.
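
For readers unfamiliar with the general recipe these models build on, the sketch below shows CWS treated as character-level B/I/E/S tagging with a token-classification head. This is a generic illustration, not the WMSeg or McASP code; `bert-base-chinese` is only a stand-in for the checkpoints released in those repositories, whose heads come already trained.

```python
# Generic sketch of BERT-based CWS as character-level sequence labeling
# (B/I/E/S tags). This is NOT the WMSeg/McASP implementation; it only
# illustrates the underlying recipe. "bert-base-chinese" stands in for the
# checkpoints released by those repositories.
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

labels = ["B", "I", "E", "S"]  # begin / inside / end / single-character word
tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-chinese", num_labels=len(labels)
)  # untrained head here; the released models ship with trained weights

sentence = "我爱自然语言处理"
chars = list(sentence)  # CWS operates on characters, one tag per character
inputs = tokenizer(chars, is_split_into_words=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, num_labels)

# Map predictions back to characters, skipping [CLS]/[SEP] special tokens
# and keeping only the first sub-token of each character.
word_ids = inputs.word_ids()
preds = logits.argmax(-1)[0].tolist()
tags, seen = [], set()
for i, wid in enumerate(word_ids):
    if wid is not None and wid not in seen:
        seen.add(wid)
        tags.append(labels[preds[i]])
print(list(zip(chars, tags)))
```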

Parsing

| Name | Paper | Code | Language |
| --- | --- | --- | --- |
| SAPar | Improving Constituency Parsing with Span Attention | link | Arabic, Chinese, English |
| DMPar | Enhancing Structure-aware Encoder with Extremely Limited Data for Graph-based Dependency Parsing | link | English |
| NeST-CCG | Supertagging Combinatory Categorial Grammar with Attentive Graph Convolutional Networks | link | English |

Model Recommendation: SAPar provides constituency parsers for Arabic, Chinese, and English (based on BERT, XLNet, and ZEN); DMPar provides code for graph-based dependency parsing; NeST-CCG offers BERT-based models for English CCG supertagging. These repositories provide pre-trained models and are easy to use.

Semantic Role Labeling

| Name | Paper | Code | Language |
| --- | --- | --- | --- |
| SRL-MM | Syntax-driven Approach for Semantic Role Labeling | link | English |

Named Entity Recognition

| Name | Paper | Code | Language |
| --- | --- | --- | --- |
| SANER | Named Entity Recognition for Social Media Texts with Semantic Augmentation | link | Chinese, English |
| AESINER | Improving Named Entity Recognition with Attentive Ensemble of Syntactic Information | link | Chinese, English |
| BioKMNER | Improving biomedical named entity recognition with syntactic information | link | English |

Model Recommendation: SANER uses pre-trained language models and word embeddings for text modeling, and leverages the semantics of similar words to enhance text understanding. Pre-trained models are available for download and are easy to use.

Coreference Resolution

| Name | Paper | Code | Language |
| --- | --- | --- | --- |
| Pronoun-Coref-KG | Knowledge-aware Pronoun Coreference Resolution | link | English |
| Pronoun-Coref | Incorporating Context and External Knowledge for Pronoun Coreference Resolution | link | English |
| Visual_PCR | What You See is What You Get: Visual Pronoun Coreference Resolution in Dialogues | link | English |

Model Recommendation: Pronoun-Coref uses GloVe and ELMo embeddings for text modeling. The model is lightweight and easy to use.
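
As a small aside on the inputs such models consume, the snippet below loads pre-trained GloVe vectors with gensim's downloader; `glove-wiki-gigaword-300` is gensim's packaged release and only a stand-in for the exact vectors used in the paper.

```python
# Sketch: loading pre-trained GloVe vectors of the kind Pronoun-Coref consumes.
# "glove-wiki-gigaword-300" is gensim's packaged release, not necessarily the
# exact vectors used in the paper.
import gensim.downloader as api

glove = api.load("glove-wiki-gigaword-300")  # gensim KeyedVectors
print(glove["pronoun"].shape)                # (300,) dense vector per word
print(glove.most_similar("she", topn=3))     # nearest neighbours in the space
```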

Aspect-level Sentiment Analysis

| Name | Paper | Code | Language |
| --- | --- | --- | --- |
| ASA-TGCN | Aspect-based Sentiment Analysis with Type-aware Graph Convolutional Networks and Layer Ensemble | link | English |
| ASA-WD | Enhancing Aspect-level Sentiment Analysis with Word Dependencies | link | English |
| ASA-CLD | Complementary Learning of Aspect Terms for Aspect-based Sentiment Analysis | link | English |
| DGSA | Joint Aspect Extraction and Sentiment Analysis with Directional Graph Convolutional Networks | link | English |
| ASA-TM | Improving Federated Learning for Aspect-based Sentiment Analysis via Topic Memories | link | English |

Model Recommendation: DGSA provides an end-to-end, BERT-based solution for aspect-level sentiment analysis that can be applied directly to raw text.

Relation Extraction

| Name | Paper | Code | Language |
| --- | --- | --- | --- |
| RE-AGCN | Dependency-driven Relation Extraction with Attentive Graph Convolutional Networks | link | English |
| RE-TAMM | Relation Extraction with Type-aware Map Memories of Word Dependencies | link | English |
| RE-DMP | Improving Relation Extraction through Syntax-induced Pre-training with Dependency Masking | link | English |
| RE-NGCN | Relation Extraction with Word Graphs from N-grams | link | English |
| RE-AMT | Enhancing Relation Extraction via Adversarial Multi-task Learning | link | English |

Model Recommendation: RE-AGCN provides BERT-based models for relation extraction; the model leverages the auto-parsed dependency tree of the input text for a better understanding of the text.
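
To give a feel for the core idea (not the authors' implementation), the sketch below shows one attention-weighted graph-convolution layer over a dependency adjacency matrix built from an auto-parsed tree: each word's representation is updated from its syntactic neighbours, with edge weights computed from the word pair rather than fixed to 1. All names and shapes are chosen for clarity only.

```python
# Illustrative sketch of one attention-weighted graph convolution over a
# dependency graph, the general idea behind dependency-driven relation
# extraction models such as RE-AGCN. This is NOT the released code.
import torch
import torch.nn as nn


class AttentiveGraphConv(nn.Module):
    def __init__(self, hidden: int):
        super().__init__()
        self.score = nn.Linear(2 * hidden, 1)  # scores each dependency edge
        self.proj = nn.Linear(hidden, hidden)

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # h:   (batch, seq_len, hidden) contextual word representations (e.g. BERT)
        # adj: (batch, seq_len, seq_len), 1 where two words share a dependency arc
        n = h.size(1)
        pairs = torch.cat(
            [h.unsqueeze(2).expand(-1, -1, n, -1),
             h.unsqueeze(1).expand(-1, n, -1, -1)], dim=-1
        )                                          # (batch, n, n, 2*hidden)
        scores = self.score(pairs).squeeze(-1)     # one score per word pair
        scores = scores.masked_fill(adj == 0, float("-inf"))
        weights = torch.softmax(scores, dim=-1)    # attention over neighbours
        weights = torch.nan_to_num(weights)        # rows with no arcs -> 0
        return torch.relu(self.proj(weights @ h))  # aggregate neighbour info


# Toy usage: 2 sentences of 5 words, 768-dim encoder states, random parse arcs.
h = torch.randn(2, 5, 768)
adj = (torch.rand(2, 5, 5) > 0.6).float()
out = AttentiveGraphConv(768)(h, adj)
print(out.shape)  # torch.Size([2, 5, 768])
```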

Domain Adaptation

| Name | Paper | Code | Language |
| --- | --- | --- | --- |
| T-DNA | Taming Pre-trained Language Models with N-gram Representations for Low-Resource Domain Adaptation | link | English |
| SDG4DA | Reinforced Training Data Selection for Domain Adaptation | link | English |
| DPM4DA | Domain Adaptation for Disease Phrase Matching with Adversarial Networks | -- | English |
| TD4DA | Entropy-based Training Data Selection for Domain Adaptation | -- | Chinese, English |
| GM4DA | Using a goodness measurement for domain adaptation: A case study on Chinese word segmentation | -- | Chinese |

Model Recommendation: T-DNA is a Transformer-based language model for low-resource domain adaptation and is easy to use.

Medical NER

| Name | Paper | Code | Language |
| --- | --- | --- | --- |
| HET-MC | Summarizing Medical Conversations via Identifying Important Utterances | link | Chinese |
| BioKMNER | Improving biomedical named entity recognition with syntactic information | link | English |

Radiology Report Generation

| Name | Paper | Code | Language |
| --- | --- | --- | --- |
| R2GenRL | Reinforced Cross-modal Alignment for Radiology Report Generation | link | English |
| R2GenCMN | Cross-modal Memory Networks for Radiology Report Generation | link | English |
| R2Gen | Generating Radiology Reports via Memory-driven Transformer | link | English |
| 🔥 RRG-Review | A Systematic Review of Deep Learning-based Research on Radiology Report Generation | -- | English |

Language Resource

| Name | Paper | Code | Language |
| --- | --- | --- | --- |
| ChiMed | ChiMed: A Chinese Medical Corpus for Question Answering | link | Chinese |
| ChiMST | ChiMST: A Chinese Medical Corpus for Word Segmentation and Medical Term Recognition | link | Chinese |
| Chinese CCGBank | Chinese CCGBank Construction from Tsinghua Chinese Treebank | -- | Chinese |
| HNZ | The Construction of a Segmented and Part-of-speech Tagged Archaic Chinese Corpus: A Case Study on Huainanzi | link | Chinese |
