Awesome Semantic Textual Similarity: A Curated List of Semantic/Sentence Textual Similarity (STS) in Large Language Models and the NLP Field
This repository collects resources and papers on Semantic/Sentence Textual Similarity (STS) in Large Language Models and NLP.
"If you can't measure it, you can't improve it." - British physicist William Thomson
Contributions are welcome: share your papers, thoughts, and ideas by opening an issue!
- Model Evolution Overview
- Presentations
- Benchmarks
- Papers
- Distance Measurement
- Evaluation Metrics
- Citation
Sentence Textual Similarity: Model Evolution Overview
Shuyue Jia, Dependable Computing Laboratory, Boston University
[Link]
Oct 2023
All the benchmark datasets listed below can be downloaded here and here.
STS12:
SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity
Eneko Agirre, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre
SemEval 2012, [Paper] [Download]
07 June 2012
STS13:
*SEM 2013 shared task: Semantic Textual Similarity
Eneko Agirre, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, Weiwei Guo
*SEM 2013, [Paper] [Download]
13 June 2013
STS14:
SemEval-2014 Task 10: Multilingual Semantic Textual Similarity
Eneko Agirre, Carmen Banea, Claire Cardie, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, Weiwei Guo, Rada Mihalcea, German Rigau, Janyce Wiebe
SemEval 2014, [Paper] [Download]
23 Aug 2014
STS15:
SemEval-2015 Task 2: Semantic Textual Similarity, English, Spanish and Pilot on Interpretability
Eneko Agirre, Carmen Banea, Claire Cardie, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, Weiwei Guo, Iñigo Lopez-Gazpio, Montse Maritxalar, Rada Mihalcea, German Rigau, Larraitz Uria, Janyce Wiebe
SemEval 2015, [Paper] [Download]
04 June 2015
STS16:
SemEval-2016 Task 1: Semantic Textual Similarity, Monolingual and Cross-Lingual Evaluation
Eneko Agirre, Carmen Banea, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, Rada Mihalcea, German Rigau, Janyce Wiebe
SemEval 2016, [Paper] [Download]
16 June 2016
STS Benchmark (STSb):
SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation
Daniel Cer, Mona Diab, Eneko Agirre, Iñigo Lopez-Gazpio, Lucia Specia
SemEval 2017, [Paper] [Download]
03 Aug 2017
SICK:
A SICK Cure for the Evaluation of Compositional Distributional Semantic Models
Marco Marelli, Stefano Menini, Marco Baroni, Luisa Bentivogli, Raffaella Bernardi, Roberto Zamparelli
LREC 2014, [Paper] [Download]
26 May 2014
GloVe: Global Vectors for Word Representation
Jeffrey Pennington, Richard Socher, Christopher Manning
EMNLP 2014, [Paper] [GitHub]
25 Oct 2014
Skip-Thought Vectors
Ryan Kiros, Yukun Zhu, Ruslan Salakhutdinov, Richard S. Zemel, Antonio Torralba, Raquel Urtasun, Sanja Fidler
NeurIPS 2015, [Paper] [GitHub]
22 Jun 2015
Supervised Learning of Universal Sentence Representations from Natural Language Inference Data
Alexis Conneau, Douwe Kiela, Holger Schwenk, Loïc Barrault, Antoine Bordes
EMNLP 2017, [Paper] [GitHub]
07 Sep 2017
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
NAACL-HLT 2019, [Paper] [GitHub]
24 May 2019
BERTScore: Evaluating Text Generation with BERT
Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, Yoav Artzi
ICLR 2020, [Paper] [GitHub]
24 Feb 2020
BLEURT: Learning Robust Metrics for Text Generation
Thibault Sellam, Dipanjan Das, Ankur Parikh
ACL 2020, [Paper] [GitHub]
05 July 2020
Dense Passage Retrieval for Open-Domain Question Answering
Vladimir Karpukhin, Barlas Oguz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, Wen-tau Yih
EMNLP 2020, [Paper] [GitHub]
16 Nov 2020
Universal Sentence Encoder
Daniel Cer, Yinfei Yang, Sheng-yi Kong, Nan Hua, Nicole Limtiaco, Rhomni St. John, Noah Constant, Mario Guajardo-Cespedes, Steve Yuan, Chris Tar, Yun-Hsuan Sung, Brian Strope, Ray Kurzweil
arXiv 2018, [Paper] [GitHub]
12 Apr 2018
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Nils Reimers, Iryna Gurevych
EMNLP 2019, [Paper] [GitHub]
27 Aug 2019
Pairwise Word Interaction Modeling with Deep Neural Networks for Semantic Similarity Measurement
Hua He, Jimmy Lin
NAACL 2016, [Paper]
12 June 2016
Text Matching as Image Recognition
Liang Pang, Yanyan Lan, Jiafeng Guo, Jun Xu, Shengxian Wan, Xueqi Cheng
AAAI 2016, [Paper] [GitHub]
20 Feb 2016
MultiGranCNN: An Architecture for General Matching of Text Chunks on Multiple Levels of Granularity
Wenpeng Yin, Hinrich Schütze
ACL-IJCNLP 2015, [Paper]
26 July 2015
Simple and Effective Text Matching with Richer Alignment Features
Runqi Yang, Jianhai Zhang, Xing Gao, Feng Ji, Haiqing Chen
ACL 2019, [Paper] [GitHub]
01 Aug 2019
Semantic Sentence Matching with Densely-Connected Recurrent and Co-Attentive Information
Seonhoon Kim, Inho Kang, Nojun Kwak
AAAI 2019, [Paper] [GitHub (Unofficial)]
27 Jan 2019
Multiway Attention Networks for Modeling Sentence Pairs
Chuanqi Tan, Furu Wei, Wenhui Wang, Weifeng Lv, Ming Zhou
IJCAI 2018, [Paper] [GitHub]
13 July 2018
Natural Language Inference over Interaction Space
Yichen Gong, Heng Luo, Jian Zhang
ICLR 2018, [Paper] [GitHub]
13 Sep 2017
Inter-Weighted Alignment Network for Sentence Pair Modeling
Gehui Shen, Yunlun Yang, Zhi-Hong Deng
EMNLP 2017, [Paper]
07 Sep 2017
Bidirectional Attention Flow for Machine Comprehension
Minjoon Seo, Aniruddha Kembhavi, Ali Farhadi, Hannaneh Hajishirzi
ICLR 2017, [Paper] [Webpage] [GitHub]
24 Apr 2017
A Structured Self-attentive Sentence Embedding
Zhouhan Lin, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing Xiang, Bowen Zhou, Yoshua Bengio
ICLR 2017, [Paper] [GitHub]
09 Mar 2017
Sentence Similarity Learning by Lexical Decomposition and Composition
Zhiguo Wang, Haitao Mi, Abraham Ittycheriah
COLING 2016, [Paper] [GitHub]
11 Dec 2016
A Decomposable Attention Model for Natural Language Inference
Ankur Parikh, Oscar Täckström, Dipanjan Das, Jakob Uszkoreit
EMNLP 2016, [Paper] [GitHub]
01 Nov 2016
Reasoning about Entailment with Neural Attention
Tim Rocktäschel, Edward Grefenstette, Karl Moritz Hermann, Tomáš Kočiský, Phil Blunsom
ICLR 2016, [Paper] [GitHub]
01 Mar 2016
DLS@CU: Sentence Similarity from Word Alignment and Semantic Vector Composition
Md Arafat Sultan, Steven Bethard, Tamara Sumner
SemEval 2015, [Paper]
04 June 2015
Back to Basics for Monolingual Alignment: Exploiting Word Similarity and Contextual Evidence
Md Arafat Sultan, Steven Bethard, Tamara Sumner
TACL 2014, [Paper]
01 May 2014
Improving Word Mover’s Distance by Leveraging Self-attention Matrix
Hiroaki Yamagiwa, Sho Yokoi, Hidetoshi Shimodaira
EMNLP 2023 Findings, [Paper] [GitHub]
02 Nov 2023
Toward Interpretable Semantic Textual Similarity via Optimal Transport-based Contrastive Sentence Learning
Seonghyeon Lee, Dongha Lee, Seongbo Jang, Hwanjo Yu
ACL 2022, [Paper] [GitHub]
22 May 2022
Word Rotator’s Distance
Sho Yokoi, Ryo Takahashi, Reina Akama, Jun Suzuki, Kentaro Inui
EMNLP 2020, [Paper] [GitHub]
16 Nov 2020
MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance
Wei Zhao, Maxime Peyrard, Fei Liu, Yang Gao, Christian M. Meyer, Steffen Eger
EMNLP 2019, [Paper] [GitHub]
03 Nov 2019
From Word Embeddings To Document Distances
Matt Kusner, Yu Sun, Nicholas Kolkin, Kilian Weinberger
ICML 2015, [Paper] [GitHub]
06 July 2015
Unsupervised Random Walk Sentence Embeddings: A Strong but Simple Baseline
Kawin Ethayarajh
RepL4NLP 2018, [Paper] [GitHub]
20 July 2018
An Efficient Framework for Learning Sentence Representations
Lajanugen Logeswaran, Honglak Lee
ICLR 2018, [Paper] [GitHub]
30 Apr 2018
Universal Sentence Encoder
Daniel Cer, Yinfei Yang, Sheng-yi Kong, Nan Hua, Nicole Limtiaco, Rhomni St. John, Noah Constant, Mario Guajardo-Cespedes, Steve Yuan, Chris Tar, Yun-Hsuan Sung, Brian Strope, Ray Kurzweil
arXiv 2018, [Paper] [GitHub]
12 Apr 2018
Supervised Learning of Universal Sentence Representations from Natural Language Inference Data
Alexis Conneau, Douwe Kiela, Holger Schwenk, Loïc Barrault, Antoine Bordes
EMNLP 2017, [Paper] [GitHub]
07 Sep 2017
A Simple but Tough-to-Beat Baseline for Sentence Embeddings
Sanjeev Arora, Yingyu Liang, Tengyu Ma
ICLR 2017, [Paper] [GitHub]
06 Feb 2017
Learning Distributed Representations of Sentences from Unlabelled Data
Felix Hill, Kyunghyun Cho, Anna Korhonen
NAACL 2016, [Paper] [GitHub (Unofficial)]
12 Jun 2016
Skip-Thought Vectors
Ryan Kiros, Yukun Zhu, Ruslan Salakhutdinov, Richard S. Zemel, Antonio Torralba, Raquel Urtasun, Sanja Fidler
NeurIPS 2015, [Paper] [GitHub]
22 Jun 2015
Distributed Representations of Sentences and Documents
Quoc V. Le, Tomas Mikolov
ICML 2014, [Paper]
21 June 2014
Whitening Sentence Representations for Better Semantics and Faster Retrieval
Jianlin Su, Jiarun Cao, Weijie Liu, Yangyiwen Ou
arXiv 2021, [Paper] [GitHub (TensorFlow)] [GitHub (PyTorch)]
29 Mar 2021
On the Sentence Embeddings from Pre-trained Language Models
Bohan Li, Hao Zhou, Junxian He, Mingxuan Wang, Yiming Yang, Lei Li
EMNLP 2020, [Paper] [GitHub]
02 Nov 2020
SBERT-WK: A Sentence Embedding Method by Dissecting BERT-based Word Models
Bin Wang, C.-C. Jay Kuo
IEEE/ACM T-ASLP, [Paper] [GitHub]
29 July 2020
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Nils Reimers, Iryna Gurevych
EMNLP 2019, [Paper] [GitHub]
27 Aug 2019
BLEURT: Learning Robust Metrics for Text Generation
Thibault Sellam, Dipanjan Das, Ankur Parikh
ACL 2020, [Paper] [GitHub]
05 July 2020
BERTScore: Evaluating Text Generation with BERT
Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, Yoav Artzi
ICLR 2020, [Paper] [GitHub]
24 Feb 2020
Toward Interpretable Semantic Textual Similarity via Optimal Transport-based Contrastive Sentence Learning
Seonghyeon Lee, Dongha Lee, Seongbo Jang, Hwanjo Yu
ACL 2022, [Paper] [GitHub]
22 May 2022
SimCSE: Simple Contrastive Learning of Sentence Embeddings
Tianyu Gao, Xingcheng Yao, Danqi Chen
EMNLP 2021, [Paper] [GitHub]
03 Jun 2021
Self-Guided Contrastive Learning for BERT Sentence Representations
Taeuk Kim, Kang Min Yoo, Sang-goo Lee
ACL 2021, [Paper] [GitHub]
03 Jun 2021
ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer
Yuanmeng Yan, Rumei Li, Sirui Wang, Fuzheng Zhang, Wei Wu, Weiran Xu
ACL 2021, [Paper] [GitHub]
25 May 2021
Semantic Re-tuning with Contrastive Tension
Fredrik Carlsson, Amaru Cuba Gyllensten, Evangelia Gogoulou, Erik Ylipää Hellqvist, Magnus Sahlgren
ICLR 2021, [Paper] [GitHub]
03 May 2021
CLEAR: Contrastive Learning for Sentence Representation
Zhuofeng Wu, Sinong Wang, Jiatao Gu, Madian Khabsa, Fei Sun, Hao Ma
arXiv 2020, [Paper]
31 Dec 2020
Evolution of Semantic Similarity - A Survey
Dhivya Chandrasekaran, Vijay Mago
ACM Computing Surveys 2021, [Paper]
18 Feb 2021
Distributional Measures of Semantic Distance: A Survey
Saif M. Mohammad, Graeme Hirst
arXiv 2012, [Paper]
08 Mar 2012
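Pearson Linear Correlation Coefficient (PLCC): measures prediction accuracy, i.e., how closely the predicted similarity scores follow a linear relationship with the gold scores.

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

where $x_i$ and $y_i$ are the predicted and gold scores of the $i$-th sentence pair, $\bar{x}$ and $\bar{y}$ are their means, and $n$ is the number of sentence pairs.

Spearman's Rank-order Correlation Coefficient (SROCC): measures prediction monotonicity, i.e., how well the ranking of the predicted scores agrees with the ranking of the gold scores.

$$\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n (n^2 - 1)}$$

where $d_i$ is the difference between the rank of $x_i$ and the rank of $y_i$, and $n$ is the number of sentence pairs. This closed form assumes there are no tied ranks; with ties, $\rho$ is computed as the Pearson correlation of the (average) ranks.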
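As a rough illustration of how these two metrics are typically computed on STS predictions, here is a minimal pure-Python sketch. The function names (`pearson`, `average_ranks`, `spearman`) and the toy scores are hypothetical and not part of this repository. In practice, an established statistics library would normally be used instead.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson linear correlation between two equally long score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy example (made-up numbers): gold scores on a 0-5 scale vs. model scores.
gold = [0.0, 1.5, 2.0, 3.8, 5.0]
pred = [0.1, 0.3, 0.2, 0.7, 0.9]
print(round(pearson(gold, pred), 4))
print(round(spearman(gold, pred), 4))  # 0.9: only one adjacent pair is swapped
```

Computing Spearman as the Pearson correlation of average ranks handles tied scores. When there are no ties, it gives the same value as the closed-form rank-difference formula.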
If you find our list useful, please consider citing our repo and toolkit in your publications. We provide the BibTeX entries below.
@misc{JiaAwesomeSTS23,
author = {Jia, Shuyue},
title = {Awesome Semantic Textual Similarity},
year = {2023},
publisher = {GitHub},
journal = {GitHub Repository},
howpublished = {\url{https://github.com/SuperBruceJia/Awesome-Semantic-Textual-Similarity}},
}
@misc{JiaAwesomeLLM23,
author = {Jia, Shuyue},
title = {Awesome {LLM} Self-Consistency},
year = {2023},
publisher = {GitHub},
journal = {GitHub Repository},
howpublished = {\url{https://github.com/SuperBruceJia/Awesome-LLM-Self-Consistency}},
}
@misc{JiaPromptCraft23,
author = {Jia, Shuyue},
title = {{PromptCraft}: A Prompt Perturbation Toolkit},
year = {2023},
publisher = {GitHub},
journal = {GitHub Repository},
howpublished = {\url{https://github.com/SuperBruceJia/promptcraft}},
}