-<h1 style="color:dodgerblue">Language-Net: The Large Scale Paraphrase Dataset</h1>
+<h1 style="color:dodgerblue">LanguageNet: Large Scale Multilingual Paraphrase Corpus</h1>
</center>
<br>
+<h3 style="color: brown">What's New</h3>
+
+<ul>
+<li>We are building large-scale multilingual paraphrase datasets right now. As planned, we will have a 10-language corpus with ~50k pairs per language!
+</li>
+</ul>
+
<h3 style="color: brown">The Corpus</h3>

<ul>
-<li>The Language-Net is a collection of sentence level paraphrases from Twitter by linking tweets through shared
+<li>The LanguageNet (English) is a collection of sentence level paraphrases from Twitter by linking tweets through shared
URLs. This corpus is the largest to date, with 51,524 human-annotated sentence pairs: 42,200 for training and 9,324 for testing. It can grow by 30,000
-new sentential paraphrases per month with ∼70% precision. Now we have 1-year data available: 2,869,657 candidate pairs! <br><br>
+new sentential paraphrases per month with ~70% precision. Now we have 1-year data available: 2,869,657 candidate pairs! <br><br>
The following paper introduces the corpus in detail:<br>
<a class="publink" href="http://www.aclweb.org/anthology/D/D17/D17-1126.pdf">A Continuously Growing Dataset of Sentential Paraphrases</a>
<br/><b><a href="https://lanwuwei.github.io/">Wuwei Lan</a></b>, Siyu Qiu, Hua He and Wei Xu. <cite>EMNLP 2017</cite>.