news

lanwuwei · lanwuwei · commit 663a2c18e65f · 2018-05-25T12:09:16.000-04:00
diff --git a/index.html b/index.html
@@ -2,7 +2,7 @@
 <!DOCTYPE html>
 <html>
 <head>
-  <title>Homepage for Language Net</title>
+  <title>Homepage for LanguageNet</title>
   <link rel="stylesheet" type="text/css" href="project.css">
 
 <script>
@@ -21,16 +21,23 @@
 
 <br>
 <center>
-  <h1 style="color:dodgerblue">Language-Net: The Large Scale Paraphrase Dataset</h1>
+  <h1 style="color:dodgerblue">LanguageNet: Large Scale Multilingual Paraphrase Corpus</h1>
 </center>
 <br>
 
+<h3 style="color: brown">What's New</h3>
+
+<ul>
+ <li>We are currently building large-scale multilingual paraphrase datasets. The plan is a 10-language corpus with ~50k pairs per language!
+    </li>
+</ul>
+
 <h3 style="color: brown">The Corpus</h3>
 
 <ul>
- <li>The Language-Net is a collection of sentence level paraphrases from Twitter by linking tweets through shared
+ <li>The LanguageNet (English) is a collection of sentence-level paraphrases from Twitter by linking tweets through shared
 URLs. This corpus is the largest up to date with 51,524 human annotated sentence pairs: 42200 for training and 9324 for testing. It can grow 30,000
-new sentential paraphrases per month with ∼70% precision. Now we have 1-year data available: 2,869,657 candidate pairs! <br><br>
+new sentential paraphrases per month with ~70% precision. Now we have 1-year data available: 2,869,657 candidate pairs! <br><br>
         The following paper introduces the corpus in detail:<br>
         <a class="publink" href="http://www.aclweb.org/anthology/D/D17/D17-1126.pdf">A Continuously Growing Dataset of Sentential Paraphrases</a>
    <br/><b><a href="https://lanwuwei.github.io/">Wuwei Lan</a></b>, Siyu Qiu, Hua He and Wei Xu. <cite>EMNLP 2017</cite>.