|
| 1 | + |
| 2 | +<!DOCTYPE html> |
| 3 | +<html> |
| 4 | +<head> |
| 5 | + <title>Homepage for Language Net</title> |
| 6 | + <link rel="stylesheet" type="text/css" href="project.css"> |
| 7 | + |
| 8 | +<script> |
| 9 | +function showHide(name) { |
| 10 | + var el = document.getElementById(name); // element whose visibility is toggled |
| 11 | + if (el.style.display === 'block') { |
| 12 | + el.style.display = 'none'; |
| 13 | + } else { |
| 14 | + el.style.display = 'block'; |
| 15 | + } |
| 16 | +} |
| 17 | +</script> |
| 18 | +</head> |
| 19 | + |
| 20 | +<body> |
| 21 | + |
| 22 | +<br> |
| 23 | +<center> |
| 24 | + <h1 style="color:dodgerblue">Language-Net: The Large Scale Paraphrase Dataset</h1> |
| 25 | +</center> |
| 26 | +<br> |
| 27 | + |
| 28 | +<h3 style="color: brown">The Corpus</h3> |
| 29 | + |
| 30 | +<ul> |
| 31 | + <li>Language-Net is a collection of sentence-level paraphrases from Twitter, built by linking tweets through shared |
| 32 | +URLs. This corpus is the largest to date, with 51,524 human-annotated sentence pairs: 42,200 for training and 9,324 for testing. It can grow by 30,000 |
| 33 | +new sentential paraphrases per month with ∼70% precision. One year of data is now available: 2,869,657 candidate pairs. <br><br> |
| 34 | + The following paper introduces the corpus in detail:<br> |
| 35 | + <a class="publink" href="http://www.aclweb.org/anthology/D/D17/D17-1126.pdf">A Continuously Growing Dataset of Sentential Paraphrases</a> |
| 36 | + <br/><b><a href="https://lanwuwei.github.io/">Wuwei Lan</a></b>, Siyu Qiu, Hua He and Wei Xu. <cite>EMNLP 2017</cite>. |
| 37 | + <br/><a class="button" href="http://www.aclweb.org/anthology/D/D17/D17-1126.pdf">pdf</a> <a class="button" href="http://www.aclweb.org/anthology/D/D17/D17-1126.bib">BibTeX</a> <a class="button" href="https://lanwuwei.github.io/Wuwei_OSU_2017_v2.pdf">slides</a> <a class="button" href="https://lanwuwei.github.io/url-data-poster.pdf">poster</a> |
| 38 | + </li> |
| 39 | +</ul> |
| 40 | + |
| 41 | + |
| 42 | +<!-----Examples-----> |
| 43 | + <a name="Examples"></a> |
| 44 | + <h3 style="color:brown">Example Pairs</h3> |
| 45 | + <ul> |
| 46 | + <table class="newstuff" style="border-collapse: separate; |
| 47 | + border-spacing: 0 1em;"> |
| 48 | + <tr><th>Sentence 1</th> <th>Label</th> <th>Sentence 2</th></tr> |
| 49 | + <tr> |
| 50 | + <td style="padding:0 15px 0 15px;">Samsung halts production of its Galaxy Note 7 as battery problems linger.</td> |
| 51 | + <td style="padding:0 15px 0 15px;">True</td> |
| 52 | + <td style="padding:0 15px 0 15px;">#Samsung temporarily suspended production of its Galaxy #Note7 devices following reports</td> |
| 53 | + </tr> |
| 54 | + <tr> |
| 55 | + <td style="padding:0 15px 0 15px;">CO2 levels mark ‘new era’ in the world’s changing climate.</td> |
| 56 | + <td style="padding:0 15px 0 15px;">True</td> |
| 57 | + <td style="padding:0 15px 0 15px;">CO2 levels haven’t been this high for 3 to 5 million years.</td> |
| 58 | + </tr> |
| 59 | + <tr> |
| 60 | + <td style="padding:0 15px 0 15px;">The 7 biggest changes Obamacare made , and those that may disappear.</td> |
| 61 | + <td style="padding:0 15px 0 15px;">False</td> |
| 62 | + <td style="padding:0 15px 0 15px;">What a repeal of Obamacare would look like , in plain English.</td> |
| 63 | + </tr> |
| 64 | + <tr> |
| 65 | + <td style="padding:0 15px 0 15px;">Fraugster , a startup that uses AI to detect payment fraud , raises $5M.</td> |
| 66 | + <td style="padding:0 15px 0 15px;">False</td> |
| 67 | + <td style="padding:0 15px 0 15px;">AI is on the rise and in this case being applied to something worthwhile payment fraud.</td> |
| 68 | + </tr> |
| 69 | + </table> |
| 70 | + </ul> |
| 71 | + |
| 72 | +<!----Published Results-----> |
| 73 | + <a name="Baseline Results"></a> |
| 74 | + <h3 style="color:brown">Baseline Results</h3> |
| 75 | + <ul> |
| 76 | + <table class="newstuff" style="border-collapse: separate; |
| 77 | + border-spacing: 0 1em;"> |
| 78 | + <tr><th>Publication</th> <th>Model</th> <th>F1</th></tr> |
| 79 | + <tr> |
| 80 | + <td style="padding:0 20px 0 20px;"><a href="https://www.aclweb.org/anthology/P/P09/P09-1053.pdf">Das et al.'09 </a></td> |
| 81 | + <td style="padding:0 20px 0 20px;">Logistic Regression: n-gram overlap features</td> |
| 82 | + <td style="padding:0 20px 0 20px;">0.683</td> |
| 83 | + </tr> |
| 84 | + <tr> |
| 85 | + <td style="padding:0 20px 0 20px;"><a href="https://cocoxu.github.io/publications/tacl2014-extracting-paraphrases-from-twitter.pdf">Xu et al.'14 </a></td> |
| 86 | + <td style="padding:0 20px 0 20px;">LEX-WMF: logistic regression + weighted matrix factorization</td> |
| 87 | + <td style="padding:0 20px 0 20px;">0.693</td> |
| 88 | + </tr> |
| 89 | + <tr> |
| 90 | + <td style="padding:0 20px 0 20px;"><a href="http://www.aclweb.org/anthology/N16-1108">He et al.'16 </a></td> |
| 91 | + <td style="padding:0 20px 0 20px;">PWIM: pairwise word interaction model</td> |
| 92 | + <td style="padding:0 20px 0 20px;">0.749</td> |
| 93 | + </tr> |
| 94 | + <tr> |
| 95 | + <td style="padding:0 20px 0 20px;"><a href="https://cocoxu.github.io/publications/Wuwei_NAACL_2018.pdf">Lan et al.'18 </a></td> |
| 96 | + <td style="padding:0 20px 0 20px;">Subword-PWIM: subword embedding based PWIM with multi-task LM</td> |
| 97 | + <td style="padding:0 20px 0 20px;">0.768</td> |
| 98 | + </tr> |
| 99 | + </table> |
| 100 | + </ul> |
| 101 | + |
| 102 | +<!----Download-----> |
| 103 | + <a name="Download"></a> |
| 104 | + <h3 style="color:brown">Download</h3> |
| 105 | + <ul> |
| 106 | + Please fill in this <a href="https://frozen-ridge-97042.herokuapp.com/">form</a> to request access to the TwitterPPDB corpus and the 1-year candidate pairs. The data is released for non-commercial use under the CC BY-NC-SA 3.0 |
| 107 | + license. Use of the data must abide by the Twitter Terms of Service and Developer Policy. For comments or questions, please email <a href="mailto:lan.105@osu.edu">Wuwei Lan</a>. |
| 108 | + </ul> |
| 109 | + |
| 110 | +<!----Related Resource-----> |
| 111 | + <a name="Related Resource"></a> |
| 112 | + <h3 style="color:brown">Related Resource</h3> |
| 113 | + <ul> |
| 114 | + <a href="https://github.com/cocoxu/SemEval-PIT2015">PIT-2015</a>: sentence-level paraphrases from Twitter, drawn from tweets on the same trending topic. |
| 115 | + Please check this <a href="https://github.com/cocoxu/SemEval-PIT2015">website</a> for more info. |
| 116 | + </ul> |
| 117 | + |
| 118 | +</body> |
| 119 | + |
| 120 | +</html> |
| 121 | + |