Commit 2083f6e

Author: yiqihuang
Commit message: modify English task pages
1 parent: be68564

7 files changed, +22 -22 lines changed

docs/co-reference_resolution.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -46,8 +46,8 @@ Scoring code: https://github.com/conll/reference-coreference-scorers
 
 | System | Average F1 of MUC, B-cubed, CEAF |
 | --- | --- |
-| [Kong & Jian (2019)](https://www.ijcai.org/Proceedings/2019/700) | 63.85 |
 | [Clark & Manning (2016b)](https://nlp.stanford.edu/static/pubs/clark2016deep.pdf) | 63.88 |
+| [Kong & Jian (2019)](https://www.ijcai.org/Proceedings/2019/700) | 63.85 |
 | [Clark & Manning (2016a)](https://nlp.stanford.edu/static/pubs/clark2016improving.pdf) | 63.66 |
 
 ### Resources
```
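The metric column in the table above is the CoNLL-style score: the unweighted average of the MUC, B-cubed, and CEAF F1s, as produced by the scorer linked in the hunk header. A minimal sketch (the three input scores here are hypothetical):

```python
# The reported coreference score is the unweighted mean of three F1s
# (MUC, B-cubed, CEAF), all expressed in percent.
def conll_f1(muc_f1: float, b_cubed_f1: float, ceaf_f1: float) -> float:
    """Average the three coreference F1 scores."""
    return (muc_f1 + b_cubed_f1 + ceaf_f1) / 3.0

# Hypothetical per-metric scores:
print(round(conll_f1(70.0, 60.0, 61.64), 2))
```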

docs/entity_linking.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -57,7 +57,7 @@ NERC F-score
 | --- | --- | --- | --- |
 | [Sil et al (2018)](https://arxiv.org/abs/1712.01813) | 84.4 | | |
 | [Pan et al (2020)](https://www.aclweb.org/anthology/D19-6107.pdf) | 84.2 | | |
-| [Pan et al (2020)](https://www.aclweb.org/anthology/D19-6107.pdf) | 81.2 (unsup) | | |
+| [Pan et al (2020)](https://www.aclweb.org/anthology/D19-6107.pdf) | 81.2 (unsupervised) | | |
 | Best anonymous system in shared task writeup | 76.9 | 76.2 | 67.8 |
 
 ### Resources
```

docs/language_modeling.md

Lines changed: 3 additions & 3 deletions
```diff
@@ -77,17 +77,17 @@ These numbers are not comparable, given different training conditions.
 | [Huang et al, 2010 [GW v2]](http://www.imaging.org/site/PDFS/Reporter/Articles/2010_25/Rep25_2_EI2010_HUANG.pdf) | -- | 220.6 | 610m chars, random 11m for test. MSR segmenter. |
 | Neural Lattice Models [v5] [Buckman+Neubig, 2018](https://www.mitpressjournals.org/doi/pdf/10.1162/tacl_a_00036) | 32.19 | -- | *Guangming Daily subset, top 10k chars + UNK, length <150. 934k lines train, 30k line test. Data [here](https://github.com/jbuckman/neural-lattice-language-models). |
 
-### Other Resources
+## Other Resources
 
-## <span class="t">Common Crawl Data</span>
+### <span class="t">Common Crawl Data</span>
 
 [CommonCrawl](https://commoncrawl.org) has released enormous quantities of web-crawled data that can be mined for Chinese text. Several groups have built their own pipelines to do the extraction and filtering.
 
 The CLUE Organization extracted "Clue Corpus 2020" (also called "C5") from the Common Crawl data. It is 100G raw text with 35 billion Chinese characters.
 Intended to be a large-scale corpus for pre-training Chinese language models.
 Preprint paper by [Xu, Zhang, and Dong](https://arxiv.org/abs/2003.01355v2)
 
-## <span class="t">CLUECorpusSmall </span>
+### <span class="t">CLUECorpusSmall </span>
 
 Publicly-available data, collected at https://github.com/CLUEbenchmark/CLUECorpus2020 and https://github.com/brightmart/nlp_chinese_corpus
 Includes:
```

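The perplexity numbers in the language-modeling table come from different training setups and are not comparable, but the quantity itself is standard: the exponential of the average per-character negative log-likelihood. A small sketch:

```python
import math

# Perplexity = exp of the average negative log-likelihood per token
# (per character, for the Chinese LM results quoted above).
def perplexity(char_log_probs):
    """char_log_probs: natural-log probabilities the model assigned
    to each character of the test text."""
    avg_nll = -sum(char_log_probs) / len(char_log_probs)
    return math.exp(avg_nll)

# A uniform model over a 10k-character vocabulary (hypothetical):
uniform = [math.log(1 / 10000)] * 5
# A uniform model's perplexity equals its vocabulary size.
print(round(perplexity(uniform)))
```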
docs/machine_translation.md

Lines changed: 10 additions & 10 deletions
```diff
@@ -35,9 +35,9 @@ The United States and China may soon reach a trade agreement.
 * BLEU-SBP ([Chiang et al 08](http://aclweb.org/anthology/D08-1064)). Addresses decomposability problems with Bleu, proposing a cross between Bleu and word error rate.
 * HTER. Returns the number of edits performed by a human posteditor to get an automatic translation into good shape.
 
-## <span class="t">ZH-EN</span>.
+## ZH-EN
 
-### <span class="t">WMT</span>.
+## <span class="t">WMT</span>.
 
 The Second Conference on Machine Translation (WMT17) has a Chinese/English MT component, done in cooperation with CWMT 2017.
 * [Website](http://www.statmt.org/wmt17)
```
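The HTER bullet above can be sketched as word-level edit distance against the human postedit, normalized by postedit length. This simplified version counts only insertions, deletions, and substitutions; real TER/HTER also allows block shifts:

```python
def hter(hypothesis: str, postedited: str) -> float:
    """Word-level edit distance between the MT output and its human
    postedit, divided by postedit length. Simplification: real
    TER/HTER also counts phrase shifts, omitted here."""
    h, r = hypothesis.split(), postedited.split()
    # Standard dynamic-programming Levenshtein distance over words.
    d = [[0] * (len(r) + 1) for _ in range(len(h) + 1)]
    for i in range(len(h) + 1):
        d[i][0] = i
    for j in range(len(r) + 1):
        d[0][j] = j
    for i in range(1, len(h) + 1):
        for j in range(1, len(r) + 1):
            cost = 0 if h[i - 1] == r[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,       # deletion
                          d[i][j - 1] + 1,       # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(h)][len(r)] / len(r)

# Example sentence borrowed from the hunk header above:
print(hter("united states and china may reach agreement",
           "the united states and china may soon reach a trade agreement"))
```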
```diff
@@ -91,7 +91,7 @@ The Linguistic Data Consortium has additional resources, such as FBIS and NIST t
 
 
 
-### <span class="t">NIST</span>.
+## <span class="t">NIST</span>.
 
 NIST has a long history of supporting Chinese-English translation by creating annual test sets and running annual NIST OpenMT evaluations during the 2000s. Many sites have reported results on NIST test sets.
 
```
```diff
@@ -134,13 +134,13 @@ The Linguistic Data Consortium provides training materials typically used for NI
 
 
 
-### <span class="t">IWSLT 2015</span>.
+## <span class="t">IWSLT 2015</span>.
 
 * Translation of TED talks
 * Chinese-to-English track
 * [Shared task overview](https://cris.fbk.eu/retrieve/handle/11582/303031/9811/main.pdf)
 
-| Dataset | Size (sentences) | # of talks | Genre |
+| Test sets | Size (sentences) | # of talks | Genre |
 | --- | --- | --- | --- |
 | tst2014 | 1068 | 12 | TED talks |
 | tst2015 | 1,080 | 12 | TED talks |
```
```diff
@@ -203,9 +203,9 @@ English to Chinese
 [The Multitarget TED Talks Task (MTTT)](http://cs.jhu.edu/~kevinduh/a/multitarget-tedtalks/)
 
 
-## <span class="t">ZH-JA</span>.
+## ZH-JA
 
-### <span class="t">Workshop on Asian Translation</span>.
+## <span class="t">Workshop on Asian Translation</span>.
 
 [The Workshop on Asian Translation](http://lotus.kuee.kyoto-u.ac.jp/WAT/) has run since 2014. Here, we include the 2018 Chinese/Japanese evaluations.
 
```
```diff
@@ -255,7 +255,7 @@ Participants must get data from [here](http://lotus.kuee.kyoto-u.ac.jp/WAT/paten
 | Japanese-Chinese devtest | 2000 | Patents |
 
 
-### <span class="t">IWSLT2020 ZH-JA Open Domain Translation</span>.
+## <span class="t">IWSLT2020 ZH-JA Open Domain Translation</span>.
 
 [The shared task](http://iwslt.org/doku.php?id=open_domain_translation) is to promote research on translation between Asian languages, exploitation of noisy parallel web corpora for MT and smart processing of data and provenance.
 
```
```diff
@@ -298,9 +298,9 @@ Japanese to Chinese
 | Existing parallel sources | 1,963,238 | mixed-genre |
 
 
-## <span class="t">Others</span>.
+## Others
 
-### <span class="t">CWMT</span>.
+## <span class="t">CWMT</span>.
 
 [CWMT 2017](http://ee.dlut.edu.cn/CWMT2017/index_en.html)
 and [2018](http://www.cipsc.org.cn/cwmt/2018/english/)
```

docs/relation_extraction.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -17,7 +17,7 @@ Output:
 
 ```
 (entity1: 李晓华, entity2: 王大牛, relation: 夫妻)
-````
+```
 
 ## Standard Metrics
 
````
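The hunk above fixes the code fence around the triple-format example; relation extraction of this kind is conventionally scored by exact-match precision/recall/F1 over predicted (entity1, entity2, relation) triples. A sketch, where the second predicted triple and its entities are made up for illustration:

```python
# Exact-match F1 over relation triples: a prediction counts as correct
# only if both entities and the relation label match a gold triple.
def triple_f1(predicted, gold):
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# One correct triple plus one spurious (invented) triple:
pred = [("李晓华", "王大牛", "夫妻"), ("李晓华", "张三", "同事")]
gold = [("李晓华", "王大牛", "夫妻")]
print(round(triple_f1(pred, gold), 3))
```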
docs/spell_correction.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -72,7 +72,7 @@ Results above are all on the SIGHAN 2015 test set.
 
 | Source | # sentence pairs | # chars | # spelling errors | character set | genre |
 | --- | --- | --- | --- | --- | --- |
-| Synthetic training dataset ([Wang et al. 2018](https://www.aclweb.org/anthology/P19-1578)) 271,329 | 12M | 382,702 | simplified | news |
+| Synthetic training dataset ([Wang et al. 2018](https://www.aclweb.org/anthology/P19-1578)) | 271,329 | 12M | 382,702 | simplified | news |
 
 ---
 
```
docs/topic_classification.md

Lines changed: 5 additions & 5 deletions
```diff
@@ -1,4 +1,4 @@
-# Chinese Text Classification / Topic Classification
+# Chinese Text Classification
 
 
 ## Background
```
```diff
@@ -113,9 +113,9 @@ First paragraphs of Chinese news articles from 2006-2016 were evenly split into
 
 | | Accuracy |
 | --- | --- |
-| [[Meng et al, 2019]](https://arxiv.org/pdf/1901.10125.pdf) | 85.8% |
+| [Meng et al, 2019](https://arxiv.org/pdf/1901.10125.pdf) | 85.8% |
 | [Sun, Baohua, et al](https://arxiv.org/abs/1810.07653) | 84.4% |
-| [[Zhang and Lecun 2017]](https://arxiv.org/abs/1708.02657) | 83.7% |
+| [Zhang and Lecun 2017](https://arxiv.org/abs/1708.02657) | 83.7% |
 
 ### Resources
 
```
```diff
@@ -140,8 +140,8 @@ Chinese news articles from 2008- 2016 were evenly split into 7 news channels, re
 | | Accuracy |
 | --- | --- |
 | [Sun, Baohua, et al](https://arxiv.org/abs/1810.07653) | 92.0% |
-| [[Meng et al, 2019]](https://arxiv.org/pdf/1901.10125.pdf) | 91.9% |
-| [[Zhang and Lecun 2017]](https://arxiv.org/abs/1708.02657) | 90.9% |
+| [Meng et al, 2019](https://arxiv.org/pdf/1901.10125.pdf) | 91.9% |
+| [Zhang and Lecun 2017](https://arxiv.org/abs/1708.02657) | 90.9% |
 
 ### Resources
 
```