eng-roa

opus-2020-06-28.zip

  • dataset: opus
  • model: transformer
  • source language(s): eng
  • target language(s): arg ast cat cos egl ext fra frm_Latn fvr glg ita lad lad_Latn lij lld_Latn lmo mwl oci osp_Latn pms por roh ron scn spa vec wln
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token of the form >>id<< is required (id = a valid target language ID)
  • download: opus-2020-06-28.zip
  • test set translations: opus-2020-06-28.test.txt
  • test set scores: opus-2020-06-28.eval.txt
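
Because the model expects that sentence-initial token, callers typically prepend it before tokenization. A minimal sketch (a hypothetical helper, not part of the released package; TARGET_IDS mirrors the target language list for this release):

```python
# Hypothetical helper: prepend the sentence-initial >>id<< target token
# this multilingual model requires, validating against the release's
# listed target language IDs.
TARGET_IDS = {
    "arg", "ast", "cat", "cos", "egl", "ext", "fra", "frm_Latn", "fvr",
    "glg", "ita", "lad", "lad_Latn", "lij", "lld_Latn", "lmo", "mwl",
    "oci", "osp_Latn", "pms", "por", "roh", "ron", "scn", "spa", "vec",
    "wln",
}

def add_lang_token(sentence: str, target: str) -> str:
    """Return the sentence with the required >>id<< token prepended."""
    if target not in TARGET_IDS:
        raise ValueError(f"unknown target language ID: {target!r}")
    return f">>{target}<< {sentence}"

print(add_lang_token("How are you?", "spa"))  # >>spa<< How are you?
```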

Benchmarks

testset BLEU chr-F
Tatoeba-test.eng-arg.eng.arg 2.2 0.147
Tatoeba-test.eng-ast.eng.ast 17.2 0.415
Tatoeba-test.eng-cat.eng.cat 47.7 0.669
Tatoeba-test.eng-cos.eng.cos 3.2 0.262
Tatoeba-test.eng-egl.eng.egl 0.4 0.119
Tatoeba-test.eng-ext.eng.ext 5.5 0.304
Tatoeba-test.eng-fra.eng.fra 45.8 0.641
Tatoeba-test.eng-frm.eng.frm 0.9 0.212
Tatoeba-test.eng-fvr.eng.fvr 2.6 0.260
Tatoeba-test.eng-glg.eng.glg 45.8 0.655
Tatoeba-test.eng-ita.eng.ita 45.9 0.678
Tatoeba-test.eng-lad.eng.lad 8.9 0.324
Tatoeba-test.eng-lij.eng.lij 1.8 0.191
Tatoeba-test.eng-lld.eng.lld 0.5 0.215
Tatoeba-test.eng-lmo.eng.lmo 0.9 0.203
Tatoeba-test.eng.multi 44.1 0.645
Tatoeba-test.eng-mwl.eng.mwl 4.1 0.331
Tatoeba-test.eng-oci.eng.oci 7.8 0.289
Tatoeba-test.eng-osp.eng.osp 10.8 0.382
Tatoeba-test.eng-pms.eng.pms 1.8 0.197
Tatoeba-test.eng-por.eng.por 41.7 0.637
Tatoeba-test.eng-roh.eng.roh 2.8 0.257
Tatoeba-test.eng-ron.eng.ron 41.8 0.640
Tatoeba-test.eng-scn.eng.scn 1.8 0.175
Tatoeba-test.eng-spa.eng.spa 50.3 0.691
Tatoeba-test.eng-vec.eng.vec 3.2 0.251
Tatoeba-test.eng-wln.eng.wln 6.6 0.236
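
The benchmark listings in this file are whitespace-separated rows. A small sketch (hypothetical, assuming the three-column testset/BLEU/chr-F layout used above) for loading them into records, e.g. to compare scores across releases:

```python
# Parse whitespace-separated benchmark rows ("testset BLEU chr-F")
# into a dict mapping testset name to (BLEU, chr-F).
def parse_benchmarks(text: str) -> dict:
    scores = {}
    for line in text.strip().splitlines():
        name, bleu, chrf = line.split()
        scores[name] = (float(bleu), float(chrf))
    return scores

table = """
Tatoeba-test.eng-spa.eng.spa 50.3 0.691
Tatoeba-test.eng-por.eng.por 41.7 0.637
"""
print(parse_benchmarks(table)["Tatoeba-test.eng-spa.eng.spa"])  # (50.3, 0.691)
```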

opus-2020-07-14.zip

  • dataset: opus
  • model: transformer
  • source language(s): eng
  • target language(s): arg ast cat cos egl ext fra frm_Latn gcf_Latn glg hat ind ita lad lad_Latn lij lld_Latn lmo max_Latn mfe min mwl oci pap pms por roh ron scn spa tmw_Latn vec wln zlm_Latn zsm_Latn
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token of the form >>id<< is required (id = a valid target language ID)
  • download: opus-2020-07-14.zip
  • test set translations: opus-2020-07-14.test.txt
  • test set scores: opus-2020-07-14.eval.txt

Benchmarks

testset BLEU chr-F
Tatoeba-test.eng-arg.eng.arg 1.7 0.133
Tatoeba-test.eng-ast.eng.ast 17.2 0.415
Tatoeba-test.eng-cat.eng.cat 47.5 0.668
Tatoeba-test.eng-cos.eng.cos 1.8 0.215
Tatoeba-test.eng-egl.eng.egl 0.4 0.087
Tatoeba-test.eng-ext.eng.ext 13.7 0.353
Tatoeba-test.eng-fra.eng.fra 44.1 0.629
Tatoeba-test.eng-frm.eng.frm 0.6 0.196
Tatoeba-test.eng-gcf.eng.gcf 0.9 0.116
Tatoeba-test.eng-glg.eng.glg 43.7 0.640
Tatoeba-test.eng-hat.eng.hat 30.1 0.529
Tatoeba-test.eng-ita.eng.ita 44.8 0.668
Tatoeba-test.eng-lad.eng.lad 7.5 0.301
Tatoeba-test.eng-lij.eng.lij 1.5 0.187
Tatoeba-test.eng-lld.eng.lld 0.8 0.199
Tatoeba-test.eng-lmo.eng.lmo 0.8 0.177
Tatoeba-test.eng-mfe.eng.mfe 91.9 0.956
Tatoeba-test.eng.multi 42.3 0.631
Tatoeba-test.eng-mwl.eng.mwl 2.7 0.252
Tatoeba-test.eng-oci.eng.oci 7.3 0.290
Tatoeba-test.eng-pap.eng.pap 43.7 0.627
Tatoeba-test.eng-pms.eng.pms 2.4 0.194
Tatoeba-test.eng-por.eng.por 40.7 0.632
Tatoeba-test.eng-roh.eng.roh 3.5 0.258
Tatoeba-test.eng-ron.eng.ron 40.0 0.628
Tatoeba-test.eng-scn.eng.scn 1.6 0.100
Tatoeba-test.eng-spa.eng.spa 48.7 0.680
Tatoeba-test.eng-vec.eng.vec 1.9 0.166
Tatoeba-test.eng-wln.eng.wln 8.1 0.226

opus-2020-07-20.zip

  • dataset: opus
  • model: transformer
  • source language(s): eng
  • target language(s): arg ast cat cos egl ext fra frm_Latn gcf_Latn glg hat ind ita lad lad_Latn lij lld_Latn lmo max_Latn mfe min mwl oci pap pms por roh ron scn spa tmw_Latn vec wln zlm_Latn zsm_Latn
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token of the form >>id<< is required (id = a valid target language ID)
  • download: opus-2020-07-20.zip
  • test set translations: opus-2020-07-20.test.txt
  • test set scores: opus-2020-07-20.eval.txt

Benchmarks

testset BLEU chr-F
Tatoeba-test.eng-arg.eng.arg 1.5 0.132
Tatoeba-test.eng-ast.eng.ast 15.4 0.413
Tatoeba-test.eng-cat.eng.cat 47.8 0.671
Tatoeba-test.eng-cos.eng.cos 3.3 0.293
Tatoeba-test.eng-egl.eng.egl 0.2 0.085
Tatoeba-test.eng-ext.eng.ext 11.7 0.311
Tatoeba-test.eng-fra.eng.fra 44.8 0.633
Tatoeba-test.eng-frm.eng.frm 1.0 0.213
Tatoeba-test.eng-gcf.eng.gcf 0.8 0.119
Tatoeba-test.eng-glg.eng.glg 44.5 0.646
Tatoeba-test.eng-hat.eng.hat 25.5 0.494
Tatoeba-test.eng-ita.eng.ita 45.1 0.673
Tatoeba-test.eng-lad.eng.lad 8.0 0.305
Tatoeba-test.eng-lij.eng.lij 1.5 0.178
Tatoeba-test.eng-lld.eng.lld 0.4 0.171
Tatoeba-test.eng-lmo.eng.lmo 1.5 0.191
Tatoeba-test.eng-mfe.eng.mfe 91.9 0.956
Tatoeba-test.eng-msa.eng.msa 31.2 0.548
Tatoeba-test.eng.multi 42.6 0.632
Tatoeba-test.eng-mwl.eng.mwl 3.3 0.288
Tatoeba-test.eng-oci.eng.oci 7.5 0.287
Tatoeba-test.eng-pap.eng.pap 44.8 0.630
Tatoeba-test.eng-pms.eng.pms 2.7 0.198
Tatoeba-test.eng-por.eng.por 41.3 0.635
Tatoeba-test.eng-roh.eng.roh 4.3 0.271
Tatoeba-test.eng-ron.eng.ron 40.6 0.631
Tatoeba-test.eng-scn.eng.scn 1.4 0.173
Tatoeba-test.eng-spa.eng.spa 49.2 0.684
Tatoeba-test.eng-vec.eng.vec 4.8 0.240
Tatoeba-test.eng-wln.eng.wln 5.4 0.233

opus-2020-07-27.zip

  • dataset: opus
  • model: transformer
  • source language(s): eng
  • target language(s): arg ast cat cos egl ext fra frm_Latn gcf_Latn glg hat ind ita lad lad_Latn lij lld_Latn lmo max_Latn mfe min mwl oci pap pms por roh ron scn spa tmw_Latn vec wln zlm_Latn zsm_Latn
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token of the form >>id<< is required (id = a valid target language ID)
  • download: opus-2020-07-27.zip
  • test set translations: opus-2020-07-27.test.txt
  • test set scores: opus-2020-07-27.eval.txt

Benchmarks

testset BLEU chr-F
newsdev2016-enro-engron.eng.ron 27.3 0.565
newsdiscussdev2015-enfr-engfra.eng.fra 29.9 0.573
newsdiscusstest2015-enfr-engfra.eng.fra 35.2 0.609
newssyscomb2009-engfra.eng.fra 27.8 0.569
newssyscomb2009-engita.eng.ita 29.0 0.590
newssyscomb2009-engspa.eng.spa 29.5 0.567
news-test2008-engfra.eng.fra 25.1 0.538
news-test2008-engspa.eng.spa 27.2 0.547
newstest2009-engfra.eng.fra 26.6 0.557
newstest2009-engita.eng.ita 28.6 0.582
newstest2009-engspa.eng.spa 28.7 0.565
newstest2010-engfra.eng.fra 29.2 0.573
newstest2010-engspa.eng.spa 33.6 0.598
newstest2011-engfra.eng.fra 31.2 0.591
newstest2011-engspa.eng.spa 34.8 0.599
newstest2012-engfra.eng.fra 29.2 0.574
newstest2012-engspa.eng.spa 35.1 0.601
newstest2013-engfra.eng.fra 29.7 0.565
newstest2013-engspa.eng.spa 31.7 0.576
newstest2016-enro-engron.eng.ron 25.9 0.548
Tatoeba-test.eng-arg.eng.arg 1.7 0.131
Tatoeba-test.eng-ast.eng.ast 16.6 0.417
Tatoeba-test.eng-cat.eng.cat 47.6 0.670
Tatoeba-test.eng-cos.eng.cos 3.3 0.284
Tatoeba-test.eng-egl.eng.egl 0.9 0.118
Tatoeba-test.eng-ext.eng.ext 8.7 0.301
Tatoeba-test.eng-fra.eng.fra 44.8 0.633
Tatoeba-test.eng-frm.eng.frm 0.8 0.201
Tatoeba-test.eng-gcf.eng.gcf 0.8 0.117
Tatoeba-test.eng-glg.eng.glg 44.0 0.642
Tatoeba-test.eng-hat.eng.hat 28.8 0.510
Tatoeba-test.eng-ita.eng.ita 45.3 0.674
Tatoeba-test.eng-lad.eng.lad 8.4 0.310
Tatoeba-test.eng-lij.eng.lij 1.4 0.178
Tatoeba-test.eng-lld.eng.lld 0.8 0.220
Tatoeba-test.eng-lmo.eng.lmo 0.9 0.189
Tatoeba-test.eng-mfe.eng.mfe 82.4 0.915
Tatoeba-test.eng-msa.eng.msa 31.3 0.549
Tatoeba-test.eng.multi 42.6 0.633
Tatoeba-test.eng-mwl.eng.mwl 2.9 0.311
Tatoeba-test.eng-oci.eng.oci 7.9 0.292
Tatoeba-test.eng-pap.eng.pap 47.4 0.661
Tatoeba-test.eng-pms.eng.pms 2.5 0.198
Tatoeba-test.eng-por.eng.por 41.4 0.636
Tatoeba-test.eng-roh.eng.roh 3.2 0.259
Tatoeba-test.eng-ron.eng.ron 40.8 0.632
Tatoeba-test.eng-scn.eng.scn 1.8 0.191
Tatoeba-test.eng-spa.eng.spa 49.4 0.685
Tatoeba-test.eng-vec.eng.vec 5.1 0.253
Tatoeba-test.eng-wln.eng.wln 7.1 0.235

opus2m-2020-08-01.zip

  • dataset: opus2m
  • model: transformer
  • source language(s): eng
  • target language(s): arg ast cat cos egl ext fra frm_Latn gcf_Latn glg hat ind ita lad lad_Latn lij lld_Latn lmo max_Latn mfe min mwl oci pap pms por roh ron scn spa tmw_Latn vec wln zlm_Latn zsm_Latn
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token of the form >>id<< is required (id = a valid target language ID)
  • download: opus2m-2020-08-01.zip
  • test set translations: opus2m-2020-08-01.test.txt
  • test set scores: opus2m-2020-08-01.eval.txt

Benchmarks

testset BLEU chr-F
newsdev2016-enro-engron.eng.ron 27.6 0.567
newsdiscussdev2015-enfr-engfra.eng.fra 30.2 0.575
newsdiscusstest2015-enfr-engfra.eng.fra 35.5 0.612
newssyscomb2009-engfra.eng.fra 27.9 0.570
newssyscomb2009-engita.eng.ita 29.3 0.590
newssyscomb2009-engspa.eng.spa 29.6 0.570
news-test2008-engfra.eng.fra 25.2 0.538
news-test2008-engspa.eng.spa 27.3 0.548
newstest2009-engfra.eng.fra 26.9 0.560
newstest2009-engita.eng.ita 28.7 0.583
newstest2009-engspa.eng.spa 29.0 0.568
newstest2010-engfra.eng.fra 29.3 0.574
newstest2010-engspa.eng.spa 34.2 0.601
newstest2011-engfra.eng.fra 31.4 0.592
newstest2011-engspa.eng.spa 35.0 0.599
newstest2012-engfra.eng.fra 29.5 0.576
newstest2012-engspa.eng.spa 35.5 0.603
newstest2013-engfra.eng.fra 29.9 0.567
newstest2013-engspa.eng.spa 32.1 0.578
newstest2016-enro-engron.eng.ron 26.1 0.551
Tatoeba-test.eng-arg.eng.arg 1.4 0.125
Tatoeba-test.eng-ast.eng.ast 17.8 0.406
Tatoeba-test.eng-cat.eng.cat 48.3 0.676
Tatoeba-test.eng-cos.eng.cos 3.2 0.275
Tatoeba-test.eng-egl.eng.egl 0.2 0.084
Tatoeba-test.eng-ext.eng.ext 11.2 0.344
Tatoeba-test.eng-fra.eng.fra 45.3 0.637
Tatoeba-test.eng-frm.eng.frm 1.1 0.221
Tatoeba-test.eng-gcf.eng.gcf 0.6 0.118
Tatoeba-test.eng-glg.eng.glg 44.2 0.645
Tatoeba-test.eng-hat.eng.hat 28.0 0.502
Tatoeba-test.eng-ita.eng.ita 45.6 0.674
Tatoeba-test.eng-lad.eng.lad 8.2 0.322
Tatoeba-test.eng-lij.eng.lij 1.4 0.182
Tatoeba-test.eng-lld.eng.lld 0.8 0.217
Tatoeba-test.eng-lmo.eng.lmo 0.7 0.190
Tatoeba-test.eng-mfe.eng.mfe 91.9 0.956
Tatoeba-test.eng-msa.eng.msa 31.1 0.548
Tatoeba-test.eng.multi 42.9 0.636
Tatoeba-test.eng-mwl.eng.mwl 2.1 0.234
Tatoeba-test.eng-oci.eng.oci 7.9 0.297
Tatoeba-test.eng-pap.eng.pap 44.1 0.648
Tatoeba-test.eng-pms.eng.pms 2.1 0.190
Tatoeba-test.eng-por.eng.por 41.8 0.639
Tatoeba-test.eng-roh.eng.roh 3.5 0.261
Tatoeba-test.eng-ron.eng.ron 41.0 0.635
Tatoeba-test.eng-scn.eng.scn 1.7 0.184
Tatoeba-test.eng-spa.eng.spa 50.1 0.689
Tatoeba-test.eng-vec.eng.vec 3.2 0.248
Tatoeba-test.eng-wln.eng.wln 7.2 0.220

opus1m+bt-2021-03-23.zip

  • dataset: opus1m+bt
  • model: transformer-align
  • source language(s): arg eng
  • target language(s): ast cat cbk cos egl eng ext fra frm gcf glg hat ind ita jak lad lij lld lmo max mfe min mol msa mwl oci osp pap pms pob por roh ron scn spa tmw vec wln zlm zsm
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token of the form >>id<< is required (id = a valid target language ID)
  • valid language labels: >>fra<< >>spa<< >>roh<< >>zlm_Latn<< >>cos<< >>ext<< >>mfe<< >>scn<< >>lad<< >>mwl<< >>ast<< >>hat<< >>pob<< >>pap<< >>lmo<< >>vec<< >>pms<< >>glg<< >>cat<< >>msa_Latn<< >>wln<< >>ind<< >>ron<< >>por<< >>ita<< >>oci<< >>lij<< >>jak_Latn<< >>eng<< >>min<< >>zlm<< >>mol<< >>cbk_Latn<<
  • download: opus1m+bt-2021-03-23.zip
  • test set translations: opus1m+bt-2021-03-23.test.txt
  • test set scores: opus1m+bt-2021-03-23.eval.txt
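
The valid-labels line above packs the usable IDs into >>id<< markers; an illustrative regex sketch (the labels_line shown is a shortened excerpt) for recovering the plain IDs:

```python
import re

# Shortened excerpt of the "valid language labels" line above.
labels_line = ">>fra<< >>spa<< >>roh<< >>zlm_Latn<< >>cos<<"

# Capture everything between >> and << for each label.
labels = re.findall(r">>([^<>]+)<<", labels_line)
print(labels)  # ['fra', 'spa', 'roh', 'zlm_Latn', 'cos']
```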

Benchmarks

testset BLEU chr-F #sent #words BP
newsdiscussdev2015-enfr.eng-fra 28.3 0.560 1500 27986 1.000
newsdiscusstest2015-enfr.eng-fra 33.1 0.594 1500 28027 0.992
newssyscomb2009.eng-fra 26.4 0.558 502 12334 0.997
news-test2008.eng-fra 23.9 0.529 2051 52685 0.992
newstest2009.eng-fra 25.6 0.548 2525 69278 0.978
newstest2010.eng-fra 27.8 0.563 2489 66043 0.985
Tatoeba-test.eng-arg 9.3 0.311 105 405 1.000
Tatoeba-test.eng-ast 26.3 0.492 99 720 0.986
Tatoeba-test.eng-cat 45.4 0.654 1631 12342 0.989
Tatoeba-test.eng-cbk 4.8 0.262 1498 10591 0.993
Tatoeba-test.eng-cos 36.2 0.616 5 45 0.907
Tatoeba-test.eng-egl 0.4 0.127 84 438 0.963
Tatoeba-test.eng-ext 4.8 0.337 69 353 1.000
Tatoeba-test.eng-fra 40.8 0.613 10000 80759 0.973
Tatoeba-test.eng-frm 1.0 0.209 18 211 1.000
Tatoeba-test.eng-gcf 0.8 0.121 99 560 0.922
Tatoeba-test.eng-glg 41.9 0.632 1008 7828 0.978
Tatoeba-test.eng-hat 33.7 0.529 64 416 0.951
Tatoeba-test.eng-ita 43.1 0.656 10000 65498 0.952
Tatoeba-test.eng-lad 10.6 0.324 629 3354 1.000
Tatoeba-test.eng-lad_Latn 11.3 0.354 582 3097 1.000
Tatoeba-test.eng-lij 4.6 0.289 94 711 0.973
Tatoeba-test.eng-lld 0.8 0.214 21 228 0.937
Tatoeba-test.eng-lmo 10.5 0.314 17 124 1.000
Tatoeba-test.eng-mfe 83.6 0.898 7 36 1.000
Tatoeba-test.eng-multi 39.7 0.609 10000 73684 0.968
Tatoeba-test.eng-mwl 19.5 0.576 4 21 1.000
Tatoeba-test.eng-oci 10.0 0.332 841 5219 0.914
Tatoeba-test.eng-osp 10.8 0.365 3 20 1.000
Tatoeba-test.eng-pap 52.0 0.699 70 376 1.000
Tatoeba-test.eng-pms 12.6 0.338 268 2244 0.945
Tatoeba-test.eng-por 42.2 0.643 10000 75353 0.969
Tatoeba-test.eng-roh 20.4 0.456 16 198 1.000
Tatoeba-test.eng-ron 34.4 0.590 5000 36833 0.971
Tatoeba-test.eng-scn 42.5 0.531 4 42 1.000
Tatoeba-test.eng-spa 46.5 0.664 10000 77291 0.973
Tatoeba-test.eng-vec 14.1 0.325 19 127 0.839
Tatoeba-test.eng-wln 15.3 0.328 89 520 0.957
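
The BP column in the 2021 tables is BLEU's brevity penalty, which is 1 when the hypothesis is at least as long as the reference and exp(1 - r/c) otherwise (c = hypothesis length, r = reference length). A worked sketch of that standard definition:

```python
import math

def brevity_penalty(hyp_len: int, ref_len: int) -> float:
    """Standard BLEU brevity penalty: penalizes hypotheses
    shorter than the reference; never exceeds 1."""
    if hyp_len >= ref_len:
        return 1.0
    return math.exp(1.0 - ref_len / hyp_len)

# A hypothesis slightly shorter than its reference gets BP just below 1.
print(brevity_penalty(97, 100))
```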

opus1m+bt-2021-03-24.zip

  • dataset: opus1m+bt
  • model: transformer-align
  • source language(s): eng
  • target language(s): arg ast cat cbk cos egl ext fra frm gcf glg hat ind ita jak lad lij lld lmo max mfe min mol msa mwl oci osp pap pms pob por roh ron scn spa tmw vec wln zlm zsm
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token of the form >>id<< is required (id = a valid target language ID)
  • valid language labels: >>acf<< >>aoa<< >>arg<< >>ast<< >>cat<< >>cbk<< >>cbk_Latn<< >>ccd<< >>cks<< >>cos<< >>cri<< >>crs<< >>dlm<< >>drc<< >>egl<< >>ext<< >>fab<< >>fax<< >>fra<< >>frc<< >>frm<< >>frm_Latn<< >>fro<< >>frp<< >>fur<< >>gcf<< >>gcf_Latn<< >>gcr<< >>glg<< >>hat<< >>idb<< >>ind<< >>ist<< >>ita<< >>itk<< >>jak_Latn<< >>kea<< >>kmv<< >>lad<< >>lad_Latn<< >>lij<< >>lld<< >>lld_Latn<< >>lmo<< >>lou<< >>max_Latn<< >>mcm<< >>mfe<< >>min<< >>mol<< >>msa_Latn<< >>mwl<< >>mxi<< >>mzs<< >>nap<< >>nrf<< >>oci<< >>osp<< >>osp_Latn<< >>pap<< >>pcd<< >>pln<< >>pms<< >>pob<< >>por<< >>pov<< >>pre<< >>pro<< >>rcf<< >>rgn<< >>roh<< >>ron<< >>ruo<< >>rup<< >>ruq<< >>scf<< >>scn<< >>sdc<< >>sdn<< >>spa<< >>spq<< >>src<< >>srd<< >>srm<< >>sro<< >>tmg<< >>tmw_Latn<< >>tvy<< >>vec<< >>vkp<< >>wln<< >>xmm<< >>zlm<< >>zlm_Latn<< >>zsm_Latn<<
  • download: opus1m+bt-2021-03-24.zip
  • test set translations: opus1m+bt-2021-03-24.test.txt
  • test set scores: opus1m+bt-2021-03-24.eval.txt

Benchmarks

testset BLEU chr-F #sent #words BP
newsdev2016-enro.eng-ron 21.7 0.526 1999 51566 0.970
newsdiscussdev2015-enfr.eng-fra 27.8 0.556 1500 27986 1.000
newsdiscusstest2015-enfr.eng-fra 32.7 0.590 1500 28027 0.997
newssyscomb2009.eng-fra 26.1 0.556 502 12334 0.996
newssyscomb2009.eng-ita 27.8 0.580 502 11551 1.000
newssyscomb2009.eng-spa 29.0 0.566 502 12506 0.981
news-test2008.eng-fra 24.1 0.528 2051 52685 0.993
news-test2008.eng-spa 26.4 0.541 2051 52596 0.995
newstest2009.eng-fra 25.1 0.545 2525 69278 0.976
newstest2009.eng-ita 27.1 0.571 2525 63474 1.000
newstest2009.eng-spa 27.9 0.561 2525 68114 0.998
newstest2010.eng-fra 27.3 0.560 2489 66043 0.986
newstest2010.eng-spa 32.5 0.590 2489 65522 0.993
newstest2011.eng-fra 29.3 0.576 3003 80626 0.966
newstest2011.eng-spa 33.7 0.592 3003 79476 0.978
newstest2012.eng-fra 27.4 0.560 3003 78011 0.981
newstest2012.eng-spa 33.7 0.591 3003 79006 0.960
newstest2013.eng-fra 28.0 0.551 3000 70037 0.968
newstest2013.eng-spa 30.2 0.566 3000 70528 0.948
newstest2016-enro.eng-ron 20.8 0.511 1999 49094 0.984
Tatoeba-test.eng-arg 15.7 0.352 105 405 1.000
Tatoeba-test.eng-ast 25.8 0.490 99 720 0.990
Tatoeba-test.eng-cat 44.7 0.647 1631 12342 0.983
Tatoeba-test.eng-cbk 4.7 0.268 1498 10591 0.911
Tatoeba-test.eng-cos 45.1 0.697 5 45 0.931
Tatoeba-test.eng-egl 0.4 0.070 84 438 0.858
Tatoeba-test.eng-ext 5.0 0.333 69 353 1.000
Tatoeba-test.eng-fra 39.9 0.605 10000 80759 0.971
Tatoeba-test.eng-frm 0.9 0.210 18 211 1.000
Tatoeba-test.eng-gcf 0.7 0.107 99 560 0.986
Tatoeba-test.eng-glg 42.3 0.630 1008 7828 0.981
Tatoeba-test.eng-hat 34.4 0.561 64 416 0.968
Tatoeba-test.eng-ind 33.9 0.598 4289 28294 0.956
Tatoeba-test.eng-ita 42.3 0.650 10000 65498 0.951
Tatoeba-test.eng-lad 10.0 0.311 629 3354 1.000
Tatoeba-test.eng-lad_Latn 10.7 0.340 582 3097 1.000
Tatoeba-test.eng-lij 4.9 0.292 94 711 0.973
Tatoeba-test.eng-lld 0.5 0.204 21 228 0.927
Tatoeba-test.eng-lmo 13.3 0.363 17 124 1.000
Tatoeba-test.eng-max_Latn 3.1 0.124 127 917 0.906
Tatoeba-test.eng-mfe 83.6 0.909 7 36 1.000
Tatoeba-test.eng-min 5.5 0.253 19 147 0.930
Tatoeba-test.eng-msa 28.9 0.528 5000 33629 0.974
Tatoeba-test.eng-multi 39.0 0.607 10000 73122 0.967
Tatoeba-test.eng-mwl 26.9 0.730 4 21 1.000
Tatoeba-test.eng-oci 10.2 0.335 841 5219 0.914
Tatoeba-test.eng-osp 14.6 0.479 3 20 1.000
Tatoeba-test.eng-pap 46.2 0.645 70 376 1.000
Tatoeba-test.eng-pms 12.8 0.347 268 2244 0.942
Tatoeba-test.eng-por 41.6 0.640 10000 75353 0.972
Tatoeba-test.eng-roh 18.1 0.454 16 198 1.000
Tatoeba-test.eng-ron 33.8 0.584 5000 36833 0.971
Tatoeba-test.eng-scn 37.2 0.482 4 42 1.000
Tatoeba-test.eng-spa 45.9 0.661 10000 77291 0.974
Tatoeba-test.eng-tmw_Latn 5.8 0.130 5 23 1.000
Tatoeba-test.eng-vec 17.7 0.326 19 127 0.918
Tatoeba-test.eng-wln 13.9 0.300 89 520 0.949
Tatoeba-test.eng-zlm_Latn 3.0 0.329 24 163 0.975
Tatoeba-test.eng-zsm_Latn 3.1 0.129 536 4085 1.000
tico19-test.eng-fra 33.6 0.590 2100 64655 0.983
tico19-test.eng-pob 41.1 0.685 2100 62729 0.943
tico19-test.eng-por 40.8 0.684 2100 62729 0.967
tico19-test.eng-spa 42.5 0.682 2100 66591 0.949

opus1m+bt-2021-04-10.zip

  • dataset: opus1m+bt
  • model: transformer-align
  • source language(s): eng
  • target language(s): arg ast cat cbk cos egl ext fra frm gcf glg hat ita lad lij lld lmo mfe mol mwl oci osp pap pms pob por roh ron scn spa vec wln
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token of the form >>id<< is required (id = a valid target language ID)
  • valid language labels: >>acf<< >>aoa<< >>arg<< >>ast<< >>cat<< >>cbk<< >>cbk_Latn<< >>ccd<< >>cks<< >>cos<< >>cri<< >>crs<< >>dlm<< >>drc<< >>egl<< >>ext<< >>fab<< >>fax<< >>fra<< >>frc<< >>frm<< >>frm_Latn<< >>fro<< >>frp<< >>fur<< >>gcf<< >>gcf_Latn<< >>gcr<< >>glg<< >>hat<< >>idb<< >>ist<< >>ita<< >>itk<< >>kea<< >>kmv<< >>lad<< >>lad_Latn<< >>lij<< >>lld<< >>lld_Latn<< >>lmo<< >>lou<< >>mcm<< >>mfe<< >>mol<< >>mwl<< >>mxi<< >>mzs<< >>nap<< >>nrf<< >>oci<< >>osp<< >>osp_Latn<< >>pap<< >>pcd<< >>pln<< >>pms<< >>pob<< >>por<< >>pov<< >>pre<< >>pro<< >>rcf<< >>rgn<< >>roh<< >>ron<< >>ruo<< >>rup<< >>ruq<< >>scf<< >>scn<< >>sdc<< >>sdn<< >>spa<< >>spq<< >>src<< >>srd<< >>sro<< >>tmg<< >>tvy<< >>vec<< >>vkp<< >>wln<<
  • download: opus1m+bt-2021-04-10.zip
  • test set translations: opus1m+bt-2021-04-10.test.txt
  • test set scores: opus1m+bt-2021-04-10.eval.txt

Benchmarks

testset BLEU chr-F #sent #words BP
newsdev2016-enro.eng-ron 22.4 0.531 1999 51566 0.971
newsdiscussdev2015-enfr.eng-fra 28.4 0.561 1500 27986 1.000
newsdiscusstest2015-enfr.eng-fra 33.3 0.596 1500 28027 0.993
newssyscomb2009.eng-fra 26.6 0.561 502 12334 0.997
newssyscomb2009.eng-ita 28.2 0.580 502 11551 1.000
newssyscomb2009.eng-spa 28.5 0.563 502 12506 0.983
news-test2008.eng-fra 24.0 0.530 2051 52685 0.996
news-test2008.eng-spa 26.6 0.544 2051 52596 0.998
newstest2009.eng-fra 25.7 0.550 2525 69278 0.980
newstest2009.eng-ita 27.6 0.575 2525 63474 1.000
newstest2009.eng-spa 28.2 0.562 2525 68114 0.999
newstest2010.eng-fra 27.6 0.563 2489 66043 0.983
newstest2010.eng-spa 32.8 0.593 2489 65522 0.993
newstest2011.eng-fra 29.9 0.583 3003 80626 0.970
newstest2011.eng-spa 34.2 0.594 3003 79476 0.979
newstest2012.eng-fra 28.0 0.565 3003 78011 0.981
newstest2012.eng-spa 34.1 0.594 3003 79006 0.962
newstest2013.eng-fra 28.3 0.553 3000 70037 0.970
newstest2013.eng-spa 30.8 0.569 3000 70528 0.950
newstest2016-enro.eng-ron 21.4 0.516 1999 49094 0.986
Tatoeba-test.eng-arg 11.0 0.327 105 405 1.000
Tatoeba-test.eng-ast 24.4 0.488 99 720 0.993
Tatoeba-test.eng-cat 46.1 0.659 1631 12342 0.989
Tatoeba-test.eng-cbk 4.7 0.265 1498 10591 0.876
Tatoeba-test.eng-cos 39.1 0.619 5 45 1.000
Tatoeba-test.eng-egl 1.1 0.124 84 438 0.993
Tatoeba-test.eng-ext 5.9 0.315 69 353 1.000
Tatoeba-test.eng-fra 40.9 0.613 10000 80759 0.973
Tatoeba-test.eng-frm 1.0 0.212 18 211 1.000
Tatoeba-test.eng-gcf 0.8 0.121 99 560 0.936
Tatoeba-test.eng-glg 43.5 0.636 1008 7828 0.983
Tatoeba-test.eng-hat 35.0 0.570 64 416 0.963
Tatoeba-test.eng-ita 43.2 0.657 10000 65498 0.954
Tatoeba-test.eng-lad 11.5 0.343 629 3354 1.000
Tatoeba-test.eng-lad_Latn 12.4 0.375 582 3097 1.000
Tatoeba-test.eng-lij 5.1 0.265 94 711 0.941
Tatoeba-test.eng-lld 1.0 0.215 21 228 0.932
Tatoeba-test.eng-lmo 6.9 0.283 17 124 1.000
Tatoeba-test.eng-mfe 83.6 0.909 7 36 1.000
Tatoeba-test.eng-multi 41.6 0.623 10000 74573 0.970
Tatoeba-test.eng-mwl 25.4 0.685 4 21 1.000
Tatoeba-test.eng-oci 9.7 0.330 841 5219 0.913
Tatoeba-test.eng-osp 15.2 0.358 3 20 1.000
Tatoeba-test.eng-pap 45.0 0.655 70 376 1.000
Tatoeba-test.eng-pms 12.4 0.345 268 2244 0.963
Tatoeba-test.eng-por 42.4 0.643 10000 75353 0.971
Tatoeba-test.eng-roh 18.4 0.438 16 198 0.995
Tatoeba-test.eng-ron 34.8 0.589 5000 36833 0.971
Tatoeba-test.eng-scn 35.7 0.470 4 42 1.000
Tatoeba-test.eng-spa 47.0 0.666 10000 77291 0.975
Tatoeba-test.eng-vec 5.2 0.307 19 127 0.960
Tatoeba-test.eng-wln 15.7 0.318 89 520 0.973
tico19-test.eng-fra 34.6 0.597 2100 64655 0.988
tico19-test.eng-pob 42.5 0.691 2100 62729 0.948
tico19-test.eng-por 41.6 0.687 2100 62729 0.962
tico19-test.eng-spa 43.1 0.685 2100 66591 0.952