
Releases: hipe-eval/HIPE-2022-data

HIPE-2022 data v2.1-test-all-unmasked

20 May 09:11
43185e4
  • 📄 addition of the topres19th test file.

This release contains full test data as used during the HIPE-2022 campaign between April 26 and May 20, 2022.

HIPE-2022 data v2.1-test

13 May 15:02
  • 📄 added test files for all datasets (except topRes19th, which will come soon)
  • 📊 updated the notebook with statistics about the test files
  • 🐛 manually corrected QIDs (and a few NE tags) in the sonar test file

HIPE-2022 data v2.1-test_allmasked+sonar_hotfix

26 Apr 08:59
186020f
  • 📄 Added the masked test files for each dataset, for use in evaluation.
  • 🐛 Applied a hotfix to the sonar development set file (linking information was in the wrong column; empty tokens were removed).

All other datasets remain unchanged.

HIPE-2022 data v2.1

15 Apr 10:03
91b1992

Release notes

  • 🐛 ajmc: thorough data cleaning (added missing OCR transcriptions, added some missing Wikidata IDs, fixed some erroneous entity types, added some missing mentions)
  • 🐛 hipe2020: correction of exactly one label in the HIPE-2022-v2.1-hipe2020-train-fr.tsv file.

All other datasets remain unchanged.

HIPE-2022 data v2.0

22 Mar 10:45
165773f

Release notes

This release contains:

  • 📃 ajmc: full train and dev sets for fr, en, de.
  • 📃 ajmc: mappings [OCR-gold transcript] for ajmc entities (see README-ajmc)
  • 🐛 newseye: correction of the document ID in the metadata line "# hipe2022:document_id =", and removal of unannotated documents from the DE train set (see README-newseye)
  • 🐛 sonar: thorough revision of NER and NEL annotations + removal of unrevised materials from dev set (see README-sonar.md)
  • updated corpus statistics in the dedicated notebook
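As an illustration of how the document_id metadata line is used, here is a minimal Python sketch that counts token lines per document in a HIPE-2022 TSV file. It rests on assumptions rather than on a documented specification: each document is assumed to start with a "# hipe2022:document_id = ..." comment line, all other metadata lines are assumed to start with "#", the column header line is assumed to start with "TOKEN", and blank lines are assumed to only separate segments. The function name tokens_per_document is hypothetical.

```python
def tokens_per_document(lines):
    """Count token lines per document in a HIPE-2022-style TSV file.

    Hypothetical helper; assumes documents are delimited by
    '# hipe2022:document_id = <id>' comment lines.
    """
    counts = {}  # insertion-ordered: document id -> number of token lines
    current = None  # id of the document currently being read
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # blank lines only separate segments (assumption)
        if line.startswith("#"):
            # Metadata line of the form '# hipe2022:key = value' (assumption)
            key, sep, value = line[1:].partition("=")
            if sep and key.strip() == "hipe2022:document_id":
                current = value.strip()
                counts.setdefault(current, 0)
            continue
        if line.startswith("TOKEN\t"):
            continue  # column header line, not a token
        if current is not None:
            counts[current] += 1
    return counts
```

Such a count could serve as a quick sanity check after filtering, for example to confirm that no document ID is duplicated or left without tokens.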

HIPE-2022 data v1.0

15 Feb 11:25
88df3ee

Release notes

This release contains:

  • train and dev sets for: hipe2020 (fr, de), newseye (de, fi, fr, sv), letemps (fr), topres19th;
  • dev sets for: hipe2020 (en) and sonar (de) (no train sets will be released for these);
  • sample for ajmc (de, en).

Please refer to the generic and dataset-specific READMEs.