Releases: hipe-eval/HIPE-2022-data
HIPE-2022 data v2.1-test-all-unmasked
- 📄 addition of topres19th test file.
This release contains full test data as used during the HIPE-2022 campaign between April 26 and May 20, 2022.
HIPE-2022 data v2.1-test
- 📄 added test files for all datasets (except topRes19th, which will come soon)
- 📊 updated the notebook with statistics about the test files
- 🐛 manually corrected QIDs (and a few NE tags) in the sonar test file
HIPE-2022 data v2.1-test_allmasked+sonar_hotfix
- 📄 Adding the masked test files for each dataset for evaluation.
- 🐛 Applying a hotfix to the sonar development set file (linking information was in the wrong column; empty tokens removed)
All other datasets remain unchanged.
HIPE-2022 data v2.1
Release notes
- 🐛 ajmc: thorough data cleaning (added missing OCR transcriptions, added some missing Wikidata IDs, fixed some erroneous entity types, added some missing mentions)
- 🐛 hipe2020: correction of exactly one label in the HIPE-2022-v2.1-hipe2020-train-fr.tsv file.
All other datasets remain unchanged.
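The file name mentioned above suggests a naming scheme of the form HIPE-2022-<version>-<dataset>-<split>-<lang>.tsv. This is an assumption inferred from that single example, not a documented convention, and the helper name parse_hipe_filename is hypothetical. Under that assumption, a minimal sketch for checking which release a local file belongs to could look like this:

```python
import re

# Assumed pattern, inferred from one file name in these release notes:
# HIPE-2022-<version>-<dataset>-<split>-<lang>.tsv
FILENAME_RE = re.compile(
    r"^HIPE-2022-(?P<version>v[\d.]+)"
    r"-(?P<dataset>[A-Za-z0-9]+)"
    r"-(?P<split>[A-Za-z0-9_]+)"
    r"-(?P<lang>[a-z]{2})\.tsv$"
)


def parse_hipe_filename(name):
    """Split a data file name into version, dataset, split and language.

    Hypothetical helper: raises ValueError when the name does not follow
    the assumed scheme.
    """
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    return match.groupdict()


parts = parse_hipe_filename("HIPE-2022-v2.1-hipe2020-train-fr.tsv")
print(parts["version"], parts["dataset"], parts["split"], parts["lang"])
```

Files whose names deviate from this scheme raise an error rather than being silently misread, which seems the safer default when mixing releases.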
HIPE-2022 data v2.0
Release notes
This release contains:
- 📃 ajmc: full train and dev sets for fr, en, de.
- 📃 ajmc: mappings [OCR-gold transcript] for ajmc entities (see README-ajmc)
- 🐛 newseye: correction of document_id number in metadata line # hipe2022:document_id = + removal of unannotated documents from DE train set (see README-newseye)
- 🐛 sonar: thorough revision of NER and NEL annotations + removal of unrevised materials from dev set (see README-sonar.md)
- updated corpus statistics in the dedicated notebook
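Since the newseye fix concerns the document_id metadata line, a quick way to inspect those values after updating is sketched below. It assumes that such lines start with "# hipe2022:document_id" and carry the value after an "=" sign, as the release note suggests. The function name document_ids and the sample identifiers are hypothetical.

```python
def document_ids(lines):
    """Return the document IDs found in metadata comment lines.

    Assumes lines of the form '# hipe2022:document_id = <value>'.
    """
    prefix = "# hipe2022:document_id"
    ids = []
    for line in lines:
        if line.startswith(prefix):
            # Keep everything after the first '=' as the ID.
            _, _, value = line.partition("=")
            ids.append(value.strip())
    return ids


# Made-up sample content, for illustration only.
sample = [
    "# hipe2022:document_id = doc-001\n",
    "TOKEN\tO\n",
    "# hipe2022:document_id = doc-002\n",
]
print(document_ids(sample))
```

To run it on a real file, pass an open file handle instead of the sample list, for example `document_ids(open(path, encoding="utf-8"))`.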
HIPE-2022 data v1.0
Release notes
This release contains:
- train and dev sets for: hipe2020 (fr, de), newseye (de, fi, fr, sv), letemps (fr), topres19th;
- dev sets for: hipe2020 (en) and sonar (de) (there won’t be train sets);
- sample for ajmc (de, en).
Please refer to the generic and dataset-specific READMEs.