This bundle contains 723 passages from the reviews section of the English Web Treebank (LDC2012T13), annotated according to the foundational layer of UCCA, v2.0. The passages are provided as XML files in the UCCA format. The corpus contains 55590 tokens in 3813 sentences, tokenized and sentence-split according to the Universal Dependencies English Web Treebank.
The annotation was conducted at the Hebrew University of Jerusalem. If you use this corpus, please cite:
@inproceedings{hershcovich2019content,
title = "Content Differences in Syntactic and Semantic Representation",
author = "Hershcovich, Daniel and
Abend, Omri and
Rappoport, Ari",
booktitle = "Proc. of NAACL-HLT",
url = "https://www.aclweb.org/anthology/N19-1047",
pages = "478--488"
}
- The passage files in XML format. File names are of the form XXX.xml, where XXX is the passage ID. Please see the UCCA resource webpage for a software package for reading and using these files.
- scripts/get_ud.sh: script to download all UD-annotated sentences corresponding to the UCCA passages in this corpus, and to split the UCCA passages according to the UD sentences. The split UD files are saved in ud, and the split UCCA files in sentences_by_ud.
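
As a quick sanity check on the file naming scheme described above, the following is a minimal sketch that uses only the Python standard library rather than the UCCA software package. The function names `list_passages` and `root_tag` and the directory name `xml` are assumptions made for illustration; they are not part of this bundle. The sketch maps each passage ID to its file and confirms that a file parses as XML, without assuming anything about the UCCA schema itself.

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def list_passages(xml_dir):
    """Map each passage ID (the XXX in XXX.xml) to its file path."""
    return {path.stem: path for path in sorted(Path(xml_dir).glob("*.xml"))}


def root_tag(path):
    """Parse one passage file and return the tag of its root element."""
    return ET.parse(path).getroot().tag


if __name__ == "__main__":
    # "xml" is a hypothetical directory name; point this at the passage files.
    passages = list_passages("xml")
    print(len(passages), "passage files found")
    for passage_id, path in list(passages.items())[:3]:
        print(passage_id, root_tag(path))
```

For anything beyond listing and well-formedness checks (for example, traversing the annotation graph), the software package linked from the UCCA resource webpage is the appropriate tool.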