Add LDC datasets #2

Open
mjpost opened this issue Oct 27, 2017 · 2 comments

Comments

@mjpost
Owner

mjpost commented Oct 27, 2017

If $LDC were defined, LDC datasets installed locally could be extracted in a similar fashion without violating any licenses. However, this would require writing an XML parser to handle multiple references, since Python 3 doesn't include one.
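
A minimal sketch of the `$LDC` lookup described above. The function name `locate_ldc_corpus` and the assumption that each corpus lives in a directory named after its catalog ID directly under `$LDC` are hypothetical; the issue does not specify a layout.

```python
import os
from pathlib import Path
from typing import Optional


def locate_ldc_corpus(catalog_id: str, env_var: str = "LDC") -> Optional[Path]:
    """Return the local directory for a corpus, or None if it is unavailable.

    Assumes (hypothetically) that corpora sit at $LDC/<catalog_id>.
    """
    root = os.environ.get(env_var)
    if not root:
        # $LDC is not defined, so LDC datasets are simply skipped.
        return None
    candidate = Path(root) / catalog_id
    return candidate if candidate.is_dir() else None
```

Returning `None` rather than raising lets a caller treat LDC data as optional, so nothing is downloaded or redistributed and only locally licensed copies are read.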

@ozancaglayan
Collaborator

There's this standard library module: https://docs.python.org/3/library/xml.etree.elementtree.html
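
As a rough illustration of how that module could handle multiple references, here is a sketch that groups segments by document and segment ID. The XML layout (`refset`, `doc`, `seg`, and the `docid`/`sysid`/`id` attributes) and the sample text are assumptions made for the example, not a description of any particular LDC release.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample: two reference translations of the same document.
SAMPLE = """<refset setid="demo">
  <doc docid="d1" sysid="ref0">
    <seg id="1">The cat sat.</seg>
    <seg id="2">It slept.</seg>
  </doc>
  <doc docid="d1" sysid="ref1">
    <seg id="1">A cat was sitting.</seg>
    <seg id="2">It was asleep.</seg>
  </doc>
</refset>"""


def read_references(xml_text):
    """Map (docid, segid) to the list of reference strings for that segment."""
    root = ET.fromstring(xml_text)
    refs = defaultdict(list)
    for doc in root.iter("doc"):
        for seg in doc.iter("seg"):
            key = (doc.get("docid"), seg.get("id"))
            # itertext() also collects text nested inside child elements.
            refs[key].append("".join(seg.itertext()).strip())
    return dict(refs)


references = read_references(SAMPLE)
```

Each key then holds one string per reference, in the order the `doc` elements appear in the file.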

@mjpost
Owner Author

mjpost commented Jun 8, 2020

Yeah, I don't know why I thought we'd have to write one. For the Anthology we use lxml.

In my experience the problem with many "community-wrapped" XML files (e.g., WMT) is that they do not parse.
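
A small sketch of how such a failure could be detected with the standard library before deciding on a fallback. The helper name `try_parse` is hypothetical, and the unescaped ampersand is only one illustrative way a file can be malformed; the thread does not say what actually goes wrong in those files.

```python
import xml.etree.ElementTree as ET


def try_parse(xml_text):
    """Return (root, None) on success, or (None, message) if not well-formed."""
    try:
        return ET.fromstring(xml_text), None
    except ET.ParseError as err:
        return None, str(err)


# Well-formed: the ampersand is escaped as an entity.
good_root, good_err = try_parse("<seg id='1'>Tom &amp; Jerry</seg>")

# Not well-formed: a bare ampersand in character data.
bad_root, bad_err = try_parse("<seg id='1'>Tom & Jerry</seg>")
```

A strict parser rejects the second input outright, which is why files like this need either repair or a more lenient parser.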

thammegowda referenced this issue in thammegowda/sacrebleu Apr 7, 2021
Macro/Micro F and BLEU with new API