Which version of CTB to use? #3

Open
yangky11 opened this issue Mar 25, 2020 · 5 comments

Comments

@yangky11

Hi,

The CTB version in the paper is 5.1 but this repo links to CTB 8.0. I'm wondering which version should be used to reproduce the experiments? Thank you.

@hantek
Owner

hantek commented Mar 26, 2020

CTB 8.0 and 5.1 are different versions, but we could not find the original 5.1 release on the web. Following [Liu and Zhang. 2017b], who use 5.1, we found that all of their train/valid/test sections are present in our 8.0 data. So we extracted the corresponding sections from 8.0 to form a 5.1-equivalent dataset and discarded the remaining, unused sections.

You can see which sections were selected, and how they were divided into train/valid/test splits, in ctb.py.
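
For readers who want a picture of what section-based selection looks like, here is a minimal sketch, not the code in ctb.py. It assumes treebank files carry a numeric ID in their name (for example `chtb_0301.nw`), and it uses the section ranges commonly cited for the CTB 5.1 constituency-parsing split. Both the naming pattern and the ranges are assumptions made for illustration, and `split_for_file` is a hypothetical helper; ctb.py remains the authoritative source for what this repo actually does.

```python
import re

# ASSUMPTION: inclusive section ranges commonly cited for the CTB 5.1 split.
# Check ctb.py for the values this repo actually uses.
SPLIT_RANGES = {
    "train": [(1, 270), (440, 1151)],
    "valid": [(301, 325)],
    "test": [(271, 300)],
}

# ASSUMPTION: files are named like "chtb_0301.nw"; the digits are the section ID.
_FILE_ID = re.compile(r"chtb_(\d+)")


def split_for_file(filename, ranges=SPLIT_RANGES):
    """Return the split a file belongs to, or None if the file is discarded."""
    match = _FILE_ID.search(filename)
    if match is None:
        return None
    file_id = int(match.group(1))
    for split, spans in ranges.items():
        if any(lo <= file_id <= hi for lo, hi in spans):
            return split
    return None  # section is not part of the 5.1-style subset
```

Files whose ID falls outside every range return `None`, which corresponds to the "discarded the remaining sections" step described above.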

Ref:
Jiangming Liu and Yue Zhang. 2017b. Shift-reduce constituent parsing with neural lookahead features. Transactions of the Association for Computational Linguistics 5:45–58.

@yangky11
Author

Thanks for your reply. So 5.1 is a subset of 8.0, which can be extracted by ctb.py?

@hantek
Owner

hantek commented Mar 27, 2020 via email

@yangky11
Author

Thank you for the clarification!

@yangky11
Author

Hi Zhouhan,

I followed the instructions to extract the CTB data and got 17,544 training examples, 352 validation examples, and 348 test examples. Is that correct? The validation and test sets seem too small to me, so I'm wondering whether something went wrong in my preprocessing.

Thanks!

@yangky11 yangky11 reopened this May 21, 2020