sentence_starts is empty list and the first element of sentence_starts is not 0. #2

Open
changzhisun opened this issue Oct 14, 2020 · 1 comment

Comments


changzhisun commented Oct 14, 2020

I found that “sentence_starts” in qed-train.jsonlines is empty for some documents. Is this an annotation error?
Also, in some documents the first element of sentence_starts is not 0. Does that mean the text before the first sentence start is not required?

changzhisun changed the title from “sentence_starts is empty list” to “sentence_starts is empty list and the first element of sentence_starts is not 0.” on Oct 14, 2020
calberti (Collaborator) commented

Thanks, we should fix this. The sentence boundaries were generated by an older model and are noisy, but for consistency we should always have 0 as a sentence start.
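
Until the data is fixed, the two cases reported above can be detected and patched locally. The following is only a minimal sketch, not the maintainers' fix: it assumes each line of the file is a JSON object with a top-level `sentence_starts` list of character offsets, and the helper names `audit_sentence_starts` and `normalize_sentence_starts` are hypothetical.

```python
import json
from typing import Iterable, List, Tuple


def audit_sentence_starts(lines: Iterable[str]) -> Tuple[List[int], List[int]]:
    """Return (line indices with an empty list, line indices whose first start is not 0).

    Assumes every non-blank line is a JSON object with a top-level
    "sentence_starts" field (an assumption about the file layout).
    """
    empty, nonzero_first = [], []
    for i, line in enumerate(lines):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        starts = record.get("sentence_starts", [])
        if not starts:
            empty.append(i)
        elif starts[0] != 0:
            nonzero_first.append(i)
    return empty, nonzero_first


def normalize_sentence_starts(starts: List[int]) -> List[int]:
    """Return the offsets sorted and de-duplicated, with 0 always present."""
    return sorted(set(starts) | {0})
```

To audit the training file, pass an open file handle, for example `audit_sentence_starts(open("qed-train.jsonlines", encoding="utf-8"))`. Adding 0 treats any leading text as part of the first sentence, which matches the suggestion that 0 should always be a sentence start; it does not repair the other noisy boundaries.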
