Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

introduce sentence boundaries for additional root tokens #346

Merged
merged 5 commits into from
Apr 25, 2016
Merged

introduce sentence boundaries for additional root tokens #346

merged 5 commits into from
Apr 25, 2016

Conversation

wbwseeker
Copy link

added the one line that @honnibal suggested to fix issue #322

@wbwseeker
Copy link
Author

wbwseeker commented Apr 25, 2016

with this bugfix, the parser should not get into a loop again.

There is one issue with this way of introducing additional root nodes, and this is serious:
if we have a parse like this: heads [0,-1,-2,-3], deps [ROOT, ROOT, nsubj, nsubj]
arc_eager.finalize_state() will make the second token it's own root and therefore sentence
by changing it to [0,1,-2,-3] with [ROOT, ROOT, nsubj, nsubj]

This is a non-projective structure and the sentence boundary at the second token will
disconnect tokens 2 and 3 from their head 0. This will not be a valid state, but there is currently no mechanism to prevent this. The only reason why this is probably not showing currently is because the statistical model will rarely predict such a structure for proper input.
However, this is something we shouldn't rely on because the model could be used to parse unfamiliar input and will then break on it.

@honnibal
Copy link
Member

That structure is familiar --- I thought about that when doing the joint segmentation and parsing for speech. I thought I had constraints which prevented this, but obviously not. I'll have a look at this.

@honnibal
Copy link
Member

Definitely no constraints.

I'll start digging through the history to get a sense of what I was thinking and what options I tried.

@wbwseeker
Copy link
Author

cool. we should probably merge the pull request already anyway, because the loop bug is currently on the master branch

@honnibal honnibal merged commit feb65fc into explosion:master Apr 25, 2016
@wbwseeker wbwseeker deleted the sentbnd_bug branch May 3, 2016 12:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants