Issue on sentence splitting #68

Closed
marcelo-amancio opened this issue Jun 15, 2015 · 2 comments

Comments

@marcelo-amancio

I ran into a problem when splitting a paragraph into sentences with spaCy:

before:
"The new site amounts to a modest tweak to the existing U.S. approach in Iraq, and illustrates Obama's reluctance to escalate the fight and reintroduce U.S. soldiers into combat that he had vowed to bring to an end."

after:
"The new site amounts to a modest tweak to the existing U.S. approach in Iraq, and illustrates Obama's reluctance to escalate the fight and reintroduce U.S. soldiers into combat that he had"
"vowed to bring to an end."

The paragraph should not have been split.

@honnibal
Member

Version 0.86 released.

Your example is now parsed as a single sentence. Here's the parse:

The det site
new amod site
site nsubj amounts
amounts ROOT amounts
to prep amounts
a det tweak
modest amod tweak
tweak pobj to
to prep tweak
the det approach
existing amod approach
U.S. compound approach
approach pobj to
in prep approach
Iraq pobj in
, punct amounts
and cc amounts
illustrates conj amounts
Obama poss reluctance
's case Obama
reluctance dobj illustrates
to aux escalate
escalate acl reluctance
the det fight
fight dobj escalate
and cc escalate
reintroduce conj escalate
U.S. compound soldiers
soldiers dobj reintroduce
into prep reintroduce
combat pobj into
that mark vowed
he nsubj vowed
had aux vowed
vowed ccomp illustrates
to aux bring
bring xcomp vowed
to prep bring
an det end
end pobj to
. punct amounts

The sentence is parsed correctly up until the final relative clause ("that he had vowed to bring to an end"). That parse error is what caused the earlier sentence boundary failure, because sentence boundaries are inferred from the syntactic structure.
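To make the link between parse errors and bad splits concrete, here is a minimal sketch of reading sentence boundaries off a dependency parse. It is not spaCy's actual implementation. The function name `sentence_spans` and the head indices below are made up for illustration, and the sketch assumes each token stores the index of its head, with a root pointing at itself and no cycles.

```python
from typing import List, Tuple


def sentence_spans(heads: List[int]) -> List[Tuple[int, int]]:
    """Return (start, end) token spans, one per run of tokens sharing a root.

    heads[i] is the index of token i's head; a root is its own head.
    Assumes the heads form a forest (no cycles).
    """
    def root_of(i: int) -> int:
        # Follow head links upward until we reach a self-headed token.
        while heads[i] != i:
            i = heads[i]
        return i

    spans = []
    start = 0
    for i in range(1, len(heads)):
        # A change of root between neighbouring tokens marks a boundary.
        if root_of(i) != root_of(i - 1):
            spans.append((start, i))
            start = i
    if heads:
        spans.append((start, len(heads)))
    return spans


tokens = ["he", "had", "vowed", "to", "bring", "to", "an", "end", "."]

# Hypothetical heads where everything attaches under "vowed": one sentence.
good_heads = [2, 2, 2, 4, 2, 4, 7, 5, 2]

# Hypothetical heads where "had" is wrongly made a second root.
bad_heads = [1, 1, 2, 4, 2, 4, 7, 5, 2]

for heads in (good_heads, bad_heads):
    print([" ".join(tokens[s:e]) for s, e in sentence_spans(heads)])
```

With the second set of heads, the extra root yields a break right after "had", which is the kind of split reported above.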

I've now implemented the technique from this paper: http://www.aclweb.org/anthology/J14-2002 (with a novel twist that I've written up and that is currently under review).

Please try out the new version (being sure to redownload the model), and report prominent failures you come across.

@lock

lock bot commented May 9, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators May 9, 2018