Bug in not-contraction handling code #15
Looks like a valid bug.
Thanks sir, you will see the PR in a bit!
That would be very nice, thank you, Koen! Checking…
* fix not-contraction offsets + add test
* do not differ in offset calculation when using replace_not_contract=True or False
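The second commit bullet says the offset calculation should not depend on whether "n't" is replaced. Below is a minimal sketch of that idea, not the library's actual code: `not_contraction_tokens` and its `replace` flag are hypothetical stand-ins for the `replace_not_contract` setting. Only the token value changes with the flag; the offset always points into the original text.

```python
from typing import List, Tuple


def not_contraction_tokens(word: str, start: int, replace: bool) -> List[Tuple[str, int]]:
    """Hypothetical helper: return (value, offset) pairs for a word that may end in "n't".

    `start` is the offset of `word` in the original text. Only the lowercase
    suffix "n't" is handled in this sketch.
    """
    suffix = "n't"
    if not word.endswith(suffix):
        return [(word, start)]
    stem_len = len(word) - len(suffix)
    # The offset is computed once, independently of the replace flag.
    neg_offset = start + stem_len
    # Only the value differs: "not" when replacing, the original "n't" otherwise.
    neg_value = "not" if replace else suffix
    pairs = [(word[:stem_len], start)] if stem_len > 0 else []
    pairs.append((neg_value, neg_offset))
    return pairs


for flag in (True, False):
    print(flag, not_contraction_tokens("don't", 0, replace=flag))
```

Computing the offset before choosing the value makes it structurally impossible for the two settings to disagree about where the token starts.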
Fixed by PR #17. I will review it in full over the weekend, release a new version, and close this bug thereafter. Thanks again, Koen!
Fixed by release 1.3.2; closing.
Hi,
Thanks for the great library! I think I ran into a weird edge case in the not-contraction handling code. If I use the following example:
The output is
[<Token '' : "n't" @ 1>]
Something is going wrong in the offset calculation there: that 1 should be a 0. The real example this came from is a sentence in the AIDA dataset: "Falling share prices in New York do n't hurt Mexico as long as it happens gradually , as earlier this week"
I see the same with "don't":
[<Token '' : 'do' @ 0>, <Token '' : "n't" @ 3>]
That 3 should be a 2, no? Would love to hear your thoughts; I'm not sure how to fix this neatly yet.
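For reference, here is a minimal sketch of the offsets I would expect, not the library's actual code. `Tok` and `split_not_contraction` are hypothetical names; `Tok` only mirrors the (spacing, value, offset) triple shown in the reprs above. The "n't" token should start right after the stem, so its offset is the word's start plus the stem length.

```python
from typing import List, NamedTuple


class Tok(NamedTuple):
    """Hypothetical stand-in for a token: leading spacing, value, character offset."""
    spacing: str
    value: str
    offset: int


def split_not_contraction(word: str, start: int) -> List[Tok]:
    """Split a word ending in "n't" (any case) into a stem token and an "n't" token.

    `start` is the offset of `word` in the original text, so every returned
    offset indexes into that same text.
    """
    suffix = "n't"
    if not word.lower().endswith(suffix):
        return [Tok("", word, start)]
    stem_len = len(word) - len(suffix)
    tokens = []
    if stem_len > 0:
        # The stem begins where the word begins.
        tokens.append(Tok("", word[:stem_len], start))
    # "n't" begins immediately after the stem.
    tokens.append(Tok("", word[stem_len:], start + stem_len))
    return tokens


print(split_not_contraction("n't", 0))    # a bare "n't" should sit at offset 0
print(split_not_contraction("don't", 0))  # "do" at 0, "n't" at 2
```

When the token value is not replaced, checking `text[t.offset:t.offset + len(t.value)] == t.value` for every token is a cheap regression test for this kind of off-by-one.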