structured-prediction-constituency-parser adds extra spaces #5673
Comments
Thanks for the bug report @LawlAoux! I have to admit I don't have a lot of context here, and this model was implemented before my time. Does this bug break something in your own code? I'm just trying to get a sense of how big of an issue this is.
@epwalsh Thanks for your reply! It does break something in our code. For example, with something like ob-gyn, which is a common specialty, you get ob - gyn, and the parse tree is just wrong. In addition, in the example I provided in the bug report you can see that it adds extra spaces whenever there is a special character, which is not desired behaviour for something like a chat bot. (You can argue that we can just remove these spaces, but sometimes we want to retain the original spaces, which would be stripped if we did that, as in TTS for example.)
Hmm I see, thanks. I just took a deeper look at this. Here's what I found:
The root of the issue is that the predictor uses this Spacy tokenizer which discards spaces, unlike more "modern" tokenizers such as GPT2's BPE tokenizer. So then this line here naively joins all of the tokens with spaces to reconstruct each span. Now, the Spacy
I think this change is doable. We'd have to modify code in several places:
I might be missing some details, but I think that's the gist of it. I'm putting the "Contributions Welcome" label on this because I probably won't have time to tackle this anytime soon, but I'm happy to review a PR and help where I can.
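To make the diagnosis above concrete, here is a minimal sketch of the difference between joining tokens with a single space and re-attaching each token's original trailing whitespace. The `Token` class and both helper functions are hypothetical stand-ins written for illustration, not AllenNLP's or spaCy's actual code; the `whitespace` field is only modelled on the idea behind spaCy's `Token.whitespace_` attribute.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    """Hypothetical stand-in for a tokenizer token (not AllenNLP's class).

    `whitespace` holds the whitespace that followed the token in the
    original text, similar in spirit to spaCy's `Token.whitespace_`.
    """
    text: str
    whitespace: str = ""


def naive_span_text(tokens: List[Token]) -> str:
    # Joins tokens with a single space, ignoring the original spacing.
    return " ".join(t.text for t in tokens)


def faithful_span_text(tokens: List[Token]) -> str:
    # Re-attaches each token's trailing whitespace, then strips any
    # whitespace left at the end of the span.
    return "".join(t.text + t.whitespace for t in tokens).rstrip()


# Tokens for "Hi there, I'm LawlAoux." as a tokenizer might split them
# (the exact split is an assumption for this example).
tokens = [
    Token("Hi", " "), Token("there"), Token(",", " "),
    Token("I"), Token("'m", " "), Token("LawlAoux"), Token("."),
]

print(naive_span_text(tokens))     # Hi there , I 'm LawlAoux .
print(faithful_span_text(tokens))  # Hi there, I'm LawlAoux.
```

The same idea would keep ob-gyn intact, provided the tokenizer records that no whitespace followed "ob" or "-". Whether the predictor can carry that information through to the tree output is the part that would need the code changes discussed above.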
Hi @borosilicate, yes please go ahead with a PR when you have a chance.
Description
The constituency parser adds whitespace before and after special characters such as ".?-," etc.
For example, for the sentence "Hi there, I'm LawlAoux." the output for the root is "Hi there , I 'm LawlAoux ." (full tree in details).
Related issues or possible duplicates
Environment
OS: OS X
Python version: 3.9.7
Output of pip freeze:
Steps to reproduce
Example source: