How to fix the output? [Found too many repeated mentions (> 10) in the response] #286
Comments
I have a similar issue. Have you found a way to solve the problem? |
Did you find a solution? |
No
|
Maybe this can help: |
Which language are you trying to train your model on? I had this issue while trying to build a model for French, and I realised that it came from bad tokenization. The tokenization produced by spaCy didn't match the pre-existing tokenization of the dev corpus. As a result, many single tokens were treated as multiple tokens, and the model then ran several predictions on those single tokens. Consequently, those tokens ended up grouped in several identical mention spans (hence the "repeated mentions" message). |
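One way to check whether this kind of tokenization mismatch affects your data is to compare the two tokenizations by character offsets. The following is a minimal sketch, not part of neuralcoref: the function names are hypothetical, and the split of "aujourd'hui" is an invented example, not a claim about how spaCy actually tokenizes it.

```python
from typing import List, Tuple


def char_spans(tokens: List[str]) -> List[Tuple[int, int]]:
    """Map each token to (start, end) offsets in the whitespace-free text."""
    spans, pos = [], 0
    for tok in tokens:
        length = len("".join(tok.split()))  # ignore whitespace inside a token
        spans.append((pos, pos + length))
        pos += length
    return spans


def tokenization_mismatches(
    gold_tokens: List[str], other_tokens: List[str]
) -> List[Tuple[int, str]]:
    """Return (index, token) for gold tokens whose boundaries are not
    reproduced by the other tokenization."""
    gold_text = "".join("".join(gold_tokens).split())
    other_text = "".join("".join(other_tokens).split())
    if gold_text != other_text:
        raise ValueError("The two tokenizations do not cover the same text")
    other_spans = set(char_spans(other_tokens))
    return [
        (i, tok)
        for i, (tok, span) in enumerate(zip(gold_tokens, char_spans(gold_tokens)))
        if span not in other_spans
    ]


# Hypothetical example: the corpus keeps "aujourd'hui" as one token,
# while the second tokenizer splits it in two.
gold = ["L'", "homme", "est", "arrivé", "aujourd'hui", "."]
other = ["L'", "homme", "est", "arrivé", "aujourd'", "hui", "."]
print(tokenization_mismatches(gold, other))
```

In practice, `gold` would be the token column of one sentence from the CoNLL file, and `other` could be built with something like `[t.text for t in nlp(sentence_text)]`. Any sentence for which the function returns a non-empty list is a candidate source of the repeated mentions.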
Hi @Pantalaymon, I tried to train a Spanish model with neuralcoref, but it isn't working for me. Maybe I made a lot of mistakes, I don't know. For now I am trying another library, coreferee. |
Oh, I didn't know that library. I see that it is pretty new. Is it easier to train on a new language than neuralcoref? I might try it as well to compare. |
@Mak-Ta-Reque I'm facing the same problem. |
Hi Sanullahaq. As I mentioned, it's not a problem with the dataset. The problem comes from the fact that spaCy's tokenization does not match the tokenization in the CoNLL file. As a consequence, some mention boundaries that cover different tokens for spaCy end up covering the same tokens in the CoNLL output.
But honestly, neuralcoref is not really meant to be extensible to other datasets... Depending on your use case, as suggested above, I would look at coreferee, with which I successfully trained a French model. |
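The collapse described above can be illustrated with a toy example. This is only a sketch of the mechanism under stated assumptions, not neuralcoref's actual export code: `project_spans` and the index mapping `tok_to_gold` are hypothetical names, and the mapping assumes that the tokenizer split gold token 4 into two pieces.

```python
from collections import Counter
from typing import Dict, List, Tuple

Span = Tuple[int, int]


def project_spans(spans: List[Span], tok_to_gold: List[int]) -> List[Span]:
    """Map (start, end) spans over tokenizer indices onto gold token indices."""
    return [(tok_to_gold[start], tok_to_gold[end]) for start, end in spans]


# Hypothetical alignment: tokenizer tokens 4 and 5 both belong to gold token 4.
tok_to_gold = [0, 1, 2, 3, 4, 4, 5]

# Three distinct mention spans in the tokenizer's index space.
mentions = [(4, 4), (5, 5), (4, 5)]

projected = project_spans(mentions, tok_to_gold)

# Spans that occur more than once after projection are the "repeated mentions".
repeated: Dict[Span, int] = {
    span: count for span, count in Counter(projected).items() if count > 1
}
print(projected)
print(repeated)
```

All three spans land on the same gold token, so the output file would contain the same mention three times. With enough such tokens in the dev corpus, the count would pass the threshold of 10 quoted in the error message.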
alas!!! btw I appreciate your response. |
🌋 Computing score
Error during the scoring