How to fix the output? [Found too many repeated mentions (> 10) in the response] #286

Mak-Ta-Reque · 2020-10-26T08:58:05Z

🌋 Computing score

Error during the scoring

Command '['perl', '/Users/mak/PycharmProjects/tradr_language_tool/neuralcoref/neuralcoref/train/scorer_wrapper.pl', 'muc', '/Users/mak/PycharmProjects/tradr_language_tool/neuralcoref/neuralcoref/train/data/key.txt', '/Users/mak/PycharmProjects/tradr_language_tool/neuralcoref/neuralcoref/train/test_mentions.txt']' returned non-zero exit status 1.

Found too many repeated mentions (> 10) in the response, so refusing to score. Please fix the output.

version: 8.01 /Users/mak/PycharmProjects/tradr_language_tool/neuralcoref/neuralcoref/train/scorer/lib/CorScorer.pm
Repeated mention in the response: 116, 121 1818
Repeated mention in the response: 1065, 1066 136136
Repeated mention in the response: 825, 825 152152
Repeated mention in the response: 92, 94 3333
Repeated mention in the response: 169, 169 4747
Repeated mention in the response: 26, 26 4242
Repeated mention in the response: 26, 26 4242
Repeated mention in the response: 26, 26 4242
Repeated mention in the response: 66, 68 1717
Repeated mention in the response: 254, 254 8888
Repeated mention in the response: 268, 268 9090

Traceback (most recent call last):
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/mak/PycharmProjects/tradr_language_tool/neuralcoref/neuralcoref/train/learn.py", line 565, in <module>
    run_model(args)
  File "/Users/mak/PycharmProjects/tradr_language_tool/neuralcoref/neuralcoref/train/learn.py", line 175, in run_model
    eval_evaluator.test_model()
  File "/Users/mak/PycharmProjects/tradr_language_tool/neuralcoref/neuralcoref/train/evaluator.py", line 180, in test_model
    self.get_score(file_path=ALL_MENTIONS_PATH)
  File "/Users/mak/PycharmProjects/tradr_language_tool/neuralcoref/neuralcoref/train/evaluator.py", line 292, in get_score
    encoding="utf-8",
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/subprocess.py", line 395, in check_output
    **kwargs).stdout
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/subprocess.py", line 487, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['perl', '/Users/mak/PycharmProjects/tradr_language_tool/neuralcoref/neuralcoref/train/scorer_wrapper.pl', 'muc', '/Users/mak/PycharmProjects/tradr_language_tool/neuralcoref/neuralcoref/train/data/key.txt', '/Users/mak/PycharmProjects/tradr_language_tool/neuralcoref/neuralcoref/train/test_mentions.txt']' returned non-zero exit status 1.


LuxuriantHuang · 2021-07-09T02:17:30Z

I have a similar issue. Have you found a way to solve the problem?

csgomezg0 · 2021-08-02T19:27:07Z

Did you find a solution?

Mak-Ta-Reque · 2021-08-03T05:55:02Z

No


csgomezg0 · 2021-08-04T13:56:15Z

Maybe this can help:
In the train folder, open scorer/lib/CorScorer.pm and at line 384 change the number 10 to a bigger number, maybe 1000000 or some other value.
This solution may not be correct, but it works for training. If someone knows another solution, please correct me. Thanks.

Pantalaymon · 2021-09-07T16:29:40Z

Which language are you trying to train your model on?

I had this issue while trying to make a model for French, and I realised that it came from bad tokenization: the tokenization produced by spaCy didn't match the existing tokenization of the dev corpus.
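One way to see where two tokenizations diverge is to align them by character offsets. The sketch below assumes you already have, for the same sentence, the CoNLL tokens and the tokens produced by your pipeline (e.g. `[t.text for t in doc]` for a spaCy `Doc`) as lists of strings. `split_gold_tokens` is a hypothetical helper, not part of neuralcoref or spaCy, and the example tokens are made up.

```python
from typing import List, Tuple


def char_spans(tokens: List[str]) -> List[Tuple[int, int]]:
    """Offsets of each token in the text with all whitespace removed."""
    spans = []
    pos = 0
    for tok in tokens:
        length = len("".join(tok.split()))  # ignore whitespace inside a token
        spans.append((pos, pos + length))
        pos += length
    return spans


def split_gold_tokens(gold: List[str], pred: List[str]) -> List[Tuple[int, List[str]]]:
    """Return (gold_index, pieces) for each gold token that `pred` splits into several tokens.

    Assumes both token lists cover the same characters once whitespace is removed.
    """
    if "".join("".join(gold).split()) != "".join("".join(pred).split()):
        raise ValueError("the two tokenizations do not cover the same text")
    pred_spans = char_spans(pred)
    result = []
    for i, (start, end) in enumerate(char_spans(gold)):
        # Non-empty predicted tokens lying entirely inside this gold token
        # (whitespace-only tokens have zero length and are skipped).
        pieces = [pred[j] for j, (s, e) in enumerate(pred_spans)
                  if start <= s and e <= end and s < e]
        if len(pieces) > 1:
            result.append((i, pieces))
    return result


# Hypothetical example: the CoNLL file keeps "York-based" whole, the other tokenizer splits it.
print(split_gold_tokens(["New", "York-based", "firm"], ["New", "York", "-", "based", "firm"]))
# -> [(1, ['York', '-', 'based'])]
```

Every gold token reported here is a place where several predicted mentions can collapse onto the same CoNLL span. The quadratic scan keeps the sketch short; a two-pointer walk would be faster on long documents.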

As a result, many single tokens were split into multiple tokens, and the model then ran several predictions on pieces of the same token. Those pieces ended up grouped into several identical mention spans (hence the "repeated mention" error).
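To check whether a response file has this problem before running the Perl scorer, you can list the spans that are annotated more than once. This is a minimal sketch, assuming the CoNLL-2012 layout (coreference in the last column as `(12`, `12)`, `(12)` or `-`, entries joined by `|`, documents delimited by `#begin document` / `#end document`). `mentions_by_document` and `repeated_mentions` are hypothetical helpers, and the token indices they report are 0-based per document, which may not match the numbering CorScorer prints.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Span = Tuple[int, int]
PART = re.compile(r"(\()?(\d+)(\))?")  # "(12", "12)" or "(12)"


def mentions_by_document(lines: Iterable[str]) -> Dict[str, List[Tuple[Span, str]]]:
    """Read the last (coreference) column of CoNLL-2012 lines.

    Returns {document: [((start, end), cluster_id), ...]} where start/end are
    0-based token indices counted from the start of each document.
    """
    docs: Dict[str, List[Tuple[Span, str]]] = {}
    name = None
    index = 0
    open_spans: Dict[str, List[int]] = defaultdict(list)
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("#begin document"):
            name = line[len("#begin document"):].strip()
            index = 0
            docs[name] = []
            open_spans = defaultdict(list)
            continue
        if name is None or line.startswith("#") or not line.strip():
            continue  # outside a document, a comment, or a sentence break
        coref = line.split()[-1]
        for part in ([] if coref == "-" else coref.split("|")):
            m = PART.fullmatch(part)
            if m is None:
                continue  # this sketch skips malformed entries
            opens, cluster, closes = m.groups()
            if opens:
                open_spans[cluster].append(index)
            if closes and open_spans[cluster]:
                docs[name].append(((open_spans[cluster].pop(), index), cluster))
        index += 1
    return docs


def repeated_mentions(docs: Dict[str, List[Tuple[Span, str]]]) -> List[Tuple[str, Span, str, str]]:
    """List (document, span, first_cluster, repeated_cluster) for each span seen twice."""
    repeats = []
    for name, mentions in docs.items():
        seen: Dict[Span, str] = {}
        for span, cluster in mentions:
            if span in seen:
                repeats.append((name, span, seen[span], cluster))
            else:
                seen[span] = cluster
    return repeats
```

For example, pass an open handle on the response file (`test_mentions.txt` in the log above) to `mentions_by_document` and look at what `repeated_mentions` returns; the scorer refuses to score once there are more than 10, although its exact counting may differ from this sketch.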

csgomezg0 · 2021-09-07T16:56:23Z

Hi @Pantalaymon, I tried to train a Spanish model with neuralcoref, but it isn't working for me. Maybe I have a lot of errors, I don't know, so now I am trying another library, coreferee.

Pantalaymon · 2021-09-08T13:19:01Z


Oh, I didn't know that library; I see that it is pretty new. Is it easier than neuralcoref to train on a new language? I might try it as well to compare.

sanaullahaq · 2022-01-31T06:48:06Z

@Mak-Ta-Reque I'm facing the same problem.
Did you find any solution?
Where did you download the dataset from?
I got mine from this repo: https://github.com/clab/att-coref/tree/master/data/conll-2012
I don't know whether there is a problem with my downloaded dataset.

Pantalaymon · 2022-01-31T15:58:14Z


Hi sanaullahaq. As I mentioned, it's not a problem with the dataset. The problem is that spaCy's tokenization does not match the tokenization in the CoNLL file. As a consequence, some mentions whose boundaries fall on different tokens for spaCy end up spanning the same tokens in the CoNLL output.
To fix this, you'll need to either:

- change the spaCy pipeline so that you pass in pretokenized data directly (following the CoNLL tokens), or
- simply remove the duplicates from the produced CoNLL files.
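The second option above can be sketched as a post-processing pass over the response file. This is a minimal sketch, assuming the CoNLL-2012 layout (coreference in the last column, `#begin document` / `#end document` markers) and lines without trailing newlines; `dedupe_conll` is a hypothetical helper, not part of neuralcoref. It treats two mentions as duplicates only when their start and end tokens coincide, keeps the first one completed, and drops both the opening and closing entries of the others.

```python
import re
from typing import List, Set, Tuple

TOKEN = re.compile(r"^(.*\s)(\S+)\s*$")  # (all columns before the last, coref column)
PART = re.compile(r"(\()?(\d+)(\))?")     # "(12", "12)" or "(12)"


def dedupe_conll(lines: List[str]) -> List[str]:
    """Return a copy of CoNLL-2012 `lines` (no trailing newlines) in which any
    mention whose (start, end) token span already occurred earlier in the same
    document is removed. The first mention seen for each span is kept."""
    out = list(lines)
    rows: List[Tuple[int, str, List[str]]] = []  # (line number, prefix, coref parts)

    def flush() -> None:
        seen: Set[Tuple[int, int]] = set()
        drop: Set[Tuple[int, int]] = set()  # (row, part) entries to delete
        stacks: dict = {}
        for r, (_, _, parts) in enumerate(rows):
            for p, part in enumerate(parts):
                m = PART.fullmatch(part)
                if m is None:
                    continue
                opens, cluster, closes = m.groups()
                if opens:
                    stacks.setdefault(cluster, []).append((r, p))
                if closes and stacks.get(cluster):
                    start = stacks[cluster].pop()
                    span = (start[0], r)
                    if span in seen:
                        drop.update({start, (r, p)})  # drop both ends of the duplicate
                    else:
                        seen.add(span)
        for r, (line_no, prefix, parts) in enumerate(rows):
            kept = [part for p, part in enumerate(parts) if (r, p) not in drop]
            out[line_no] = prefix + ("|".join(kept) if kept else "-")
        rows.clear()

    for i, line in enumerate(lines):
        if line.startswith("#"):
            if line.startswith(("#begin document", "#end document")):
                flush()  # mentions never cross document boundaries
            continue
        m = TOKEN.match(line)
        if m is None:
            continue  # blank line between sentences
        coref = m.group(2)
        rows.append((i, m.group(1), [] if coref == "-" else coref.split("|")))
    flush()
    return out
```

Usage would be along the lines of `"\n".join(dedupe_conll(text.splitlines()))` on the file's contents. Note that this only hides the symptom: which duplicate survives is arbitrary, and the underlying tokenization mismatch still affects the scores.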

But honestly, neuralcoref is not really meant to be extended to other datasets. Depending on your use case, as suggested above, I would look at coreferee, with which I successfully trained a French model.

sanaullahaq · 2022-02-02T19:37:41Z

Alas! By the way, I appreciate your response.
Could you give me any clue where I can find pre-tokenized data?

svlandeg added the usage label Nov 18, 2020

sanaullahaq mentioned this issue on Jan 31, 2022 in "regarding datset .conll format" #331 (open)

