Unknown label during NER 'predict' #2147

Mrs-Hudson · 2018-12-06T06:33:14Z

System (please complete the following information):

OS: Linux
Python version: 3.6.7
AllenNLP version: v0.6.2

I am using the allennlp 'evaluate' command on an NER model trained on a subset of the OntoNotes 5.0 dataset. While making predictions on the test set, I encounter an unknown label, 'I-TIME', which causes the model to throw an error:
  File "/home/radhikaparik/.conda/envs/allennlp/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/radhikaparik/.conda/envs/allennlp/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/radhikaparik/taskonomy-nlp/allennlp/run.py", line 18, in <module>
    main(prog="allennlp")
  File "/home/radhikaparik/taskonomy-nlp/allennlp/commands/__init__.py", line 70, in main
    args.func(args)
  File "/home/radhikaparik/taskonomy-nlp/allennlp/commands/evaluate.py", line 151, in evaluate_from_args
    metrics = evaluate(model, instances, iterator, args.cuda_device)
  File "/home/radhikaparik/taskonomy-nlp/allennlp/commands/evaluate.py", line 103, in evaluate
    for batch in generator_tqdm:
  File "/home/radhikaparik/.conda/envs/allennlp/lib/python3.6/site-packages/tqdm/_tqdm.py", line 979, in __iter__
    for obj in iterable:
  File "/home/radhikaparik/taskonomy-nlp/allennlp/data/iterators/data_iterator.py", line 148, in __call__
    batch.index_instances(self.vocab)
  File "/home/radhikaparik/taskonomy-nlp/allennlp/data/dataset.py", line 156, in index_instances
    instance.index_fields(vocab)
  File "/home/radhikaparik/taskonomy-nlp/allennlp/data/instance.py", line 60, in index_fields
    field.index(vocab)
  File "/home/radhikaparik/taskonomy-nlp/allennlp/data/fields/sequence_label_field.py", line 88, in index
    for label in self.labels]
  File "/home/radhikaparik/taskonomy-nlp/allennlp/data/fields/sequence_label_field.py", line 88, in <listcomp>
    for label in self.labels]
  File "/home/radhikaparik/taskonomy-nlp/allennlp/data/vocabulary.py", line 571, in get_token_index
    return self._token_to_index[namespace][self._oov_token]
KeyError: '@@UNKNOWN@@'

I tried discarding the example by removing it from the batch of instances, but that throws the following error:

Traceback (most recent call last):
  File "/home/radhikaparik/.conda/envs/allennlp/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/radhikaparik/.conda/envs/allennlp/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/radhikaparik/taskonomy-nlp/allennlp/run.py", line 18, in <module>
    field_tensors[field].append(tensors)
    main(prog="allennlp")
  File "/home/radhikaparik/taskonomy-nlp/allennlp/commands/__init__.py", line 70, in main
    args.func(args)
  File "/home/radhikaparik/taskonomy-nlp/allennlp/commands/evaluate.py", line 151, in evaluate_from_args
    metrics = evaluate(model, instances, iterator, args.cuda_device)
  File "/home/radhikaparik/taskonomy-nlp/allennlp/commands/evaluate.py", line 103, in evaluate
    for batch in generator_tqdm:
  File "/home/radhikaparik/.conda/envs/allennlp/lib/python3.6/site-packages/tqdm/_tqdm.py", line 979, in __iter__
    for obj in iterable:
  File "/home/radhikaparik/taskonomy-nlp/allennlp/data/iterators/data_iterator.py", line 150, in __call__
    padding_lengths = batch.get_padding_lengths()
  File "/home/radhikaparik/taskonomy-nlp/allennlp/data/dataset.py", line 58, in get_padding_lengths
    for instance in self.instances]
  File "/home/radhikaparik/taskonomy-nlp/allennlp/data/dataset.py", line 58, in <listcomp>
    for instance in self.instances]
  File "/home/radhikaparik/taskonomy-nlp/allennlp/data/instance.py", line 69, in get_padding_lengths
    lengths[field_name] = field.get_padding_lengths()
  File "/home/radhikaparik/taskonomy-nlp/allennlp/data/fields/text_field.py", line 76, in get_padding_lengths
    raise ConfigurationError("You must call .index(vocabulary) on a "
allennlp.common.checks.ConfigurationError: 'You must call .index(vocabulary) on a field before determining padding lengths.'
[INFO/MainProcess] process shutting down

Is there a standard way to deal with unseen labels during 'evaluate'?
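
One possible way to discard such examples is to filter them out before they reach the iterator, so that no partially indexed instance ends up in a batch. The following is a minimal sketch in plain Python, assuming the test sentences are available as (tokens, tags) pairs and the set of labels seen in training is known. `find_unseen_labels` and `split_by_known_labels` are hypothetical helper names, not AllenNLP functions, and the data is made up for illustration.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Set, Tuple

Sentence = Tuple[Sequence[str], Sequence[str]]  # (tokens, tags)


def find_unseen_labels(sentences: Iterable[Sentence], known_labels: Set[str]) -> Counter:
    """Count every tag that does not appear in the training label set."""
    unseen: Counter = Counter()
    for _tokens, tags in sentences:
        unseen.update(tag for tag in tags if tag not in known_labels)
    return unseen


def split_by_known_labels(
    sentences: Iterable[Sentence], known_labels: Set[str]
) -> Tuple[List[Sentence], List[Sentence]]:
    """Separate sentences whose tags are all known from those with an unseen tag."""
    kept: List[Sentence] = []
    dropped: List[Sentence] = []
    for sentence in sentences:
        _tokens, tags = sentence
        if all(tag in known_labels for tag in tags):
            kept.append(sentence)
        else:
            dropped.append(sentence)
    return kept, dropped


# Toy data (hypothetical): the training subset never contained 'I-TIME'.
train_labels = {"O", "B-PERSON", "I-PERSON", "B-TIME"}
test_sentences = [
    (["Mary", "left"], ["B-PERSON", "O"]),
    (["at", "five", "o'clock"], ["O", "B-TIME", "I-TIME"]),
]

unseen = find_unseen_labels(test_sentences, train_labels)
kept, dropped = split_by_known_labels(test_sentences, train_labels)
```

Dropping sentences changes the test set, so it is probably worth reporting how many were dropped alongside any metrics computed on the remainder.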


matt-gardner · 2018-12-06T17:34:30Z

For your use case, what would you like to happen when you encounter an unseen label at test time? It's not clear in general what you should do here, which is why we currently just crash. If you give us some more info on what you want to happen, we can help you figure out how to make it work.

schmmd · 2019-01-17T23:58:35Z

Closing due to inactivity.

jbrry · 2019-07-10T19:17:36Z

For anyone with a similar issue (a label in the test set that was not in the training set), there is an example of a workaround here: https://github.com/Hyperparticle/udify/blob/b6a1173e7e5fc1e4c63f4a7cf1563b469268a3b8/udify/predictors/predictor.py

matt-gardner · 2019-07-10T19:25:35Z

You can also just add an OOV token to your model very easily; it's just not really clear that you should, and depends on what you're trying to do.
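
To illustrate what an OOV token changes: the `KeyError: '@@UNKNOWN@@'` in the first traceback is raised by a lookup that falls back to an OOV entry the label namespace does not contain. The sketch below is a simplified stand-in for that behaviour, not AllenNLP's actual `Vocabulary` class; `ToyLabelVocab` and its `with_oov` flag are hypothetical names used only for illustration. In AllenNLP itself, to the best of my knowledge, namespaces ending in `labels` or `tags` are treated as non-padded by default (the `non_padded_namespaces` argument of `Vocabulary`) and so get no OOV entry.

```python
OOV_TOKEN = "@@UNKNOWN@@"


class ToyLabelVocab:
    """Toy label-to-index mapping with an optional OOV entry (illustration only)."""

    def __init__(self, labels, with_oov=False):
        self._token_to_index = {}
        if with_oov:
            # Reserve an index for labels never seen during training.
            self._token_to_index[OOV_TOKEN] = 0
        for label in labels:
            self._token_to_index.setdefault(label, len(self._token_to_index))

    def get_token_index(self, label):
        if label in self._token_to_index:
            return self._token_to_index[label]
        # Fall back to the OOV entry; raises KeyError if none was reserved.
        return self._token_to_index[OOV_TOKEN]


train_labels = ["O", "B-TIME"]

# Without an OOV entry, an unseen label fails the same way as in the traceback.
strict_vocab = ToyLabelVocab(train_labels)
try:
    strict_vocab.get_token_index("I-TIME")
    strict_failed = False
except KeyError:
    strict_failed = True

# With an OOV entry, the unseen label maps to the reserved index instead.
oov_vocab = ToyLabelVocab(train_labels, with_oov=True)
oov_index = oov_vocab.get_token_index("I-TIME")
```

In this sketch the OOV entry is an extra class in the label space. A tagger would presumably need that class in its output layer from training time, and every token whose gold label maps to it would likely be scored as an error, which may be part of why adding one is a judgment call.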

schmmd assigned matt-gardner Dec 7, 2018

matt-gardner added the Waiting For Response label Dec 7, 2018

schmmd closed this as completed Jan 17, 2019

jgroschwitz mentioned this issue Nov 19, 2021

error when encountering unknown NER label coli-saar/am-parser#95

Open
