This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

Unknown label during NER 'predict' #2147

Closed
Mrs-Hudson opened this issue Dec 6, 2018 · 4 comments

@Mrs-Hudson

System (please complete the following information):

  • OS: Linux
  • Python version: 3.6.7
  • AllenNLP version: v0.6.2

I am using the allennlp 'evaluate' command on an NER model trained on a subset of the OntoNotes 5.0 dataset. While making predictions on the test set, I encounter an unknown label, 'I-TIME', which causes the model to throw an error:
  File "/home/radhikaparik/.conda/envs/allennlp/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/radhikaparik/.conda/envs/allennlp/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/radhikaparik/taskonomy-nlp/allennlp/run.py", line 18, in <module>
    main(prog="allennlp")
  File "/home/radhikaparik/taskonomy-nlp/allennlp/commands/__init__.py", line 70, in main
    args.func(args)
  File "/home/radhikaparik/taskonomy-nlp/allennlp/commands/evaluate.py", line 151, in evaluate_from_args
    metrics = evaluate(model, instances, iterator, args.cuda_device)
  File "/home/radhikaparik/taskonomy-nlp/allennlp/commands/evaluate.py", line 103, in evaluate
    for batch in generator_tqdm:
  File "/home/radhikaparik/.conda/envs/allennlp/lib/python3.6/site-packages/tqdm/_tqdm.py", line 979, in __iter__
    for obj in iterable:
  File "/home/radhikaparik/taskonomy-nlp/allennlp/data/iterators/data_iterator.py", line 148, in __call__
    batch.index_instances(self.vocab)
  File "/home/radhikaparik/taskonomy-nlp/allennlp/data/dataset.py", line 156, in index_instances
    instance.index_fields(vocab)
  File "/home/radhikaparik/taskonomy-nlp/allennlp/data/instance.py", line 60, in index_fields
    field.index(vocab)
  File "/home/radhikaparik/taskonomy-nlp/allennlp/data/fields/sequence_label_field.py", line 88, in index
    for label in self.labels]
  File "/home/radhikaparik/taskonomy-nlp/allennlp/data/fields/sequence_label_field.py", line 88, in <listcomp>
    for label in self.labels]
  File "/home/radhikaparik/taskonomy-nlp/allennlp/data/vocabulary.py", line 571, in get_token_index
    return self._token_to_index[namespace][self._oov_token]
KeyError: '@@UNKNOWN@@'

I tried discarding the example by removing it from the batch of instances, but that throws the following error:

Traceback (most recent call last):
  File "/home/radhikaparik/.conda/envs/allennlp/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/radhikaparik/.conda/envs/allennlp/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/radhikaparik/taskonomy-nlp/allennlp/run.py", line 18, in <module>
    field_tensors[field].append(tensors)
    main(prog="allennlp")
  File "/home/radhikaparik/taskonomy-nlp/allennlp/commands/__init__.py", line 70, in main
    args.func(args)
  File "/home/radhikaparik/taskonomy-nlp/allennlp/commands/evaluate.py", line 151, in evaluate_from_args
    metrics = evaluate(model, instances, iterator, args.cuda_device)
  File "/home/radhikaparik/taskonomy-nlp/allennlp/commands/evaluate.py", line 103, in evaluate
    for batch in generator_tqdm:
  File "/home/radhikaparik/.conda/envs/allennlp/lib/python3.6/site-packages/tqdm/_tqdm.py", line 979, in __iter__
    for obj in iterable:
  File "/home/radhikaparik/taskonomy-nlp/allennlp/data/iterators/data_iterator.py", line 150, in __call__
    padding_lengths = batch.get_padding_lengths()
  File "/home/radhikaparik/taskonomy-nlp/allennlp/data/dataset.py", line 58, in get_padding_lengths
    for instance in self.instances]
  File "/home/radhikaparik/taskonomy-nlp/allennlp/data/dataset.py", line 58, in <listcomp>
    for instance in self.instances]
  File "/home/radhikaparik/taskonomy-nlp/allennlp/data/instance.py", line 69, in get_padding_lengths
    lengths[field_name] = field.get_padding_lengths()
  File "/home/radhikaparik/taskonomy-nlp/allennlp/data/fields/text_field.py", line 76, in get_padding_lengths
    raise ConfigurationError("You must call .index(vocabulary) on a "
allennlp.common.checks.ConfigurationError: 'You must call .index(vocabulary) on a field before determining padding lengths.'
[INFO/MainProcess] process shutting down

Is there a standard way to deal with unseen labels during 'evaluate'?
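
For illustration, one way to discard such examples without modifying a batch mid-iteration is to filter the test set before it ever reaches the iterator. The following is a minimal, library-independent sketch: the `Example` type, the `drop_unseen_label_examples` helper, and the sample data are hypothetical and not part of AllenNLP.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical stand-in for a dataset instance: (tokens, labels).
Example = Tuple[List[str], List[str]]


def drop_unseen_label_examples(examples: Iterable[Example],
                               known_labels: Set[str]) -> List[Example]:
    """Keep only the examples whose labels all appear in known_labels."""
    return [(tokens, labels) for tokens, labels in examples
            if all(label in known_labels for label in labels)]


# Labels assumed to have been seen during training (illustrative only).
known = {"O", "B-PERSON", "I-PERSON", "B-TIME"}
test_set = [
    (["Ann", "left"], ["B-PERSON", "O"]),
    (["at", "five", "sharp"], ["O", "B-TIME", "I-TIME"]),  # I-TIME is unseen
]

kept = drop_unseen_label_examples(test_set, known)
print(len(kept))  # 1
```

Note that dropping examples changes the evaluation set, so the resulting metrics are not directly comparable to scores computed on the full test set.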

@matt-gardner
Contributor

For your use case, what would you like to happen when you encounter an unseen label at test time? It's not clear in general what you should do here, which is why we currently just crash. If you give us some more info on what you want to happen, we can help you figure out how to make it work.

@schmmd
Member

schmmd commented Jan 17, 2019

Closing due to inactivity.

@schmmd schmmd closed this as completed Jan 17, 2019
@jbrry
Contributor

jbrry commented Jul 10, 2019

For anyone with a similar issue, i.e. a label in the test set that was not in the training set, there is an example of a workaround here: https://github.com/Hyperparticle/udify/blob/b6a1173e7e5fc1e4c63f4a7cf1563b469268a3b8/udify/predictors/predictor.py
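
As a rough idea of what such a workaround can look like, the sketch below replaces every label the vocabulary has never seen with a known fallback label before indexing. It is not copied from the linked file; the `replace_unseen_labels` helper and the choice of `"O"` as the fallback are assumptions made for illustration.

```python
from typing import List, Set


def replace_unseen_labels(labels: List[str],
                          known_labels: Set[str],
                          fallback: str = "O") -> List[str]:
    """Return a copy of labels with unseen entries swapped for fallback."""
    if fallback not in known_labels:
        # The fallback itself must be indexable, otherwise nothing is gained.
        raise ValueError(f"fallback label {fallback!r} is not a known label")
    return [label if label in known_labels else fallback for label in labels]


# "I-TIME" was never seen in training, so it is mapped to "O".
remapped = replace_unseen_labels(["O", "B-TIME", "I-TIME"], {"O", "B-TIME"})
print(remapped)  # ['O', 'B-TIME', 'O']
```

With this approach the model is always scored as wrong on the remapped tokens unless it happens to predict the fallback, so reported metrics describe a slightly altered gold standard.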

@matt-gardner
Contributor

You can also just add an OOV token to your model very easily; it's just not really clear that you should, and depends on what you're trying to do.
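
To make the mechanism concrete, here is a toy, library-independent sketch of a label index with and without an OOV entry. The lookup mirrors the fallback visible in the traceback above (`self._token_to_index[namespace][self._oov_token]`), but `build_label_index` and `get_label_index` are hypothetical helpers, not AllenNLP functions. In AllenNLP itself, the `non_padded_namespaces` argument of `Vocabulary` decides which namespaces get no padding and OOV entries; by default this is understood to cover namespaces ending in `labels` or `tags`.

```python
from typing import Dict, Iterable

OOV_TOKEN = "@@UNKNOWN@@"  # the OOV string that appears in the KeyError above


def build_label_index(labels: Iterable[str], with_oov: bool) -> Dict[str, int]:
    """Assign consecutive ids to labels, optionally reserving id 0 for OOV."""
    index: Dict[str, int] = {}
    if with_oov:
        index[OOV_TOKEN] = 0
    for label in labels:
        index.setdefault(label, len(index))
    return index


def get_label_index(index: Dict[str, int], label: str) -> int:
    """Look up a label, falling back to the OOV entry (KeyError if absent)."""
    if label in index:
        return index[label]
    return index[OOV_TOKEN]


train_labels = ["O", "B-TIME"]
with_oov = build_label_index(train_labels, with_oov=True)
without_oov = build_label_index(train_labels, with_oov=False)

print(get_label_index(with_oov, "I-TIME"))  # 0, the reserved OOV id

try:
    get_label_index(without_oov, "I-TIME")
except KeyError as err:
    # Same failure mode as in the issue: no OOV entry to fall back on.
    print("KeyError:", err)
```

The trade-off is that the OOV entry becomes one more class in the label space, and the tagger has no training examples for it.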
