Unexpected output(prediction) for TokenClassification, using pipeline #6514

himanshudce · 2020-08-16T07:23:14Z

I trained a language model from scratch on my language and fine-tuned it for token classification, but when I predict with "pipeline" I don't get a proper tag for each word. It looks like the words are not tokenized the way I expect, and the results are given per subword token. I also tried grouped_entities=True, but that doesn't fix it.
My code:

import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer
from transformers import TokenClassificationPipeline

# Named entity recognition pipeline, passing in a specific model and tokenizer
model = AutoModelForTokenClassification.from_pretrained("./sumerianRoBERTo-finetune")
tokenizer = AutoTokenizer.from_pretrained("./sumerianRoBERTo-finetune")
nlp_grouped = TokenClassificationPipeline(
    model=model,
    grouped_entities=True,
    tokenizer=tokenizer,
)
print(nlp_grouped('szu-nigin 1(u) 7(disz) 1/3(disz) gin2 ku3-babbar'))

Results:

[{'entity_group': 'N', 'score': 0.7584937413533529, 'word': '<s>szu-'}, {'entity_group': 'V', 'score': 0.7493271827697754, 'word': 'nigin'}, {'entity_group': 'NU', 'score': 0.9881511330604553, 'word': ' 1'}, {'entity_group': 'N', 'score': 0.8397139310836792, 'word': 'u'}, {'entity_group': 'NU', 'score': 0.7238532304763794, 'word': ') 7'}, {'entity_group': 'N', 'score': 0.6140500903129578, 'word': 'disz)'}, {'entity_group': 'NU', 'score': 0.9929361343383789, 'word': ' 1'}, {'entity_group': 'N', 'score': 0.993495523929596, 'word': '/'}, {'entity_group': 'NU', 'score': 0.9997004270553589, 'word': '3'}, {'entity_group': 'N', 'score': 0.7956433892250061, 'word': 'disz) gin'}, {'entity_group': 'NU', 'score': 0.9885044693946838, 'word': '2'}, {'entity_group': 'NE', 'score': 0.6853057146072388, 'word': ' ku'}, {'entity_group': 'N', 'score': 0.9291318953037262, 'word': '3-'}, {'entity_group': 'AJ', 'score': 0.5223987698554993, 'word': 'babbar'}, {'entity_group': 'N', 'score': 0.8513995409011841, 'word': '</s>'}]

And with grouped_entities=False, I get:

[{'word': '<s>', 'score': 0.5089993476867676, 'entity': 'N', 'index': 0}, {'word': 'szu', 'score': 0.9983197450637817, 'entity': 'N', 'index': 1}, {'word': '-', 'score': 0.7681621313095093, 'entity': 'N', 'index': 2}, {'word': 'nigin', 'score': 0.7493271827697754, 'entity': 'V', 'index': 3}, {'word': 'Ġ1', 'score': 0.9881511330604553, 'entity': 'NU', 'index': 4}, {'word': 'u', 'score': 0.8397139310836792, 'entity': 'N', 'index': 6}, {'word': ')', 'score': 0.4481121897697449, 'entity': 'NU', 'index': 7}, {'word': 'Ġ7', 'score': 0.9995942711830139, 'entity': 'NU', 'index': 8}, {'word': 'disz', 'score': 0.6592599749565125, 'entity': 'N', 'index': 10}, {'word': ')', 'score': 0.5688402056694031, 'entity': 'N', 'index': 11}, {'word': 'Ġ1', 'score': 0.9929361343383789, 'entity': 'NU', 'index': 12}, {'word': '/', 'score': 0.993495523929596, 'entity': 'N', 'index': 13}, {'word': '3', 'score': 0.9997004270553589, 'entity': 'NU', 'index': 14}, {'word': 'disz', 'score': 0.6896834969520569, 'entity': 'N', 'index': 16}, {'word': ')', 'score': 0.6974959969520569, 'entity': 'N', 'index': 17}, {'word': 'Ġgin', 'score': 0.9997506737709045, 'entity': 'N', 'index': 18}, {'word': '2', 'score': 0.9885044693946838, 'entity': 'NU', 'index': 19}, {'word': 'Ġku', 'score': 0.6853057146072388, 'entity': 'NE', 'index': 20}, {'word': '3', 'score': 0.901140570640564, 'entity': 'N', 'index': 21}, {'word': '-', 'score': 0.9571232199668884, 'entity': 'N', 'index': 22}, {'word': 'babbar', 'score': 0.5223987698554993, 'entity': 'AJ', 'index': 23}, {'word': '</s>', 'score': 0.8513995409011841, 'entity': 'N', 'index': 24}]
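Comparing the two outputs, the grouped result is consistent with a simple rule: consecutive tokens that share a predicted label are merged regardless of word boundaries, and the group score is the mean of the member scores (for example, (0.509 + 0.998 + 0.768) / 3 ≈ 0.7585 for '<s>szu-'). The sketch below is an approximation reconstructed from these two outputs, not the library's own code; tokens_to_string is a rough stand-in that only handles the "Ġ" space marker. Run on the per-token predictions, it reproduces the 15 groups of the grouped output, including 'disz) gin', which spans two words.

```python
# Per-token predictions copied from the grouped_entities=False output above:
# (token, predicted label, token index). Scores are omitted for brevity.
TOKENS = [
    ("<s>", "N", 0), ("szu", "N", 1), ("-", "N", 2), ("nigin", "V", 3),
    ("Ġ1", "NU", 4), ("u", "N", 6), (")", "NU", 7), ("Ġ7", "NU", 8),
    ("disz", "N", 10), (")", "N", 11), ("Ġ1", "NU", 12), ("/", "N", 13),
    ("3", "NU", 14), ("disz", "N", 16), (")", "N", 17), ("Ġgin", "N", 18),
    ("2", "NU", 19), ("Ġku", "NE", 20), ("3", "N", 21), ("-", "N", 22),
    ("babbar", "AJ", 23), ("</s>", "N", 24),
]


def tokens_to_string(tokens):
    # Rough stand-in for byte-level BPE detokenization: "Ġ" encodes a leading space.
    return "".join(tokens).replace("Ġ", " ")


def group_adjacent(tokens):
    """Merge runs of tokens that share a label and have consecutive indices."""
    groups = []  # each entry: [label, list of tokens, last index seen]
    for token, label, index in tokens:
        if groups and groups[-1][0] == label and groups[-1][2] + 1 == index:
            groups[-1][1].append(token)
            groups[-1][2] = index
        else:
            groups.append([label, [token], index])
    return [(label, tokens_to_string(toks)) for label, toks, _ in groups]


for label, text in group_adjacent(TOKENS):
    print(label, repr(text))
```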

What I actually want is one label per space-separated word.
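Because grouping follows labels rather than words, one way to get a single tag per space-separated word is to post-process the grouped_entities=False output and keep the label of each word's first sub-token. This is a minimal sketch under two assumptions: the vocabulary is RoBERTa-style byte-level BPE, where "Ġ" marks a token that starts a new space-separated word, and the pipeline kept every word-initial token (tokens predicted as the ignored label, "O" by default, are dropped, which is probably why indices 5, 9 and 15 are missing). The helper name word_labels is hypothetical. Newer transformers releases also offer an aggregation_strategy argument on the token-classification pipeline (for example "first") intended for this kind of word-level aggregation; check the documentation for your installed version.

```python
# (token, predicted label) pairs copied from the grouped_entities=False output above.
TOKENS = [
    ("<s>", "N"), ("szu", "N"), ("-", "N"), ("nigin", "V"),
    ("Ġ1", "NU"), ("u", "N"), (")", "NU"), ("Ġ7", "NU"),
    ("disz", "N"), (")", "N"), ("Ġ1", "NU"), ("/", "N"),
    ("3", "NU"), ("disz", "N"), (")", "N"), ("Ġgin", "N"),
    ("2", "NU"), ("Ġku", "NE"), ("3", "N"), ("-", "N"),
    ("babbar", "AJ"), ("</s>", "N"),
]
SENTENCE = "szu-nigin 1(u) 7(disz) 1/3(disz) gin2 ku3-babbar"
SPECIAL_TOKENS = {"<s>", "</s>", "<pad>"}


def word_labels(sentence, tokens):
    """Hypothetical helper: give each space-separated word the label of its first sub-token."""
    labels = []
    for token, label in tokens:
        if token in SPECIAL_TOKENS:
            continue
        # The first real token starts word 1; after that, "Ġ" marks a new word.
        if not labels or token.startswith("Ġ"):
            labels.append(label)
    words = sentence.split()
    if len(words) != len(labels):
        # A dropped word-initial token would shift every later label, so fail loudly.
        raise ValueError(f"found {len(labels)} word starts for {len(words)} words")
    return list(zip(words, labels))


print(word_labels(SENTENCE, TOKENS))
# [('szu-nigin', 'N'), ('1(u)', 'NU'), ('7(disz)', 'NU'), ('1/3(disz)', 'NU'), ('gin2', 'N'), ('ku3-babbar', 'NE')]
```

Note that the first-subword rule tags 'szu-nigin' as N even though the 'nigin' sub-token was predicted as V; averaging or taking the highest-scoring sub-token are alternative choices with different trade-offs.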


stale · 2020-10-17T19:59:43Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

cceyda mentioned this issue Sep 28, 2020 in "[WIP] Ner pipeline grouped_entities fixes #5970" (merged).

stale bot added the wontfix label Oct 17, 2020

stale bot closed this as completed Oct 25, 2020
