Unexpected output(prediction) for TokenClassification, using pipeline #6514

Closed
himanshudce opened this issue Aug 16, 2020 · 1 comment
@himanshudce

I trained a language model from scratch on my language and fine-tuned it. When I predict with "pipeline", however, I do not get a proper tag for each token. It looks like the words are not being tokenized properly, and the results are given for subword tokens instead. I also tried grouped_entities=True, but that did not work either.
My code:

import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer
from transformers import TokenClassificationPipeline

# Named entity recognition pipeline, passing in a specific model and tokenizer
model = AutoModelForTokenClassification.from_pretrained("./sumerianRoBERTo-finetune")
tokenizer = AutoTokenizer.from_pretrained("./sumerianRoBERTo-finetune")
nlp_grouped = TokenClassificationPipeline(
    model=model,
    grouped_entities=True,
    tokenizer=tokenizer,
)
print(nlp_grouped('szu-nigin 1(u) 7(disz) 1/3(disz) gin2 ku3-babbar'))

Results:

[{'entity_group': 'N', 'score': 0.7584937413533529, 'word': '<s>szu-'}, {'entity_group': 'V', 'score': 0.7493271827697754, 'word': 'nigin'}, {'entity_group': 'NU', 'score': 0.9881511330604553, 'word': ' 1'}, {'entity_group': 'N', 'score': 0.8397139310836792, 'word': 'u'}, {'entity_group': 'NU', 'score': 0.7238532304763794, 'word': ') 7'}, {'entity_group': 'N', 'score': 0.6140500903129578, 'word': 'disz)'}, {'entity_group': 'NU', 'score': 0.9929361343383789, 'word': ' 1'}, {'entity_group': 'N', 'score': 0.993495523929596, 'word': '/'}, {'entity_group': 'NU', 'score': 0.9997004270553589, 'word': '3'}, {'entity_group': 'N', 'score': 0.7956433892250061, 'word': 'disz) gin'}, {'entity_group': 'NU', 'score': 0.9885044693946838, 'word': '2'}, {'entity_group': 'NE', 'score': 0.6853057146072388, 'word': ' ku'}, {'entity_group': 'N', 'score': 0.9291318953037262, 'word': '3-'}, {'entity_group': 'AJ', 'score': 0.5223987698554993, 'word': 'babbar'}, {'entity_group': 'N', 'score': 0.8513995409011841, 'word': '</s>'}]

With grouped_entities=False, I get:

[{'word': '<s>', 'score': 0.5089993476867676, 'entity': 'N', 'index': 0}, {'word': 'szu', 'score': 0.9983197450637817, 'entity': 'N', 'index': 1}, {'word': '-', 'score': 0.7681621313095093, 'entity': 'N', 'index': 2}, {'word': 'nigin', 'score': 0.7493271827697754, 'entity': 'V', 'index': 3}, {'word': 'Ġ1', 'score': 0.9881511330604553, 'entity': 'NU', 'index': 4}, {'word': 'u', 'score': 0.8397139310836792, 'entity': 'N', 'index': 6}, {'word': ')', 'score': 0.4481121897697449, 'entity': 'NU', 'index': 7}, {'word': 'Ġ7', 'score': 0.9995942711830139, 'entity': 'NU', 'index': 8}, {'word': 'disz', 'score': 0.6592599749565125, 'entity': 'N', 'index': 10}, {'word': ')', 'score': 0.5688402056694031, 'entity': 'N', 'index': 11}, {'word': 'Ġ1', 'score': 0.9929361343383789, 'entity': 'NU', 'index': 12}, {'word': '/', 'score': 0.993495523929596, 'entity': 'N', 'index': 13}, {'word': '3', 'score': 0.9997004270553589, 'entity': 'NU', 'index': 14}, {'word': 'disz', 'score': 0.6896834969520569, 'entity': 'N', 'index': 16}, {'word': ')', 'score': 0.6974959969520569, 'entity': 'N', 'index': 17}, {'word': 'Ġgin', 'score': 0.9997506737709045, 'entity': 'N', 'index': 18}, {'word': '2', 'score': 0.9885044693946838, 'entity': 'NU', 'index': 19}, {'word': 'Ġku', 'score': 0.6853057146072388, 'entity': 'NE', 'index': 20}, {'word': '3', 'score': 0.901140570640564, 'entity': 'N', 'index': 21}, {'word': '-', 'score': 0.9571232199668884, 'entity': 'N', 'index': 22}, {'word': 'babbar', 'score': 0.5223987698554993, 'entity': 'AJ', 'index': 23}, {'word': '</s>', 'score': 0.8513995409011841, 'entity': 'N', 'index': 24}]

What I am looking for is a single label for each whitespace-separated token.
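
One way to get a single label per whitespace-separated word is to post-process the ungrouped pipeline output. The sketch below is not part of the transformers API; `labels_per_word` is a hypothetical helper. It assumes a byte-level BPE tokenizer that marks a preceding space with `Ġ` (as in the output above), that `<s>` and `</s>` are the only special tokens, and that the first subword of every word is present in the pipeline output. Each word takes the label of its first subword, which is a common convention rather than the only reasonable choice.

```python
from typing import Dict, List, Tuple

WORD_START = "Ġ"                   # assumed byte-level BPE marker for "preceded by a space"
SPECIAL_TOKENS = {"<s>", "</s>"}   # assumed special tokens, as seen in the output above


def labels_per_word(text: str, token_preds: List[Dict]) -> List[Tuple[str, str]]:
    """Hypothetical helper: give each whitespace-separated word the label
    predicted for its first subword token."""
    first_labels = []
    at_text_start = True  # the first non-special token opens the first word
    for pred in token_preds:
        token = pred["word"]
        if token in SPECIAL_TOKENS:
            continue
        if at_text_start or token.startswith(WORD_START):
            first_labels.append(pred["entity"])
            at_text_start = False
    words = text.split()
    if len(words) != len(first_labels):
        # Refuse to guess an alignment if a word-initial token is missing.
        raise ValueError(
            f"{len(words)} words but {len(first_labels)} word-start tokens"
        )
    return list(zip(words, first_labels))


# (word, entity) pairs copied from the grouped_entities=False output above.
pairs = [
    ("<s>", "N"), ("szu", "N"), ("-", "N"), ("nigin", "V"), ("Ġ1", "NU"),
    ("u", "N"), (")", "NU"), ("Ġ7", "NU"), ("disz", "N"), (")", "N"),
    ("Ġ1", "NU"), ("/", "N"), ("3", "NU"), ("disz", "N"), (")", "N"),
    ("Ġgin", "N"), ("2", "NU"), ("Ġku", "NE"), ("3", "N"), ("-", "N"),
    ("babbar", "AJ"), ("</s>", "N"),
]
token_preds = [{"word": w, "entity": e} for w, e in pairs]
text = "szu-nigin 1(u) 7(disz) 1/3(disz) gin2 ku3-babbar"

word_labels = labels_per_word(text, token_preds)
for word, label in word_labels:
    print(word, label)
```

If the number of words and the number of word-start tokens differ, the helper raises an error instead of guessing an alignment. Note that indices 5, 9 and 15 are absent from the output above, so some tokens do get filtered out before they reach this step.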

@stale

stale bot commented Oct 17, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Oct 17, 2020
@stale stale bot closed this as completed Oct 25, 2020