Add POS tagging and Phrase chunking token classification examples #6457

vblagoje · 2020-08-13T09:17:44Z

This PR adds POS tagging and phrase chunking examples to the token classification examples. The current example (NER) is minimally adjusted so that users can easily experiment with training their own token classification models. Although skilled developers can already experiment with token classification tasks other than NER, this PR lowers the barrier to entry further and demonstrates HF extensibility.

The adjustments made consist of:

* extracting a TokenClassificationTask superclass
* implementing the specific task particulars (reading of InputExample etc.) in task subclasses
* "dynamic loading" of a task subclass depending on the token classification task trained
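The three adjustments above can be sketched roughly as follows. This is a minimal illustration and not the code from the PR: the method names (`get_labels`, `parse_line`, `read_examples`), the `Example` tuple standing in for InputExample, the `load_task` helper, and the file formats assumed by each subclass are all hypothetical. The loading mechanism shown (looking up a subclass by name) is an assumption; the thread does not say how the PR resolves the task class.

```python
from abc import ABC, abstractmethod
from typing import List, Tuple

# Hypothetical stand-in for InputExample: (words, labels) for one sentence.
Example = Tuple[List[str], List[str]]


class TokenClassificationTask(ABC):
    """Shared logic lives here; subclasses supply the task particulars."""

    @abstractmethod
    def get_labels(self) -> List[str]:
        """Return the label set for this task."""

    @abstractmethod
    def parse_line(self, line: str) -> Tuple[str, str]:
        """Split one non-empty data line into (word, label)."""

    def read_examples(self, lines: List[str]) -> List[Example]:
        """Group lines into sentences; a blank line ends a sentence."""
        examples: List[Example] = []
        words: List[str] = []
        labels: List[str] = []
        for line in lines:
            if not line.strip():
                if words:
                    examples.append((words, labels))
                    words, labels = [], []
                continue
            word, label = self.parse_line(line)
            words.append(word)
            labels.append(label)
        if words:
            examples.append((words, labels))
        return examples


class NER(TokenClassificationTask):
    def get_labels(self) -> List[str]:
        return ["O", "B-PER", "I-PER"]  # illustrative subset only

    def parse_line(self, line: str) -> Tuple[str, str]:
        # Assumes whitespace-separated columns, label in the last column.
        parts = line.split()
        return parts[0], parts[-1]


class POS(TokenClassificationTask):
    def get_labels(self) -> List[str]:
        return ["DET", "NOUN", "VERB"]  # illustrative subset only

    def parse_line(self, line: str) -> Tuple[str, str]:
        # Assumes tab-separated "word<TAB>tag" lines.
        word, tag = line.split("\t")[:2]
        return word, tag


def load_task(task_type: str) -> TokenClassificationTask:
    """Instantiate the task subclass whose class name matches task_type."""
    for cls in TokenClassificationTask.__subclasses__():
        if cls.__name__ == task_type:
            return cls()
    raise ValueError(f"Unknown token classification task: {task_type}")
```

With this shape, a training script only needs the task name (for example from a command-line argument) and calls `load_task("POS").read_examples(lines)`; adding a new task means adding one subclass.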

I also noticed that:

* The NER dataset used is unavailable and should be replaced. I didn't replace it in this PR.
* PL training needs a slight retrofit to match the latest changes to PL's BaseTransformer on master. I made that change so that these new examples work.

If you think adding one token classification example rather than two is enough (say, POS tagging), let me know and I'll remove the other. Also, please let me know if any additional adjustments are needed.

* POS tagging example * Phrase chunking example

stefan-it · 2020-08-13T09:54:49Z

Hi @vblagoje, thanks for adding this 👍

The GermEval dataset is currently not available; it seems that they've relaunched the shared task website. The removal of this dataset will also affect libraries such as Flair or nlp, so I will try to find another mirror. Thanks for reporting it!

For PoS tagging it would be awesome if you could also report/output accuracy after training - just import accuracy_score from the seqeval package :)
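As a rough illustration of the metric being asked for, here is a dependency-free sketch of token-level accuracy over nested label lists. It is a stand-in written for this thread, not seqeval's implementation; the function name `token_accuracy` is hypothetical. In the actual example script, the suggestion above is to import accuracy_score from seqeval instead.

```python
from typing import List


def token_accuracy(y_true: List[List[str]], y_pred: List[List[str]]) -> float:
    """Fraction of tokens whose predicted tag equals the gold tag.

    Both arguments hold one list of tags per sentence, aligned token by token.
    """
    # Flatten the per-sentence lists into one sequence of tags each.
    gold = [tag for sentence in y_true for tag in sentence]
    pred = [tag for sentence in y_pred for tag in sentence]
    if len(gold) != len(pred):
        raise ValueError("Gold and predicted tag counts differ")
    if not gold:
        return 0.0
    correct = sum(g == p for g, p in zip(gold, pred))
    return correct / len(gold)
```

For POS tagging every token carries a tag, so plain accuracy is a natural complement to the span-based precision, recall, and F1 that suit NER and chunking.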

vblagoje · 2020-08-13T12:09:55Z

Thanks for the review, @stefan-it. Let me know if there are any additional suggestions. Perhaps we can add appropriate URLs for the GermEval dataset and remove the chunking example if needed.

sgugger · 2020-08-13T12:59:07Z

This looks great, thanks! Note that there is a big rework of the examples to use the nlp library and Trainer in the pipeline. We're polishing the APIs before we start converting every script. I'll tag you when we get to this one to make sure we don't break anything.

In the meantime, could you take care of the styling issue so we can merge?

vblagoje · 2020-08-13T14:02:30Z

Ok @sgugger, please do ping me and I'll make sure that all token classification examples work as expected; perhaps I can help with the transition. I am not sure why CI fails for styling. More specifically, isort reports: ERROR: examples/token-classification/tasks.py Imports are incorrectly sorted. It passes on both my work laptop and my training machine. Could you please tell me how the imports in tasks.py are incorrectly sorted?

sgugger · 2020-08-13T14:09:28Z

It may be because of the dep you're adding to examples. It should probably be added in the known_third_party list here.
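For readers unfamiliar with this isort setting, a sketch of what such an entry looks like is shown below. It assumes the list lives in an `[isort]` section of a setup.cfg file, which this thread does not confirm (the linked location is not preserved here). The only entry taken from the thread is conllu, the dependency added for the new examples; all other existing entries are omitted.

```ini
[isort]
# Assumed location and excerpt; the real list contains other packages too.
known_third_party =
    conllu
```

isort uses this list to decide which import section a package belongs to, so a package missing from it can be sorted differently in CI than on a machine where the package is installed.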

vblagoje · 2020-08-13T14:48:40Z

Ok @sgugger, check_code_quality passes now, but there are other new failures. At first glance, they seem transient and unrelated to this PR?

sgugger · 2020-08-13T14:55:11Z

Looks flaky, re-triggered the CI

Add more token classification examples

6caa8c1

* POS tagging example * Phrase chunking example

julien-c requested review from sgugger and stefan-it August 13, 2020 09:20

PR review fixes

ddeee37

Add conllu to third party list (used in token classification examples)

ce5906b

sgugger merged commit eda07ef into huggingface:master Aug 13, 2020

fabiocapsouza added a commit to fabiocapsouza/transformers that referenced this pull request Nov 15, 2020

Revert "Add POS tagging and Phrase chunking token classification exam…

87d84e6

…ples (huggingface#6457)" This reverts commit f4cd971.
