Add POS tagging and Phrase chunking token classification examples #6457

Merged
merged 3 commits into from
Aug 13, 2020

Conversation

vblagoje
Contributor

This PR adds POS tagging and phrase chunking examples to the token classification examples. The current example (NER) is minimally adjusted so that users can easily experiment with training their own token classification models. Skilled developers can already train token classification tasks other than NER, but this PR lowers the barrier to entry further and demonstrates HF extensibility.

The adjustments made consist of:

  • extracting TokenClassificationTask superclass
  • implementing the specific task particulars (reading of InputExample etc.) in task subclasses
  • "dynamic loading" of a task subclass depending on the token classification task trained
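
A minimal sketch of the three adjustments above, for readers who want to see the shape of the design. It is not the code from this PR: the method names (read_examples, get_labels), the column layout, the label sets, and the load_task helper are assumptions made for illustration, and the lookup by subclass name stands in for whatever loading mechanism the example script actually uses.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class InputExample:
    """One sentence: its words and (optionally) one label per word."""
    guid: str
    words: List[str]
    labels: Optional[List[str]] = None


def _read_columns(lines: Iterable[str], mode: str, label_idx: int) -> List[InputExample]:
    """Parse CoNLL-style lines (word in column 0, blank line between sentences)."""
    examples, words, labels = [], [], []
    for line in lines:
        cols = line.split()
        if not cols:  # a blank line closes the current sentence
            if words:
                examples.append(InputExample(f"{mode}-{len(examples) + 1}", words, labels))
                words, labels = [], []
            continue
        words.append(cols[0])
        labels.append(cols[label_idx])
    if words:  # flush the last sentence if the input has no trailing blank line
        examples.append(InputExample(f"{mode}-{len(examples) + 1}", words, labels))
    return examples


class TokenClassificationTask:
    """Superclass: the training script only talks to this interface."""

    def read_examples(self, lines: Iterable[str], mode: str) -> List[InputExample]:
        raise NotImplementedError

    def get_labels(self) -> List[str]:
        raise NotImplementedError


class POS(TokenClassificationTask):
    # Hypothetical layout: the POS tag sits in column 1.
    def read_examples(self, lines, mode):
        return _read_columns(lines, mode, label_idx=1)

    def get_labels(self):
        return ["DET", "NOUN", "VERB"]  # illustrative subset only


class Chunk(TokenClassificationTask):
    # Hypothetical layout: the chunk tag sits in the last column.
    def read_examples(self, lines, mode):
        return _read_columns(lines, mode, label_idx=-1)

    def get_labels(self):
        return ["O", "B-NP", "I-NP", "B-VP", "I-VP"]  # illustrative subset only


def load_task(task_type: str) -> TokenClassificationTask:
    """Pick the task subclass by name at run time ("dynamic loading")."""
    registry = {cls.__name__: cls for cls in TokenClassificationTask.__subclasses__()}
    if task_type not in registry:
        raise ValueError(f"Unknown task {task_type!r}; available: {sorted(registry)}")
    return registry[task_type]()
```

With this layout, adding a new token classification task means adding one subclass; the training loop itself does not change.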

I also noticed that:

  • The NER dataset used is unavailable and should be replaced. I didn't replace it in this PR
  • The PL (PyTorch Lightning) training script needs a slight retrofit to match the latest BaseTransformer changes on master. I made that change so that the new examples work

If you think one additional token classification example is enough rather than two (say, POS tagging), let me know and I'll remove the other. Also, please let me know if any additional adjustments are needed.

* POS tagging example

* Phrase chunking example
@julien-c julien-c requested review from sgugger and stefan-it August 13, 2020 09:20
@stefan-it
Collaborator

Hi @vblagoje , thanks for adding this 👍

GermEval dataset is currently not available - it seems that they've relaunched the shared task website. This dataset removal will also affect libraries such as Flair or nlp so I will try to find another mirror, thanks for reporting it!

For PoS tagging it would be awesome if you could also report/output accuracy after training - just import accuracy_score from the seqeval package :)
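
For reference, the metric suggested here is plain token-level accuracy. With seqeval installed it is available as accuracy_score in seqeval.metrics and takes lists of label sequences. The dependency-free sketch below computes the same kind of score; the function name token_accuracy and the sample tags are made up for illustration and are not part of this PR.

```python
from typing import List, Sequence


def token_accuracy(y_true: Sequence[Sequence[str]], y_pred: Sequence[Sequence[str]]) -> float:
    """Fraction of tokens whose predicted tag equals the gold tag.

    Both arguments are lists of sentences, each sentence a list of tags.
    """
    flat_true: List[str] = [tag for sentence in y_true for tag in sentence]
    flat_pred: List[str] = [tag for sentence in y_pred for tag in sentence]
    if len(flat_true) != len(flat_pred):
        raise ValueError("gold and predicted sequences must have the same number of tokens")
    if not flat_true:
        return 0.0  # avoid dividing by zero on empty input
    correct = sum(gold == pred for gold, pred in zip(flat_true, flat_pred))
    return correct / len(flat_true)


# Hypothetical gold and predicted POS tags for two sentences.
gold = [["DET", "NOUN", "VERB"], ["PRON", "VERB"]]
pred = [["DET", "NOUN", "NOUN"], ["PRON", "VERB"]]
print(f"accuracy = {token_accuracy(gold, pred):.2f}")
```

Accuracy suits POS tagging because every token carries a tag, whereas span-based precision, recall, and F1 are the usual choice for NER and chunking.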

@vblagoje
Contributor Author

Thanks for the review, @stefan-it. Let me know if there are any additional suggestions. Perhaps we can add appropriate URLs for the GermEval dataset and remove the chunking example if needed.

@sgugger
Collaborator

sgugger commented Aug 13, 2020

This looks great, thanks! Note that there is a big rework of the examples to use the nlp library and Trainer in the pipeline. We're polishing the APIs before we start converting every script. I'll tag you when we get to this one to make sure we don't break anything.

In the meantime, could you take care of the styling issue so we can merge?

@vblagoje
Contributor Author

Ok @sgugger, please do ping me and I'll make sure that all token classification examples work as expected; perhaps I can help with the transition. I am not sure why CI fails on styling. More specifically, isort reports: ERROR: examples/token-classification/tasks.py Imports are incorrectly sorted. The check passes on both my work laptop and my training machine. Could you please tell me how the imports in tasks.py are incorrectly sorted?

@sgugger
Collaborator

sgugger commented Aug 13, 2020

It may be because of the dep you're adding to examples. It should probably be added in the known_third_party list here.
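
A rough way to check for this locally, sketched under assumptions: the isort settings are taken to live in an [isort] section of an INI-style config file, and the config text and module names below are invented for illustration. known_third_party is a real isort setting; a module missing from that list can be placed in a different import section on CI than on a machine where the package is installed, which would explain a check that passes locally but fails remotely.

```python
import configparser
from typing import Iterable, List, Set

# Hypothetical config contents; the real file and its entries may differ.
SAMPLE_CFG = """
[isort]
known_first_party = transformers
known_third_party =
    numpy
    seqeval
    torch
"""


def known_third_party(cfg_text: str) -> Set[str]:
    """Return the module names listed under [isort] known_third_party."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    raw = parser.get("isort", "known_third_party", fallback="")
    # Entries may be separated by newlines or commas.
    return {name.strip() for line in raw.splitlines() for name in line.split(",") if name.strip()}


def missing_third_party(cfg_text: str, imported: Iterable[str]) -> List[str]:
    """List imported top-level modules that the config does not declare."""
    return sorted(set(imported) - known_third_party(cfg_text))


# "some_new_dep" is a placeholder for whatever dependency a change introduces.
print(missing_third_party(SAMPLE_CFG, ["numpy", "some_new_dep"]))
```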

@vblagoje
Contributor Author

Ok @sgugger, check_code_quality passes now, but there are other new failures. At first glance, they seem transient and unrelated to this PR?

@sgugger
Collaborator

sgugger commented Aug 13, 2020

Looks flaky, re-triggered the CI

@sgugger sgugger merged commit eda07ef into huggingface:master Aug 13, 2020
fabiocapsouza added a commit to fabiocapsouza/transformers that referenced this pull request Nov 15, 2020
3 participants