Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new token classification example #8340

Merged
merged 13 commits into from
Nov 9, 2020
2 changes: 1 addition & 1 deletion examples/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -37,7 +37,7 @@ git checkout tags/v3.4.0
|---|---|:---:|:---:|:---:|:---:|
| [**`language-modeling`**](https://github.com/huggingface/transformers/tree/master/examples/language-modeling) | Raw text | ✅ | - | ✅ | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/blog/blob/master/notebooks/01_how_to_train.ipynb)
| [**`text-classification`**](https://github.com/huggingface/transformers/tree/master/examples/text-classification) | GLUE, XNLI | ✅ | ✅ | ✅ | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://github.com/huggingface/notebooks/blob/master/examples/text_classification.ipynb)
| [**`token-classification`**](https://github.com/huggingface/transformers/tree/master/examples/token-classification) | CoNLL NER | ✅ | ✅ | - | -
| [**`token-classification`**](https://github.com/huggingface/transformers/tree/master/examples/token-classification) | CoNLL NER | ✅ | ✅ | | -
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

| [**`multiple-choice`**](https://github.com/huggingface/transformers/tree/master/examples/multiple-choice) | SWAG, RACE, ARC | ✅ | ✅ | - | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ViktorAlm/notebooks/blob/master/MPC_GPU_Demo_for_TF_and_PT.ipynb)
| [**`question-answering`**](https://github.com/huggingface/transformers/tree/master/examples/question-answering) | SQuAD | ✅ | ✅ | - | -
| [**`text-generation`**](https://github.com/huggingface/transformers/tree/master/examples/text-generation) | - | n/a | n/a | - | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/blog/blob/master/notebooks/02_how_to_generate.ipynb)
Expand Down
39 changes: 38 additions & 1 deletion examples/test_examples.py
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,13 @@

SRC_DIRS = [
os.path.join(os.path.dirname(__file__), dirname)
for dirname in ["text-generation", "text-classification", "language-modeling", "question-answering"]
for dirname in [
"text-generation",
"text-classification",
"token-classification",
"language-modeling",
"question-answering",
]
]
sys.path.extend(SRC_DIRS)

Expand All @@ -38,6 +44,7 @@
import run_generation
import run_glue
import run_mlm
import run_ner
import run_pl_glue
import run_squad

Expand Down Expand Up @@ -185,6 +192,36 @@ def test_run_mlm(self):
result = run_mlm.main()
self.assertLess(result["perplexity"], 42)

def test_run_ner(self):
stream_handler = logging.StreamHandler(sys.stdout)
logger.addHandler(stream_handler)

tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_ner.py
--model_name_or_path bert-base-uncased
--train_file tests/fixtures/tests_samples/conll/sample.json
--validation_file tests/fixtures/tests_samples/conll/sample.json
--output_dir {tmp_dir}
--overwrite_output_dir
--do_train
--do_eval
--warmup_steps=2
--learning_rate=2e-4
--per_gpu_train_batch_size=2
--per_gpu_eval_batch_size=2
--num_train_epochs=2
""".split()

if torch_device != "cuda":
testargs.append("--no_cuda")

with patch.object(sys, "argv", testargs):
result = run_ner.main()
self.assertGreaterEqual(result["eval_accuracy_score"], 0.75)
self.assertGreaterEqual(result["eval_precision"], 0.75)
self.assertLess(result["eval_loss"], 0.5)

def test_run_squad(self):
stream_handler = logging.StreamHandler(sys.stdout)
logger.addHandler(stream_handler)
Expand Down
38 changes: 35 additions & 3 deletions examples/token-classification/README.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,38 @@
## Named Entity Recognition
## Token classification

Based on the scripts [`run_ner.py`](https://github.com/huggingface/transformers/blob/master/examples/token-classification/run_ner.py) for Pytorch and
Fine-tuning the library models for token classification task such as Named Entity Recognition (NER) or Parts-of-speech
tagging (POS). The main scrip `run_ner.py` leverages the 🤗 Datasets library and the Trainer API. You can easily
customize it to your needs if you need extra processing on your datasets.

It will either run on a datasets hosted on our [hub](https://huggingface.co/datasets) or with your own text files for
training and validation.

The following example fine-tunes BERT on CoNLL-2003:

```bash
python run_ner.py \
--model_name_or_path bert-base-uncased \
--dataset_name conll2003 \
--output_dir /tmp/test-ner \
--do_train \
--do_eval
```

To run on your own training and validation files, use the following command:

```bash
python run_ner.py \
--model_name_or_path bert-base-uncased \
--train_file path_to_train_file \
--validation_file path_to_validation_file \
--output_dir /tmp/test-ner \
--do_train \
--do_eval
```

## Old version of the script

Based on the scripts [`run_ner_old.py`](https://github.com/huggingface/transformers/blob/master/examples/token-classification/run_ner_old.py) for Pytorch and
[`run_tf_ner.py`](https://github.com/huggingface/transformers/blob/master/examples/token-classification/run_tf_ner.py) for Tensorflow 2.

The following examples are covered in this section:
Expand Down Expand Up @@ -69,7 +101,7 @@ export SEED=1
To start training, just run:

```bash
python3 run_ner.py --data_dir ./ \
python3 run_ner_old.py --data_dir ./ \
--labels ./labels.txt \
--model_name_or_path $BERT_MODEL \
--output_dir $OUTPUT_DIR \
Expand Down
Loading