diff --git a/examples/legacy/token-classification/README.md b/examples/legacy/token-classification/README.md
index 3411a61548629f..e484f332f32662 100644
--- a/examples/legacy/token-classification/README.md
+++ b/examples/legacy/token-classification/README.md
@@ -129,6 +129,71 @@ On the test dataset the following results could be achieved:
 10/04/2019 00:42:42 - INFO - __main__ - recall = 0.8624150210424085
 ```
 
+#### Run the TensorFlow 2 version
+
+To start training, just run:
+
+```bash
+python3 run_tf_ner.py --data_dir ./ \
+--labels ./labels.txt \
+--model_name_or_path $BERT_MODEL \
+--output_dir $OUTPUT_DIR \
+--max_seq_length $MAX_LENGTH \
+--num_train_epochs $NUM_EPOCHS \
+--per_device_train_batch_size $BATCH_SIZE \
+--save_steps $SAVE_STEPS \
+--seed $SEED \
+--do_train \
+--do_eval \
+--do_predict
+```
+
+As with the PyTorch version, if your GPU supports half-precision training, just add the `--fp16` flag. After training, the model will be evaluated on both the development and test datasets.
+
+#### Evaluation
+
+Evaluation on the development dataset outputs the following for our example:
+```bash
+           precision    recall  f1-score   support
+
+ LOCderiv     0.7619    0.6154    0.6809        52
+  PERpart     0.8724    0.8997    0.8858      4057
+  OTHpart     0.9360    0.9466    0.9413       711
+  ORGpart     0.7015    0.6989    0.7002       269
+  LOCpart     0.7668    0.8488    0.8057       496
+      LOC     0.8745    0.9191    0.8963       235
+ ORGderiv     0.7723    0.8571    0.8125        91
+ OTHderiv     0.4800    0.6667    0.5581        18
+      OTH     0.5789    0.6875    0.6286        16
+ PERderiv     0.5385    0.3889    0.4516        18
+      PER     0.5000    0.5000    0.5000         2
+      ORG     0.0000    0.0000    0.0000         3
+
+micro avg     0.8574    0.8862    0.8715      5968
+macro avg     0.8575    0.8862    0.8713      5968
+```
+
+On the test dataset, the following results were achieved:
+```bash
+           precision    recall  f1-score   support
+
+  PERpart     0.8847    0.8944    0.8896      9397
+  OTHpart     0.9376    0.9353    0.9365      1639
+  ORGpart     0.7307    0.7044    0.7173       697
+      LOC     0.9133    0.9394    0.9262       561
+  LOCpart     0.8058    0.8157    0.8107      1150
+      ORG     0.0000    0.0000    0.0000         8
+ OTHderiv     0.5882    0.4762    0.5263        42
+ PERderiv     0.6571    0.5227    0.5823        44
+      OTH     0.4906    0.6667    0.5652        39
+ ORGderiv     0.7016    0.7791    0.7383       172
+ LOCderiv     0.8256    0.6514    0.7282       109
+      PER     0.0000    0.0000    0.0000        11
+
+micro avg     0.8722    0.8774    0.8748     13869
+macro avg     0.8712    0.8774    0.8740     13869
+```
+
 ### Emerging and Rare Entities task: WNUT’17 (English NER) dataset
 
 Description of the WNUT’17 task from the [shared task website](http://noisy-text.github.io/2017/index.html):
diff --git a/examples/token-classification/run_tf_ner.py b/examples/legacy/token-classification/run_tf_ner.py
old mode 100755
new mode 100644
similarity index 100%
rename from examples/token-classification/run_tf_ner.py
rename to examples/legacy/token-classification/run_tf_ner.py
diff --git a/examples/token-classification/README.md b/examples/token-classification/README.md
index bf1632a74636d4..e2d11e39c46cbf 100644
--- a/examples/token-classification/README.md
+++ b/examples/token-classification/README.md
@@ -119,68 +119,3 @@ export NUM_EPOCHS=3
 export SAVE_STEPS=750
 export SEED=1
 ```
-
-#### Run the Tensorflow 2 version
-
-To start training, just run:
-
-```bash
-python3 run_tf_ner.py --data_dir ./ \
---labels ./labels.txt \
---model_name_or_path $BERT_MODEL \
---output_dir $OUTPUT_DIR \
---max_seq_length $MAX_LENGTH \
---num_train_epochs $NUM_EPOCHS \
---per_device_train_batch_size $BATCH_SIZE \
---save_steps $SAVE_STEPS \
---seed $SEED \
---do_train \
---do_eval \
---do_predict
-```
-
-Such as the Pytorch version, if your GPU supports half-precision training, just add the `--fp16` flag. After training, the model will be both evaluated on development and test datasets.
-
-#### Evaluation
-
-Evaluation on development dataset outputs the following for our example:
-```bash
-           precision    recall  f1-score   support
-
- LOCderiv     0.7619    0.6154    0.6809        52
-  PERpart     0.8724    0.8997    0.8858      4057
-  OTHpart     0.9360    0.9466    0.9413       711
-  ORGpart     0.7015    0.6989    0.7002       269
-  LOCpart     0.7668    0.8488    0.8057       496
-      LOC     0.8745    0.9191    0.8963       235
- ORGderiv     0.7723    0.8571    0.8125        91
- OTHderiv     0.4800    0.6667    0.5581        18
-      OTH     0.5789    0.6875    0.6286        16
- PERderiv     0.5385    0.3889    0.4516        18
-      PER     0.5000    0.5000    0.5000         2
-      ORG     0.0000    0.0000    0.0000         3
-
-micro avg     0.8574    0.8862    0.8715      5968
-macro avg     0.8575    0.8862    0.8713      5968
-```
-
-On the test dataset the following results could be achieved:
-```bash
-           precision    recall  f1-score   support
-
-  PERpart     0.8847    0.8944    0.8896      9397
-  OTHpart     0.9376    0.9353    0.9365      1639
-  ORGpart     0.7307    0.7044    0.7173       697
-      LOC     0.9133    0.9394    0.9262       561
-  LOCpart     0.8058    0.8157    0.8107      1150
-      ORG     0.0000    0.0000    0.0000         8
- OTHderiv     0.5882    0.4762    0.5263        42
- PERderiv     0.6571    0.5227    0.5823        44
-      OTH     0.4906    0.6667    0.5652        39
- ORGderiv     0.7016    0.7791    0.7383       172
- LOCderiv     0.8256    0.6514    0.7282       109
-      PER     0.0000    0.0000    0.0000        11
-
-micro avg     0.8722    0.8774    0.8748     13869
-macro avg     0.8712    0.8774    0.8740     13869
-```
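
The moved README covers training and evaluation only, so a natural follow-up is how to use the resulting checkpoint. The sketch below is not part of the diff above; it assumes the run saved its model and tokenizer to `$OUTPUT_DIR` (the hypothetical path `germeval-model` stands in for it here) and a transformers version recent enough that TF model outputs expose `.logits`:

```python
# Minimal inference sketch for a checkpoint produced by run_tf_ner.py.
# "germeval-model" is a hypothetical stand-in for the $OUTPUT_DIR above.
import tensorflow as tf

from transformers import AutoTokenizer, TFAutoModelForTokenClassification

model_dir = "germeval-model"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = TFAutoModelForTokenClassification.from_pretrained(model_dir)

# Tokenize one example sentence; the tokenizer returns TF tensors directly.
inputs = tokenizer("Angela Merkel besuchte das Werk in Wolfsburg .", return_tensors="tf")

# One predicted label id per wordpiece (special tokens and subwords included).
logits = model(**inputs).logits
label_ids = tf.argmax(logits, axis=-1)[0].numpy()

# id2label comes from the saved config, i.e. the entries of labels.txt.
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].numpy().tolist())
for token, label_id in zip(tokens, label_ids):
    print(f"{token}\t{model.config.id2label[int(label_id)]}")
```

Note that this prints one label per wordpiece; for word-level predictions, the files written by the script's own `--do_predict` pass remain the reference.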