NER label re-alignment #10568

Closed
Changes from all commits (60 commits)
009409c
add subword_label_re_alignment strategies
elk-cloner Mar 6, 2021
ee7aeb0
remove redundent import
elk-cloner Mar 6, 2021
96f68e4
pass local test
elk-cloner Mar 13, 2021
1017651
Restore compatibility with existing NER pipeline tests
francescorubbo Mar 20, 2021
62a526a
Merge pull request #1 from francescorubbo/ner_label_re_alignment
elk-cloner Mar 21, 2021
9052a33
Refactor label re-alignment in NER pipeline and add tests
francescorubbo Mar 22, 2021
096c433
Merge pull request #2 from francescorubbo/ner_label_re_alignment
elk-cloner Mar 24, 2021
df641ab
Import numpy, used for arrays in test input
francescorubbo Mar 26, 2021
482f325
Bugfix: ensure entities are updated with aligned labels.
francescorubbo Apr 4, 2021
e728ce1
Update tests to probe more scenarios and with bugfix.
francescorubbo Apr 4, 2021
04925f2
Define and use AggregationStrategy enum as argument
francescorubbo Apr 25, 2021
399d713
Use AggregationStrategy.FIRST as default
francescorubbo Apr 25, 2021
4338be8
Updated expected test results and move to fixtures
francescorubbo Apr 25, 2021
cdfe3ac
Use score corresponding to chosen label.
francescorubbo Apr 25, 2021
324f641
Fill entity attributes only if they exist.
francescorubbo Apr 25, 2021
7fcfc4e
Style fixes
francescorubbo Apr 25, 2021
031f3ef
Merge branch 'master' of https://github.com/huggingface/transformers …
francescorubbo Apr 25, 2021
909e5c8
Cleanup leftover from solving conflicts
francescorubbo Apr 25, 2021
30f0658
updating the checkpoint for GPT2ForSequence Classification to one wit…
abiolaTresor Apr 26, 2021
04ab2ca
add pooling layer support (#11439)
thevasudevgupta Apr 26, 2021
32dbb2d
make style (#11442)
patrickvonplaten Apr 26, 2021
4b72cfd
Pin black to 20.8.b1
sgugger Apr 26, 2021
c1625b3
With style
sgugger Apr 26, 2021
4bd6b54
Pin black to 21.4b0
sgugger Apr 26, 2021
38a716c
TF BART models - Add `cross_attentions` to model output and fix cross…
stancld Apr 26, 2021
d7633a4
Add basic support for FP16 in SageMaker model parallelism (#11407)
sgugger Apr 26, 2021
e3e70f9
docs(examples): fix link to TPU launcher script (#11427)
Apr 26, 2021
b24ead8
fix some typos in docs, comments, logging/errors (#11432)
LSinev Apr 26, 2021
ab2cabb
Pass along seed to DistributedSampler (#11406)
sgugger Apr 26, 2021
6715e3b
Clarify description of the is_split_into_words argument (#11449)
kstathou Apr 26, 2021
a753caf
[docs] fix invalid class name (#11438)
stas00 Apr 26, 2021
ce11318
make sure to test against the local checkout (#11437)
stas00 Apr 26, 2021
b03b2a6
Style
sgugger Apr 26, 2021
7959d83
Give each test a different repo name (#11453)
sgugger Apr 26, 2021
1d30ec9
[Examples] Fixes inconsistency around eval vs val and predict vs test…
bhadreshpsavani Apr 26, 2021
0661abc
Variable Correction for Consistency in Distillation Example (#11444)
jaimeenahn Apr 26, 2021
bc2571e
[Deepspeed] ZeRO-Infinity integration plus config revamp (#11418)
stas00 Apr 26, 2021
741d48f
Remove max length beam scorer (#11378)
GeetDsa Apr 26, 2021
88ac60f
update QuickTour docs to reflect model output object (#11462)
Apr 27, 2021
7ceff67
Finish Making Quick Tour respect the model object (#11467)
Apr 27, 2021
8d43c71
fix docs for decoder_input_ids (#11466)
patil-suraj Apr 27, 2021
2d27900
Update min versions in README and add Flax (#11472)
sgugger Apr 28, 2021
c0eb218
Update `PreTrainedTokenizerBase` to check/handle batch length for `te…
hamelsmu Apr 28, 2021
3f6add8
fix #1149 (#11493)
hamelsmu Apr 28, 2021
9114e51
add subword_label_re_alignment strategies
elk-cloner Mar 6, 2021
7d518b8
remove redundent import
elk-cloner Mar 6, 2021
e108da1
pass local test
elk-cloner Mar 13, 2021
8879e12
Restore compatibility with existing NER pipeline tests
francescorubbo Mar 20, 2021
5662477
Refactor label re-alignment in NER pipeline and add tests
francescorubbo Mar 22, 2021
92e6cee
Import numpy, used for arrays in test input
francescorubbo Mar 26, 2021
b0074c7
Bugfix: ensure entities are updated with aligned labels.
francescorubbo Apr 4, 2021
af865e3
Update tests to probe more scenarios and with bugfix.
francescorubbo Apr 4, 2021
cdd1db2
Define and use AggregationStrategy enum as argument
francescorubbo Apr 25, 2021
69da7cc
Use AggregationStrategy.FIRST as default
francescorubbo Apr 25, 2021
8b64c28
Updated expected test results and move to fixtures
francescorubbo Apr 25, 2021
2f3b8b0
Use score corresponding to chosen label.
francescorubbo Apr 25, 2021
1e48070
Fill entity attributes only if they exist.
francescorubbo Apr 25, 2021
c27f9eb
Style fixes
francescorubbo Apr 25, 2021
a251184
Merge branch 'ner_label_re_alignment' of https://github.com/elk-clone…
francescorubbo Apr 29, 2021
45e1919
Remove duplicated definition caused by rebasing after merging
francescorubbo Apr 29, 2021
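The commits above describe re-aligning subword labels in the NER pipeline through an `AggregationStrategy` enum, with `FIRST` as the default. As a rough sketch of the idea (a toy re-implementation, not the PR's actual code — the `MAX`/`AVERAGE` members and the helper function below are assumptions for illustration):

```python
from enum import Enum


class AggregationStrategy(Enum):
    """Sketch of the enum the commits describe; members other than FIRST are assumptions."""
    NONE = "none"
    FIRST = "first"
    MAX = "max"
    AVERAGE = "average"


def realign_labels(tokens, labels, scores, strategy=AggregationStrategy.FIRST):
    """Group WordPiece subwords (marked with '##') into words, one label per word.

    FIRST keeps the label/score of the first subword of each word; MAX keeps
    the label of the highest-scoring subword. Toy illustration only.
    """
    words = []
    for tok, lab, sc in zip(tokens, labels, scores):
        if tok.startswith("##") and words:
            word = words[-1]
            word["word"] += tok[2:]  # glue the subword back onto the word
            if strategy is AggregationStrategy.MAX and sc > word["score"]:
                word["label"], word["score"] = lab, sc
            # FIRST: keep the label/score already taken from the first subword
        else:
            words.append({"word": tok, "label": lab, "score": sc})
    return words


entities = realign_labels(
    ["Hu", "##gging", "##face", "is", "great"],
    ["I-ORG", "O", "I-ORG", "O", "O"],
    [0.9, 0.6, 0.8, 0.99, 0.99],
)
print(entities[0])  # → {'word': 'Huggingface', 'label': 'I-ORG', 'score': 0.9}
```

With `FIRST`, the disagreeing labels on `##gging`/`##face` are discarded in favor of the first subword's `I-ORG`, which is the inconsistency the PR's bugfix commits ("ensure entities are updated with aligned labels") are about.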
2 changes: 2 additions & 0 deletions Makefile
@@ -1,5 +1,7 @@
.PHONY: deps_table_update modified_only_fixup extra_quality_checks quality style fixup fix-copies test test-examples docs

# make sure to test the local checkout in scripts and not the pre-installed one (don't use quotes!)
export PYTHONPATH = src

check_dirs := examples tests src utils

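The exported `PYTHONPATH = src` means every Python process spawned by `make` resolves imports against the local checkout before any pip-installed copy. A minimal standalone sketch of the effect (illustration only, not part of the Makefile):

```python
import os
import subprocess
import sys

# Mimic the Makefile's `export PYTHONPATH = src`: child Python processes
# inherit the variable and gain ./src on their module search path.
env = dict(os.environ, PYTHONPATH="src")
result = subprocess.run(
    [sys.executable, "-c",
     "import sys; print(any(p.endswith('src') for p in sys.path))"],
    env=env, capture_output=True, text=True,
)
print(result.stdout.strip())  # → True
```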
16 changes: 8 additions & 8 deletions README.md
@@ -38,14 +38,14 @@ limitations under the License.
</p>

<h3 align="center">
<p>State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0
<p>State-of-the-art Natural Language Processing for Jax, PyTorch and TensorFlow
</h3>

🤗 Transformers provides thousands of pretrained models to perform tasks on texts such as classification, information extraction, question answering, summarization, translation, text generation, etc in 100+ languages. Its aim is to make cutting-edge NLP easier to use for everyone.

🤗 Transformers provides APIs to quickly download and use those pretrained models on a given text, fine-tune them on your own datasets then share them with the community on our [model hub](https://huggingface.co/models). At the same time, each python module defining an architecture can be used as a standalone and modified to enable quick research experiments.

🤗 Transformers is backed by the two most popular deep learning libraries, [PyTorch](https://pytorch.org/) and [TensorFlow](https://www.tensorflow.org/), with a seamless integration between them, allowing you to train your models with one then load it for inference with the other.
🤗 Transformers is backed by the three most popular deep learning libraries, [Jax](https://jax.readthedocs.io/en/latest/), [PyTorch](https://pytorch.org/) and [TensorFlow](https://www.tensorflow.org/), with a seamless integration between them, allowing you to train your models with one then load it for inference with the other.

## Online demos

@@ -152,16 +152,16 @@ The model itself is a regular [Pytorch `nn.Module`](https://pytorch.org/docs/sta

### With pip

This repository is tested on Python 3.6+, PyTorch 1.0.0+ (PyTorch 1.3.1+ for [examples](https://github.com/huggingface/transformers/tree/master/examples)) and TensorFlow 2.0.
This repository is tested on Python 3.6+, Flax 0.3.2+, PyTorch 1.3.1+ and TensorFlow 2.3+.

You should install 🤗 Transformers in a [virtual environment](https://docs.python.org/3/library/venv.html). If you're unfamiliar with Python virtual environments, check out the [user guide](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/).

First, create a virtual environment with the version of Python you're going to use and activate it.

Then, you will need to install at least one of TensorFlow 2.0, PyTorch or Flax.
Please refer to [TensorFlow installation page](https://www.tensorflow.org/install/pip#tensorflow-2.0-rc-is-available), [PyTorch installation page](https://pytorch.org/get-started/locally/#start-locally) regarding the specific install command for your platform and/or [Flax installation page](https://github.com/google/flax#quick-install).
Then, you will need to install at least one of Flax, PyTorch or TensorFlow.
Please refer to [TensorFlow installation page](https://www.tensorflow.org/install/), [PyTorch installation page](https://pytorch.org/get-started/locally/#start-locally) regarding the specific install command for your platform and/or [Flax installation page](https://github.com/google/flax#quick-install).

When TensorFlow 2.0 and/or PyTorch has been installed, 🤗 Transformers can be installed using pip as follows:
When one of those backends has been installed, 🤗 Transformers can be installed using pip as follows:

```bash
pip install transformers
```

@@ -179,7 +179,7 @@ Since Transformers version v4.0.0, we now have a conda channel: `huggingface`.

```bash
conda install -c huggingface transformers
```

Follow the installation pages of TensorFlow, PyTorch or Flax to see how to install them with conda.
Follow the installation pages of Flax, PyTorch or TensorFlow to see how to install them with conda.
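A quick way to check which of the three backends is importable in the current environment (an illustrative helper, not a Transformers API):

```python
import importlib.util

# Illustrative helper: report which deep learning backends are importable.
# The names are the real top-level module names of each distribution.
def available_backends():
    candidates = ("flax", "torch", "tensorflow")
    return [name for name in candidates
            if importlib.util.find_spec(name) is not None]

print(available_backends())  # e.g. ['torch'] in a PyTorch-only environment
```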

## Models architectures

@@ -247,7 +247,7 @@ Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
1. **[XLSR-Wav2Vec2](https://huggingface.co/transformers/model_doc/xlsr_wav2vec2.html)** (from Facebook AI) released with the paper [Unsupervised Cross-Lingual Representation Learning For Speech Recognition](https://arxiv.org/abs/2006.13979) by Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.
1. Want to contribute a new model? We have added a **detailed guide and templates** to guide you in the process of adding a new model. You can find them in the [`templates`](./templates) folder of the repository. Be sure to check the [contributing guidelines](./CONTRIBUTING.md) and contact the maintainers or open an issue to collect feedbacks before starting your PR.

To check if each model has an implementation in PyTorch/TensorFlow/Flax or has an associated tokenizer backed by the 🤗 Tokenizers library, refer to [this table](https://huggingface.co/transformers/index.html#bigtable)
To check if each model has an implementation in Flax, PyTorch or TensorFlow, or has an associated tokenizer backed by the 🤗 Tokenizers library, refer to [this table](https://huggingface.co/transformers/index.html#bigtable)

These implementations have been tested on several datasets (see the example scripts) and should match the performances of the original implementations. You can find more details on the performances in the Examples section of the [documentation](https://huggingface.co/transformers/examples.html).

20 changes: 10 additions & 10 deletions docs/source/index.rst
@@ -1,12 +1,12 @@
Transformers
=======================================================================================================================

State-of-the-art Natural Language Processing for Pytorch and TensorFlow 2.0.
State-of-the-art Natural Language Processing for Jax, Pytorch and TensorFlow

🤗 Transformers (formerly known as `pytorch-transformers` and `pytorch-pretrained-bert`) provides general-purpose
architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet...) for Natural Language Understanding (NLU) and Natural
Language Generation (NLG) with over 32+ pretrained models in 100+ languages and deep interoperability between
TensorFlow 2.0 and PyTorch.
Language Generation (NLG) with over 32+ pretrained models in 100+ languages and deep interoperability between Jax,
PyTorch and TensorFlow.

This is the documentation of our repository `transformers <https://github.com/huggingface/transformers>`_.

@@ -43,11 +43,11 @@ Lower compute costs, smaller carbon footprint:
Choose the right framework for every part of a model's lifetime:

- Train state-of-the-art models in 3 lines of code
- Deep interoperability between TensorFlow 2.0 and PyTorch models
- Move a single model between TF2.0/PyTorch frameworks at will
- Deep interoperability between Jax, Pytorch and TensorFlow models
- Move a single model between Jax/PyTorch/TensorFlow frameworks at will
- Seamlessly pick the right framework for training, evaluation, production

Experimental support for Flax with a few models right now, expected to grow in the coming months.
The support for Jax is still experimental (with a few models right now), expect to see it grow in the coming months!

`All the model checkpoints <https://huggingface.co/models>`__ are seamlessly integrated from the huggingface.co `model
hub <https://huggingface.co>`__ where they are uploaded directly by `users <https://huggingface.co/users>`__ and
@@ -74,8 +74,8 @@ The documentation is organized in five parts:
- **MODELS** for the classes and functions related to each model implemented in the library.
- **INTERNAL HELPERS** for the classes and functions we use internally.

The library currently contains PyTorch, Tensorflow and Flax implementations, pretrained model weights, usage scripts
and conversion utilities for the following models:
The library currently contains Jax, PyTorch and Tensorflow implementations, pretrained model weights, usage scripts and
conversion utilities for the following models:

..
This list is updated automatically from the README with `make fix-copies`. Do not update manually!
@@ -251,8 +251,8 @@ and conversion utilities for the following models:
.. _bigtable:

The table below represents the current support in the library for each of those models, whether they have a Python
tokenizer (called "slow"). A "fast" tokenizer backed by the 🤗 Tokenizers library, whether they have support in PyTorch,
TensorFlow and/or Flax.
tokenizer (called "slow"). A "fast" tokenizer backed by the 🤗 Tokenizers library, whether they have support in Jax (via
Flax), PyTorch, and/or TensorFlow.

..
This table is updated automatically from the auto modules with `make fix-copies`. Do not update manually!
6 changes: 0 additions & 6 deletions docs/source/installation.md
@@ -149,12 +149,6 @@ So if you don't have any specific environment variable set, the cache directory
(``PYTORCH_TRANSFORMERS_CACHE`` or ``PYTORCH_PRETRAINED_BERT_CACHE``), those will be used if there is no shell
environment variable for ``TRANSFORMERS_CACHE``.

### Note on model downloads (Continuous Integration or large-scale deployments)

If you expect to be downloading large volumes of models (more than 10,000) from huggingface.co (for instance through
your CI setup, or a large-scale production deployment), please cache the model files on your end. It will be way
faster, and cheaper. Feel free to contact us privately, we'd love to help with this.

### Offline mode

It's possible to run 🤗 Transformers in a firewalled or a no-network environment.
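The guide goes on to describe environment flags for offline use; as a sketch (the variable names are the ones documented by Transformers, the actual script invocation is omitted here):

```shell
# Offline switches documented in this guide: with these set, Transformers and
# Datasets use only locally cached files instead of reaching huggingface.co.
export TRANSFORMERS_OFFLINE=1
export HF_DATASETS_OFFLINE=1
echo "offline flags: $TRANSFORMERS_OFFLINE $HF_DATASETS_OFFLINE"
```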
4 changes: 2 additions & 2 deletions docs/source/main_classes/output.rst
@@ -13,8 +13,8 @@
Model outputs
-----------------------------------------------------------------------------------------------------------------------

PyTorch models have outputs that are instances of subclasses of :class:`~transformers.file_utils.ModelOutput`. Those
are data structures containing all the information returned by the model, but that can also be used as tuples or
All models have outputs that are instances of subclasses of :class:`~transformers.file_utils.ModelOutput`. Those are
data structures containing all the information returned by the model, but that can also be used as tuples or
dictionaries.

Let's see how this looks on an example:
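The real example in the docs instantiates a pretrained model; as a self-contained stand-in, here is a toy class mimicking how `ModelOutput` subclasses behave — attribute, key, and integer access all work, and `None` fields are skipped (a sketch, not the actual `transformers.file_utils.ModelOutput`):

```python
from collections import OrderedDict
from dataclasses import dataclass, fields
from typing import Optional, Tuple


@dataclass
class ToyModelOutput(OrderedDict):
    """Toy stand-in for a ModelOutput subclass: usable as an object,
    a dict, or a tuple, with unset (None) fields left out of the mapping."""
    loss: Optional[float] = None
    logits: Tuple[float, ...] = ()

    def __post_init__(self):
        # Register only the fields that were actually set, in declaration order.
        for f in fields(self):
            value = getattr(self, f.name)
            if value is not None:
                self[f.name] = value

    def __getitem__(self, key):
        # Integer index -> positional (tuple-like) access; string -> dict access.
        if isinstance(key, int):
            return list(self.values())[key]
        return super().__getitem__(key)


out = ToyModelOutput(logits=(0.1, 0.9))
print(out.logits, out["logits"], out[0])  # → (0.1, 0.9) (0.1, 0.9) (0.1, 0.9)
```

Because `loss` was never set, it is absent from the mapping, so positional index `0` lands on `logits` — the same skipping behavior the rewritten paragraph describes.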