BERTology
-----------------------------------------------------------------------------------------------------------------------

There is a growing field of study concerned with investigating the inner workings of large-scale transformers like BERT
(that some call "BERTology"). Some good examples of this field are:
* BERT Rediscovers the Classical NLP Pipeline by Ian Tenney, Dipanjan Das, Ellie Pavlick:
  https://arxiv.org/abs/1905.05950
* Are Sixteen Heads Really Better than One? by Paul Michel, Omer Levy, Graham Neubig: https://arxiv.org/abs/1905.10650
* What Does BERT Look At? An Analysis of BERT's Attention by Kevin Clark, Urvashi Khandelwal, Omer Levy, Christopher D.
  Manning: https://arxiv.org/abs/1906.04341
In order to help this new field develop, we have included a few additional features in the BERT/GPT/GPT-2 models to
help people access the inner representations, mainly adapted from the great work of Paul Michel
(https://arxiv.org/abs/1905.10650):
* accessing all the hidden states of BERT/GPT/GPT-2,
* accessing all the attention weights for each head of BERT/GPT/GPT-2,
* retrieving head output values and gradients to be able to compute head importance scores and prune heads as
  explained in https://arxiv.org/abs/1905.10650 (see the sketch after this list).
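As a quick illustration, here is a minimal sketch of how these features can be used. It assumes a recent version of the
library where model outputs expose ``hidden_states`` and ``attentions`` attributes; the checkpoint name
``bert-base-uncased`` and the chosen heads are only examples.

.. code-block:: python

    import torch
    from transformers import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    # Ask the model to return every layer's hidden states and attention weights.
    model = BertModel.from_pretrained(
        "bert-base-uncased", output_hidden_states=True, output_attentions=True
    )

    inputs = tokenizer("Hello, BERTology!", return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # Tuple of (num_layers + 1) tensors, each of shape (batch, seq_len, hidden_size).
    hidden_states = outputs.hidden_states
    # Tuple of num_layers tensors, each of shape (batch, num_heads, seq_len, seq_len).
    attentions = outputs.attentions

    # Permanently remove heads 0 and 2 of layer 0 and head 11 of layer 2.
    model.prune_heads({0: [0, 2], 2: [11]})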
To help you understand and use these features, we have added a specific example script: `bertology.py
<https://github.com/huggingface/transformers/blob/master/examples/bertology/run_bertology.py>`_, which extracts
information from and prunes a model pre-trained on GLUE.
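The gist of the head-importance computation can be sketched as follows. This is a simplified illustration rather than
the script itself: it uses a single toy sentence and an untrained classification head on ``bert-base-uncased`` purely
as placeholders, and scores heads by the gradient of the loss with respect to a head mask.

.. code-block:: python

    import torch
    from transformers import BertForSequenceClassification, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForSequenceClassification.from_pretrained("bert-base-uncased")

    n_layers = model.config.num_hidden_layers
    n_heads = model.config.num_attention_heads

    # A mask of ones keeps every head active; its gradients tell us how
    # sensitive the loss is to each individual head.
    head_mask = torch.ones(n_layers, n_heads, requires_grad=True)

    inputs = tokenizer("BERTology is fun.", return_tensors="pt")
    outputs = model(**inputs, labels=torch.tensor([0]), head_mask=head_mask)
    outputs.loss.backward()

    # Absolute gradients give a rough per-head importance score; low-scoring
    # heads are candidates for pruning with model.prune_heads(...).
    head_importance = head_mask.grad.abs()
    print(head_importance)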