Comment by HaokunLiu Saturday Apr 11, 2020 at 22:36 GMT
Adding this as an additional reference.
https://arxiv.org/pdf/1903.05987.pdf found that a diagnostic classifier on fine-tuned BERT layers achieves similar performance across layers 9-12 (MRPC) and layers 5-12 (STS-B). See Figure 1 in the linked PDF.
This suggests the top pretrained layers may not be that helpful for downstream sentence-pair classification tasks.
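For reference, a layer-wise diagnostic classifier of this kind can be sketched roughly as follows. This is only an illustration under assumed choices (model name, mean pooling, and a linear probe are all assumptions here), not the paper's implementation:

```python
# Sketch of a layer-wise diagnostic probe (illustrative; not the paper's code).
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
encoder.eval()  # the encoder stays frozen; only the probe below would be trained

def layer_representation(texts, layer_idx):
    """Mean-pooled hidden states of a single encoder layer."""
    batch = tokenizer(texts, padding=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).hidden_states[layer_idx]  # [batch, seq, dim]
    mask = batch["attention_mask"].unsqueeze(-1)
    return (hidden * mask).sum(1) / mask.sum(1)

# The diagnostic classifier is then just a linear probe on these features, e.g.:
probe = torch.nn.Linear(encoder.config.hidden_size, 2)
logits = probe(layer_representation(["sentence one [SEP] sentence two"], layer_idx=9))
```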
Issue by zphang
Saturday Apr 11, 2020 at 18:39 GMT
Originally opened as nyu-mll/jiant#1062
The way that we are currently using Transformers models involves taking the base encoder and extracting the full set of hidden activations (across all layers). See link. We then separately pull out only the top layer and, for single-vector tasks such as classification, extract the first-token representation.
Because of this workflow, we never end up using the pretrained pooler layers in the respective models, e.g. BERT and ALBERT (RoBERTa also inherits from BERT).
On the other hand, we do not expect this to be a major issue, as we have seen good results from tuning with this format across several works, e.g. https://arxiv.org/abs/1812.10860 and https://arxiv.org/abs/1905.00537.
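For concreteness, a minimal sketch of the two paths, assuming the current Hugging Face transformers API (the model name and variable names here are illustrative, not jiant's actual code):

```python
# Minimal sketch: all-layer extraction vs. the unused pretrained pooler.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)

inputs = tokenizer("An example sentence.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Workflow described above: keep the hidden states of every layer...
all_layers = outputs.hidden_states      # tuple: embeddings + one tensor per layer
top_layer = all_layers[-1]              # [batch, seq_len, hidden]
cls_vector = top_layer[:, 0, :]         # first-token ([CLS]) representation

# ...which bypasses the pretrained pooler output (tanh(W @ cls + b)):
pooled = outputs.pooler_output          # [batch, hidden], unused in this workflow
```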