
SEAT score of bert-large-cased seems wrong #1

Open
inmoonlight opened this issue May 1, 2021 · 5 comments · May be fixed by #2

Comments

@inmoonlight

Dear authors,
It seems the reported SEAT score of bert-large-cased is wrong.

I was able to reproduce the results with the current code base; however, I found two errors in the code.

1. Even though I called bert-large-cased, the tokenized tokens are all lowercased.

import pytorch_pretrained_bert as bert

version = 'bert-large-cased'

tokenizer = bert.BertTokenizer.from_pretrained(version)
text = 'SEAT score of bert-large-CASED'
tokenized = tokenizer.tokenize(text)  # ['seat', 'score', 'of', 'be', '##rt', '-', 'large', '-', 'case', '##d']
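
A minimal sketch of a workaround, assuming pytorch_pretrained_bert forwards keyword arguments such as do_lower_case to the tokenizer constructor (the default appears to be do_lower_case=True regardless of the checkpoint name):

# Hypothetical workaround: explicitly disable lowercasing for cased checkpoints
tokenizer = bert.BertTokenizer.from_pretrained(version, do_lower_case=False)
tokenized = tokenizer.tokenize(text)  # subword splits may differ, but case is preserved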

2. The score is calculated from the first token's embedding vector, not from a [CLS] token.

Below is sentbias/encoders/bert.py; you can see that the text is not prepended with a [CLS] token.

''' Convenience functions for handling BERT '''
import torch
import pytorch_pretrained_bert as bert


def load_model(version='bert-large-uncased'):
    ''' Load BERT model and corresponding tokenizer '''
    tokenizer = bert.BertTokenizer.from_pretrained(version)
    model = bert.BertModel.from_pretrained(version)
    model.eval()

    return model, tokenizer


def encode(model, tokenizer, texts):
    ''' Use tokenizer and model to encode texts '''
    encs = {}
    for text in texts:
        tokenized = tokenizer.tokenize(text)  # <<< BUG: a [CLS] token should be prepended
        indexed = tokenizer.convert_tokens_to_ids(tokenized)
        segment_idxs = [0] * len(tokenized)
        tokens_tensor = torch.tensor([indexed])
        segments_tensor = torch.tensor([segment_idxs])
        enc, _ = model(tokens_tensor, segments_tensor, output_all_encoded_layers=False)

        enc = enc[:, 0, :]  # extract the last-layer representation of the first input token
        encs[text] = enc.detach().view(-1).numpy()
    return encs
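
A minimal sketch of a possible fix, assuming the standard pytorch_pretrained_bert API (encode_fixed is a hypothetical name, not necessarily what the linked PR uses): wrapping each text in [CLS] ... [SEP] makes position 0 really be the [CLS] token.

def encode_fixed(model, tokenizer, texts):
    ''' Sketch of a corrected encoder: wrap each text in [CLS] ... [SEP] '''
    encs = {}
    for text in texts:
        # Prepend [CLS] and append [SEP], as BERT expects for a single sentence
        tokenized = ['[CLS]'] + tokenizer.tokenize(text) + ['[SEP]']
        indexed = tokenizer.convert_tokens_to_ids(tokenized)
        segment_idxs = [0] * len(tokenized)
        tokens_tensor = torch.tensor([indexed])
        segments_tensor = torch.tensor([segment_idxs])
        enc, _ = model(tokens_tensor, segments_tensor, output_all_encoded_layers=False)

        enc = enc[:, 0, :]  # position 0 is now the [CLS] token
        encs[text] = enc.detach().view(-1).numpy()
    return encs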

@W4ngatang (Owner) commented May 10, 2021 via email

inmoonlight linked a pull request May 23, 2021 that will close this issue
@inmoonlight (Author)

@W4ngatang

Thanks for the reply!

To mitigate error propagation, I fixed the code to resolve the two issues mentioned above.

@realliyifei

Thank you for the informative conversation @inmoonlight @W4ngatang

I have some quick questions:

  1. So the purpose of enc[:, 0, :] here is to extract the last-layer representation of the [CLS] token? (Though, as mentioned above, the result would not differ much if the representation of the first subword token were picked instead.)
  2. Is there any reason not to average the tokens along the sequence axis (i.e., take the mean of each token's last-layer representation)? That might make more sense for evaluation, since it captures every token's information; see the sketch after this list.
  3. Is there any reason the code doesn't put the model, tokens, etc. on a GPU here?
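
A minimal sketch of the mean-pooling variant described in question 2, assuming the same pytorch_pretrained_bert API as the encoder above (encode_mean_pool is a hypothetical name, not part of the repo):

def encode_mean_pool(model, tokenizer, texts):
    ''' Sketch: average the last-layer token representations instead of using [CLS] '''
    encs = {}
    for text in texts:
        tokenized = ['[CLS]'] + tokenizer.tokenize(text) + ['[SEP]']
        indexed = tokenizer.convert_tokens_to_ids(tokenized)
        tokens_tensor = torch.tensor([indexed])
        segments_tensor = torch.tensor([[0] * len(tokenized)])
        enc, _ = model(tokens_tensor, segments_tensor, output_all_encoded_layers=False)

        enc = enc.mean(dim=1)  # mean over the sequence axis instead of enc[:, 0, :]
        encs[text] = enc.detach().view(-1).numpy()
    return encs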

@W4ngatang (Owner) commented May 11, 2022 via email

@realliyifei

Thank you for your reply. That’s very helpful, Alex!
