
example for is next sentence #48

Closed
charlesmartin14 opened this issue Nov 21, 2018 · 19 comments

Comments

@charlesmartin14

charlesmartin14 commented Nov 21, 2018

Can you provide a working example for 'is next sentence'?

Is this expected to work properly?

import torch
from pytorch_pretrained_bert import BertTokenizer, BertForNextSentencePrediction

# Load pre-trained model tokenizer (vocabulary)
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# Tokenized input
text = "Who was Jim Morrison ? Jim Morrison was a puppeteer"
tokenized_text = tokenizer.tokenize(text)

# Convert token to vocabulary indices
indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)
# Define sentence A and B indices associated to 1st and 2nd sentences (see paper)
segments_ids = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

# Convert inputs to PyTorch tensors
tokens_tensor = torch.tensor([indexed_tokens])
segments_tensors = torch.tensor([segments_ids])

# Load pre-trained model (weights)
model = BertForNextSentencePrediction.from_pretrained('bert-base-uncased')
model.eval()

# Predict is Next Sentence ?
predictions = model(tokens_tensor, segments_tensors)
@thomwolf
Member

I think it should work. You should get a [1, 2] tensor of logits where predictions[0, 0] is the score of the next sentence being True and predictions[0, 1] is the score of it being False. So just take the max of the two (or apply a softmax to get probabilities).
Did you try it?
The model behaves better on longer sentences, of course (it is mainly trained on 512-token inputs).
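For example, a minimal sketch of turning those logits into a decision (assuming predictions is the raw [1, 2] logits tensor; on newer versions of the library the forward call returns a tuple, in which case take predictions[0] first):

import torch

# probabilities over (IsNext, NotNext); index 0 is the "is next sentence" class
probs = torch.softmax(predictions, dim=1)
is_next = probs[0, 0].item() > probs[0, 1].item()
print(probs, is_next)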

@thomwolf
Member

Closing this for now; feel free to reopen if there is another issue.

@Alexadar

Guys, are [CLS] and [SEP] tokens mandatory for this example?

@dirkgr
Contributor

dirkgr commented Jan 31, 2019

This is not super clear, and even wrong in the examples, but there is this note in the docstring for BertModel:

`pooled_output`: a torch.FloatTensor of size [batch_size, hidden_size] which is the output of a
    classifier pretrained on top of the hidden state associated to the first character of the
    input (`CLF`) to train on the Next-Sentence task (see BERT's paper).

That seems to suggest pretty strongly that you have to put in the [CLS] token (referred to as `CLF` in that docstring).

@hackable

hackable commented Feb 13, 2019

import torch
from pytorch_pretrained_bert import BertTokenizer, BertModel, BertForMaskedLM, BertForNextSentencePrediction

# Load pre-trained model tokenizer (vocabulary)
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# Tokenized input
text = "[CLS] Who was Jim Henson ? [SEP] Jim Henson was a puppeteer [SEP]"
tokenized_text = tokenizer.tokenize(text)

# Convert token to vocabulary indices
indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)
# Define sentence A and B indices associated to 1st and 2nd sentences (see paper)
segments_ids = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]

# Convert inputs to PyTorch tensors
tokens_tensor = torch.tensor([indexed_tokens])
segments_tensors = torch.tensor([segments_ids])

# Load pre-trained model (weights)
model = BertForNextSentencePrediction.from_pretrained('bert-base-uncased')
model.eval()

# Predict is Next Sentence ?
predictions = model(tokens_tensor, segments_tensors)

print(predictions)
tensor([[ 6.3714, -6.3910]], grad_fn=<AddmmBackward>)

How do I interpret this as true or false?

@dariocazzani

Those are the logits, because you did not pass the next_sentence_label.

My understanding is that you could apply a softmax and get the probability for the sequence to be a possible sequence.

Sentence 1: How old are you?
Sentence 2: The Eiffel Tower is in Paris
tensor([[-2.3808, 5.4018]], grad_fn=<AddmmBackward>)
Sentence 1: How old are you?
Sentence 2: I am 193 years old
tensor([[ 6.0164, -5.7138]], grad_fn=<AddmmBackward>)

For the first example, the probability that the second sentence is a plausible continuation is very low.
For the second example, the probability is very high (I am looking at the first logit).
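As a quick check (a minimal sketch; the logits below are just the values quoted above), applying a softmax makes that interpretation explicit:

import torch

logits_unrelated = torch.tensor([[-2.3808, 5.4018]])  # "How old are you?" / "The Eiffel Tower is in Paris"
logits_related = torch.tensor([[6.0164, -5.7138]])    # "How old are you?" / "I am 193 years old"
print(torch.softmax(logits_unrelated, dim=1))  # ~[[0.0004, 0.9996]]: first entry (IsNext) is tiny
print(torch.softmax(logits_related, dim=1))    # ~[[1.0000, 0.0000]]: first entry (IsNext) is near 1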

@pengjiao123

predictions = model(tokens_tensor, segments_tensors)
I ran the code more than once; why do I get different results? Sometimes predictions[0, 0] is higher, yet for the same sentence pair it sometimes comes out lower.

@thomwolf
Member

Maybe your model is not in evaluation mode (model.eval())?
You need to do this to deactivate the dropout modules.
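A minimal sketch of the evaluation setup (torch.no_grad() is an optional addition that skips building the autograd graph; it is not required for determinism):

model.eval()  # disables dropout so repeated runs give identical outputs
with torch.no_grad():
    predictions = model(tokens_tensor, segments_tensors)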

@pengjiao123

It is OK now. Thanks a lot!

@AIGyan

AIGyan commented Jun 4, 2019

Error:
--> 197 embeddings = words_embeddings + position_embeddings + token_type_embeddings
    198 embeddings = self.LayerNorm(embeddings)
    199 embeddings = self.dropout(embeddings)
The size of tensor a (21) must match the size of tensor b (14) at non-singleton dimension 1

The above error is resolved when I add a few extra 1's and 0's so that tokens_tensor and segments_tensors have the same shape. Just wondering whether I am using it the right way.

My predictions output is a tensor of size 21 x 30522.
I believe that example predicts the word at the [MASK] position. Can you also please explain how to predict the next sentence?
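The size mismatch (21 vs 14) means the token ids and segment ids have different lengths. A minimal sketch of deriving the segment ids from the tokenized text itself so the two tensors always line up (illustrative only; it assumes the usual [CLS] sentence A [SEP] sentence B [SEP] layout):

# everything up to and including the first [SEP] belongs to sentence A (segment 0)
first_sep = tokenized_text.index('[SEP]') + 1
segments_ids = [0] * first_sep + [1] * (len(tokenized_text) - first_sep)
assert len(segments_ids) == len(tokenized_text)
segments_tensors = torch.tensor([segments_ids])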

@yuchuang1979

Maybe your model is not in evaluation mode (model.eval())?
You need to do this to deactivate the dropout modules.

@thomwolf Actually, even when I use model.eval() I still get different results. I observed this with every model in the package (BertModel, BertForNextSentencePrediction, etc.). Only when I fix the length of the input (e.g. to 128) do I get the same results; that means I have to pad indexed_tokens with 0's so it has a fixed length.

Could you explain why this happens, or did I make a mistake?

Thank you so much!

@Alexadar

Alexadar commented Aug 2, 2019


@yuchuang1979 Make sure that:

  1. input_ids, input_mask, and segment_ids have the same length (see the padding sketch below)
  2. the vocabulary file for the tokenizer comes from the same config dir as your bert_config.json

I had similar symptoms when the vocab and config came from different BERTs.
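For point 1, a minimal padding sketch (illustrative; max_len is an arbitrary choice here, and the pad id of 0 matches bert-base-uncased, whose [PAD] token has id 0):

max_len = 128
pad_len = max_len - len(indexed_tokens)
input_ids = indexed_tokens + [0] * pad_len              # pad the token ids
input_mask = [1] * len(indexed_tokens) + [0] * pad_len  # 1 for real tokens, 0 for padding
segment_ids = segments_ids + [0] * pad_len              # pad the segment ids the same way
assert len(input_ids) == len(input_mask) == len(segment_ids) == max_len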

@pbabvey

pbabvey commented Oct 27, 2019

I noticed that the probability for longer sentences, regardless of how related they are to the same subject, is higher than for shorter ones. For example, I added some random sentences to the end of the first or second part and observed a significant increase in the first logit value. Is there a way to regularize the model for next sentence prediction?

@ehsan-soe

ehsan-soe commented Oct 29, 2019

@pbabvey I am observing the same thing.
Are the probabilities length-normalized?

@AjitAntony


@dariocazzani I'm getting different scores for the sentences that you tried. Please advise why; my code is below.

import torch
from transformers import BertTokenizer, BertModel, BertForMaskedLM, BertForNextSentencePrediction

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
BertNSP = BertForNextSentencePrediction.from_pretrained('bert-base-uncased')

text1 = "How old are you?"
text2 = "The Eiffel Tower is in Paris"

text1_toks = ["[CLS]"] + tokenizer.tokenize(text1) + ["[SEP]"]
text2_toks = tokenizer.tokenize(text2) + ["[SEP]"]
text = text1_toks + text2_toks
print(text)
indexed_tokens = tokenizer.convert_tokens_to_ids(text1_toks + text2_toks)
segments_ids = [0] * len(text1_toks) + [1] * len(text2_toks)

tokens_tensor = torch.tensor([indexed_tokens])
segments_tensors = torch.tensor([segments_ids])
print(indexed_tokens)
print(segments_ids)
BertNSP.eval()
prediction = BertNSP(tokens_tensor, segments_tensors)
prediction = prediction[0]  # the model returns a tuple; take the logits tensor
print(prediction)
softmax = torch.nn.Softmax(dim=1)
prediction_sm = softmax(prediction)
print(prediction_sm)

Output of prediction:
tensor([[ 2.1772, -0.8097]], grad_fn=<AddmmBackward>)

Output of prediction_sm:
tensor([[0.9923, 0.0077]], grad_fn=<SoftmaxBackward>)

Why is the first score still so high (0.9923) even after applying softmax, given that the second sentence is unrelated?

@parth126


I am facing the same issue. No matter what sentences I use, I always get a very high probability of the second sentence being related to the first.

@LysandreJik
Member

@parth126 Have you seen #1788, and is it related to your issue?

@parth126

@parth126 have you seen #1788 and is it related to your issue?

Yes, it was the same issue, and the solution worked like a charm.
Many thanks @LysandreJik!

@AjitAntony

@LysandreJik thanks for the information
