This repository has been archived by the owner on Feb 12, 2022. It is now read-only.

how to output sentence's probability? #96

Open
OswaldoBornemann opened this issue Feb 15, 2019 · 12 comments

@OswaldoBornemann

May I ask how to use awd-lstm-lm to output a sentence's probability?

@lorelupo

Same question here! @tsungruihon did you find a solution?

@OswaldoBornemann
Author

@wolflo No, I haven't; it's still a work in progress.

@lorelupo

lorelupo commented Mar 5, 2019

@tsungruihon I compute the log-likelihood of an input sentence by summing the log-probabilities that the model outputs for each word of the sentence. It looks like this:

# assumes: import torch and import torch.nn.functional as F
def score(self, sentence):
    tokens = text_utils.getTokens(sentence)
    idxs = [self.dictionary.getIndex(x) for x in tokens]
    idxs = torch.LongTensor(idxs)
    # make it look like a batch of one element
    input = batch_utils.batchifyCorpusTensor(idxs, 1)
    # instantiate hidden states
    hidden = self.model.initHidden(batchSize=1)
    output, hidden = self.model(input, hidden)
    logits = self.model.decoder(output)
    logProba = F.log_softmax(logits, dim=1)
    # sum the log-probability the model assigns to each token of the sentence
    return sum(logProba[i][idxs[i]] for i in range(len(idxs)))
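
As a side note, once you have that summed log-probability, you can also turn it into a per-token perplexity. A toy sketch (the numbers are made up, not output from the repo's model):

import math

sentence_logprob = -23.5   # e.g. the value returned by score(...)
num_tokens = 7             # number of tokens that were scored
perplexity = math.exp(-sentence_logprob / num_tokens)
print(perplexity)          # ~28.7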

@OswaldoBornemann
Author

@wolflo thanks my friend! Nice work!

@gailweiss

Hi, thanks @wolflo! One thing is confusing me: does this also take into account the probability of the first token in the sentence? (i.e., the probability the model assigns to the first token when in the state given by model.initHidden?)

@lorelupo

lorelupo commented Jun 3, 2019

Hi @gailweiss, an approximation of the log-probability of the first token in the sentence should be given by logProba[0][idxs[0]], right? However, I might have misunderstood your question.

@gailweiss

gailweiss commented Jun 3, 2019

Hi @wolflo, thanks for the quick response!
I guess what I'm not clear on is:

isn't logProba[i] the (log) next-token distribution after step i? I.e., if the input is "a a <eos>", isn't logProba[0] the (log) probabilities over the possible next tokens after seeing that initial "a" (and logProba[-1] the log-probabilities after having seen all of "a a <eos>")?

more directly, isn't output[0] only the output of the model after processing the first input token?
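
To make the indexing I have in mind concrete, here is a tiny self-contained toy (random logits standing in for the decoder output, made-up 4-word vocabulary):

import torch

vocab = ["a", "b", "<eos>", "<unk>"]
inputs = torch.tensor([0, 0, 2])                     # "a a <eos>"
fake_logits = torch.randn(len(inputs), len(vocab))   # stand-in for the decoder output
log_probs = torch.log_softmax(fake_logits, dim=1)

for i in range(len(inputs) - 1):
    # log_probs[i] is a distribution over the *next* token,
    # so it lines up with inputs[i + 1], not inputs[i]
    print(vocab[inputs[i + 1].item()], log_probs[i, inputs[i + 1]].item())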

@lorelupo

lorelupo commented Jun 3, 2019

Oh, I see! That's an excellent remark. Then, I think you could rewrite the above scoring function as:

def score(self, sentence):
    tokens = text_utils.getTokens( "<eos> " + sentence)  # <eos> here serves as <sos>
    idxs = [self.dictionary.getIndex(x) for x in tokens]
    idxs = torch.LongTensor(idxs)
    # make it look like a batch of one element
    input = batch_utils.batchifyCorpusTensor(idxs, 1)
    # instantiate hidden states
    hidden = self.model.initHidden(batchSize=1)
    output, hidden = self.model(input, hidden)
    logits = self.model.decoder(output)
    logProba = F.log_softmax(logits, dim=1)
    # sum log P(w_{i+1} | w_1 ... w_i) over the sentence
    return sum(logProba[i][idxs[i + 1]] for i in range(len(idxs) - 1))

What do you think?
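
As a side note, the final sum could also be written with gather instead of a Python loop. A self-contained toy with random logits, just to check that the two indexings agree (not the repo's code):

import torch

torch.manual_seed(0)
vocab_size, seq_len = 10, 5
idxs = torch.randint(vocab_size, (seq_len,))      # stand-in for the token ids
logProba = torch.log_softmax(torch.randn(seq_len, vocab_size), dim=1)

loop_score = sum(logProba[i][idxs[i + 1]] for i in range(len(idxs) - 1))
gather_score = logProba[:-1].gather(1, idxs[1:].unsqueeze(1)).sum()
print(torch.isclose(loop_score, gather_score))    # tensor(True)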

@gailweiss

This seems to make sense :) thank you for taking the time to get into this!

I assume/hope that, the way the models here are trained, one sequence begins right after the <eos> of the previous one, i.e. I hope that the training in this repository also trains the distribution after <eos>. But at any rate this is a consistent solution, and it's just a question of whether the model optimises appropriately, which is something else.

Thank you!

@lorelupo

lorelupo commented Jun 3, 2019

Indeed, training in this repo is performed over a long tensor representing the concatenation of all the sentences of the corpus, with the tag <eos> appended at the end of each sentence.
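
Schematically, for a toy two-sentence corpus (just an illustration of the layout, not the repo's actual data pipeline):

import torch

word2idx = {"<eos>": 0, "the": 1, "cat": 2, "sat": 3, "dog": 4, "ran": 5}
sentences = [["the", "cat", "sat"], ["the", "dog", "ran"]]
# concatenate everything into one long stream, appending <eos> after each sentence
stream = [word2idx[w] for s in sentences for w in s + ["<eos>"]]
corpus = torch.LongTensor(stream)
print(corpus)  # tensor([1, 2, 3, 0, 1, 4, 5, 0])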

Thank you for pointing out this issue!

@ishpiki

ishpiki commented Jul 8, 2019

Hi @wolflo, thanks for the code. I have one issue related to next-word prediction: given a word and the previous hidden states, we can try to predict the most probable next word according to the softmax probability distribution. Did you try to do this with your function?
When I did it with the trained model (default settings, wiki-2 dataset), the result was not so good:

original: an English film , television and theatre actor . He had a guest @-@ starring role on the television series The Bill in 2000 . This was followed by a starring role in the play Herons written by Simon Stephens , which was performed in 2001 at the Royal Court Theatre . He had a guest role in the television
predicted: @-@ , and , . , also a appearance seller in the series , and the , was the by a in in the film , . by . who was released by the . the Academy of in was previously appearance in the film series

Maybe you have faced this issue before?

Thanks.

@lorelupo

lorelupo commented Jul 8, 2019

I tried sentence generation some time ago with the awd-lstm model trained on wikitext-2. Results were pretty poor for me too. You might improve generation quality by adjusting the temperature, by using tricks like beam search, or by training the model on bigger datasets. Unfortunately, I do not have time to dig further into this right now. Should I work on this in the future, I will let you know!
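
For reference, temperature is typically applied by dividing the logits before the softmax and then sampling; a minimal sketch with made-up logits (not code from the repo):

import torch

logits = torch.tensor([2.0, 1.0, 0.5, -1.0])   # stand-in for the decoder output at one step
temperature = 0.7                              # < 1.0 sharpens the distribution, > 1.0 flattens it
probs = torch.softmax(logits / temperature, dim=0)
next_word_id = torch.multinomial(probs, num_samples=1)
print(probs, next_word_id)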

Have a good day :)
