Hi
Adding to my previous posts in issue 19: I am trying to use the Google binary
(from Google Books) to get log probabilities of trigrams from some text. I am
getting NaN for the last trigram. Attached is the code of what I am trying to
do. I have slightly modified these files and added some System.out.println calls
to see the outputs.
The text I am testing with is "Hello how are you", which gives me the sentence
array [7380255 15474 152 26 45 7380256], where 7380255 is the start symbol and
7380256 is the stop symbol.
I first get the log probability of the bigram 7380255 15474 by passing
startPos as 0 and endPos as 2. After that I get the log probabilities of the
trigrams, starting with startPos 0, as in the code below:
    // Score each trigram window [i, i + 3) of the sentence (endPos is exclusive)
    for (int i = 0; i <= sent.length - 3; i++) {
        System.out.println("Getting score from " + sent[i] + " to " + sent[i + 2]);
        score = lm_.getLogProb(sent, i, i + 3);
        System.out.println("score " + score);
        if (Float.isNaN(score)) {
            System.out.println("Returned NaN");
        } else {
            sentScore += score;
        }
    }
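To make the window bounds concrete, here is a small self-contained sketch that lists the slices the loop visits for this sentence (the class `NgramWindows` and its `windows` method are hypothetical helpers, not part of the LM library). It shows that the final trigram, `[3, 6)`, is the only one that contains the stop symbol 7380256.

```java
import java.util.Arrays;

public class NgramWindows {
    // Returns the [i, i + n) slices visited by a loop like the one above
    public static int[][] windows(int[] sent, int n) {
        int count = Math.max(0, sent.length - n + 1);
        int[][] out = new int[count][];
        for (int i = 0; i < count; i++) {
            out[i] = Arrays.copyOfRange(sent, i, i + n); // endPos is exclusive
        }
        return out;
    }

    public static void main(String[] args) {
        // <s> Hello how are you </s>, using the IDs from the report
        int[] sent = {7380255, 15474, 152, 26, 45, 7380256};
        System.out.println("bigram  [0, 2) = " + Arrays.toString(Arrays.copyOfRange(sent, 0, 2)));
        int[][] tri = windows(sent, 3);
        for (int i = 0; i < tri.length; i++) {
            System.out.println("trigram [" + i + ", " + (i + 3) + ") = " + Arrays.toString(tri[i]));
        }
    }
}
```

Running it should print the bigram `[7380255, 15474]` followed by four trigrams, the last being `[26, 45, 7380256]`.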
The problem occurs within StupidBackoffLm, on the following line:

    probContext = localMap.getValueAndOffset(probContext, probContextOrder, ngram[i], scratch);

It happens only with the last trigram, when startPos is 3 and endPos is 6.
There, scratch.value is -1 when ngram[i] is the stop symbol (7380256), and this
results in a NaN log probability.
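For comparison, here is a toy stupid backoff scorer (the scheme of Brants et al., 2007, with backoff factor 0.4) that treats a missing n-gram, the analogue of the -1 lookup result above, by backing off to a shorter context instead of producing NaN. This is a minimal sketch under stated assumptions: it uses a plain `HashMap` of counts and log10 scores, the class `ToyStupidBackoff` is hypothetical, and it is not how the library's StupidBackoffLm is implemented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ToyStupidBackoff {
    private static final double ALPHA = 0.4; // backoff factor (assumption: value from Brants et al.)
    private final Map<List<Integer>, Long> counts = new HashMap<>();
    private long totalTokens = 0;

    // Count every n-gram of order 1..maxOrder in one sentence
    public void addSentence(int[] sent, int maxOrder) {
        for (int i = 0; i < sent.length; i++) {
            totalTokens++;
            for (int n = 1; n <= maxOrder && i + n <= sent.length; n++) {
                counts.merge(key(sent, i, i + n), 1L, Long::sum);
            }
        }
    }

    private static List<Integer> key(int[] a, int from, int to) {
        List<Integer> k = new ArrayList<>(to - from);
        for (int j = from; j < to; j++) {
            k.add(a[j]);
        }
        return k;
    }

    // log10 score of ngram[endPos - 1] given ngram[startPos .. endPos - 2]
    public double logScore(int[] ngram, int startPos, int endPos) {
        double penalty = 0.0; // accumulates log10(ALPHA) per backoff step
        for (int start = startPos; start < endPos - 1; start++) {
            long full = counts.getOrDefault(key(ngram, start, endPos), 0L);
            if (full > 0) {
                // The context is a prefix of a counted n-gram, so it is always present
                long ctx = counts.get(key(ngram, start, endPos - 1));
                return penalty + Math.log10((double) full / ctx);
            }
            // Missing n-gram: shorten the context instead of failing
            penalty += Math.log10(ALPHA);
        }
        long uni = counts.getOrDefault(key(ngram, endPos - 1, endPos), 0L);
        return uni > 0 ? penalty + Math.log10((double) uni / totalTokens)
                       : Double.NEGATIVE_INFINITY; // unseen word: explicit -infinity, never NaN
    }
}
```

The design point is that every lookup failure has a defined outcome (a shorter context or an explicit negative infinity), so a caller never has to special-case NaN.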
I tried the same with scoreSentence and it gives the same problem.
Can you please help me understand what mistake I am making?
Thanks
Regards
Debanjan
Original issue reported on code.google.com by b.deban...@gmail.com on 24 Mar 2014 at 11:36