Adding start and end markers to sentences in n-gram models #25

logan0czy · 2020-03-20T07:06:22Z

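The n-gram lecture notes say that every sentence should be wrapped in start and end markers (<start>, <end>) before it is processed. Taking a bigram model as an example, with the following two sentences:
(1). <start> I am Sam <end>
(2). <start> Sam I am <end>
I have three specific questions:
(1). For the end marker <end>, the notes explain: "To make the bigram grammar a true probability distribution. Without an end-symbol, the sentence probabilities for all sentences of a given length would sum to one. This model would define an infinite set of probability distributions, with one distribution per sentence length." I don't quite understand this. Is there a more intuitive explanation, or any reference material?
(2). For the start marker <start>, the notes say it is there "to give us the bigram context of the first word." Does the start marker not play a role in the probability distribution the way the end marker does?
(3). For an n-gram model, do we need to add n-1 start markers and n-1 end markers, or is one of each enough?
Any help would be much appreciated.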
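
A small numerical check may make the quoted passage in question (1) more concrete. The sketch below is not from the lecture notes: the function names (`train_bigram`, `sequence_prob`, `total_mass`) are hypothetical, and it assumes maximum-likelihood estimates normalized by each context's outgoing bigram counts. It trains on the two example sentences and sums the probability of every possible word sequence of a given length, once without <end> and once with it.

```python
from collections import Counter, defaultdict
from itertools import product

START, END = "<start>", "<end>"
CORPUS = [["I", "am", "Sam"], ["Sam", "I", "am"]]
VOCAB = ["I", "am", "Sam"]


def train_bigram(sentences, use_end):
    """MLE bigram model: P(cur | prev) = count(prev, cur) / sum of counts leaving prev."""
    counts = defaultdict(Counter)
    for words in sentences:
        tokens = [START] + words + ([END] if use_end else [])
        for prev, cur in zip(tokens, tokens[1:]):
            counts[prev][cur] += 1
    return {
        prev: {w: c / sum(nxt.values()) for w, c in nxt.items()}
        for prev, nxt in counts.items()
    }


def sequence_prob(model, words, use_end):
    """Probability of one word sequence; unseen bigrams get probability 0."""
    tokens = [START] + list(words) + ([END] if use_end else [])
    p = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        p *= model.get(prev, {}).get(cur, 0.0)
    return p


def total_mass(model, length, use_end):
    """Sum of probabilities over every possible sequence of exactly `length` words."""
    return sum(
        sequence_prob(model, seq, use_end) for seq in product(VOCAB, repeat=length)
    )


no_end = train_bigram(CORPUS, use_end=False)
with_end = train_bigram(CORPUS, use_end=True)

for k in range(1, 5):
    print(k, total_mass(no_end, k, False), total_mass(with_end, k, True))
```

Without <end>, the model never decides where a sentence stops, so the sequences of each fixed length already use up the whole probability mass: one separate distribution per length. With <end>, each length receives only part of the mass, and the parts for all lengths together approach one, which is what a single distribution over sentences of any length requires.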

logan0czy · 2020-03-20T09:09:22Z

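I've found the answer to question (3): an n-gram LM needs n-1 start markers at the beginning of the sentence and n-1 end markers at the end in order to compute the probabilities.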
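
As an illustration of that conclusion, here is a minimal padding sketch. The helper names `pad_sentence` and `ngrams` are hypothetical, and padding both sides with n-1 markers simply follows the answer above; some treatments keep only a single end marker, which would be a one-line change to the end padding.

```python
def pad_sentence(words, n, start="<start>", end="<end>"):
    """Wrap a tokenized sentence with n-1 start and n-1 end markers (assumes n >= 2)."""
    if n < 2:
        raise ValueError("padding as described here only applies for n >= 2")
    pad = n - 1
    return [start] * pad + list(words) + [end] * pad


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Trigram example using the first sentence from this thread.
padded = pad_sentence(["I", "am", "Sam"], n=3)
for gram in ngrams(padded, 3):
    print(gram)
```

With n = 3 the first trigram is (<start>, <start>, I), so the first real word is conditioned on a full two-token context, just as the single <start> gives the first word its context in the bigram case.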

1024er · 2020-03-27T02:32:32Z

I don't quite follow. Could you point me to the relevant section of the original lecture notes?

logan0czy · 2020-03-27T02:57:29Z

> I don't quite follow. Could you point me to the relevant section of the original lecture notes?

It's in the N-gram language model lecture notes from the Thu Jan 23 class. Link: https://web.stanford.edu/~jurafsky/slp3/3.pdf

The relevant content is in two places: the bottom of page 4 of the notes, and the "some practical issues" part on page 6, which also touches on it briefly.

logan0czy closed this as completed Mar 27, 2020

logan0czy reopened this Mar 27, 2020
