Adding start and end markers to sentences in n-gram models #25

Open
logan0czy opened this issue Mar 20, 2020 · 3 comments

Comments


logan0czy commented Mar 20, 2020

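The n-gram model lecture notes say that every sentence should be wrapped with a start marker and an end marker (<start>, <end>) before processing. Taking a bigram model as an example, two sentences would become:
(1). <start> I am Sam <end>
(2). <start> Sam I am <end>
I have three questions:
(1). For the end marker <end>, the notes explain: "To make the bigram grammar a true probability distribution. Without an end-symbol, the sentence probabilities for all sentences of a given length would sum to one. This model would define an infinite set of probability distributions, with one distribution per sentence length." I don't really understand this. Is there a more intuitive explanation, or some reference material?
(2). For the start marker <start>, the notes say it is there "to give us the bigram context of the first word." Does the start marker play no role in the probability distribution, the way the end marker does?
(3). For an n-gram model, do we need to add n-1 start markers and n-1 end markers, or is one of each enough?
Any help would be much appreciated.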
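
One way to make the quoted claim in question (1) concrete is to check it numerically. The sketch below is only an illustration under stated assumptions: the two-word vocabulary, the tables `NO_END` and `WITH_END`, and all the probabilities in them are made up for the example and do not come from the lecture notes. It sums the probability of every possible sentence of a given length, first with a bigram table that has no <end> entry, then with one that does.

```python
from itertools import product

VOCAB = ["a", "b"]  # hypothetical two-word vocabulary

# Hypothetical bigram tables; every row sums to 1.
# NO_END has no way to stop, WITH_END reserves mass for <end>.
NO_END = {
    "<start>": {"a": 0.6, "b": 0.4},
    "a": {"a": 0.3, "b": 0.7},
    "b": {"a": 0.5, "b": 0.5},
}
WITH_END = {
    "<start>": {"a": 0.6, "b": 0.4, "<end>": 0.0},
    "a": {"a": 0.2, "b": 0.4, "<end>": 0.4},
    "b": {"a": 0.3, "b": 0.2, "<end>": 0.5},
}


def sentence_prob(words, table, use_end):
    """Chain-rule probability of one sentence under a bigram table."""
    prob = 1.0
    prev = "<start>"
    for word in words:
        prob *= table[prev][word]
        prev = word
    if use_end:
        # The sentence only counts as finished once <end> is generated.
        prob *= table[prev]["<end>"]
    return prob


def total_mass(length, table, use_end):
    """Sum of probabilities over all sentences with exactly `length` words."""
    return sum(
        sentence_prob(words, table, use_end)
        for words in product(VOCAB, repeat=length)
    )


# Without <end>: each length is its own distribution with total mass 1.
for n in range(1, 5):
    print("no <end>, length", n, "total:", round(total_mass(n, NO_END, False), 6))

# With <end>: the mass is shared across lengths and accumulates towards 1.
cumulative = 0.0
for n in range(1, 16):
    cumulative += total_mass(n, WITH_END, True)
print("with <end>, lengths 1..15 cumulative:", round(cumulative, 6))
```

Without <end>, every length gets a full unit of probability, so summing over all lengths diverges and the model is not a single distribution over sentences. With <end>, each length receives only a share, and the shares add up to 1 in the limit.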

@logan0czy

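I have found the answer to question (3): an n-gram LM needs n-1 start markers at the beginning of the sentence and n-1 end markers at the end in order to compute the probabilities.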
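
The padding described above can be sketched as follows. This is a minimal illustration, not code from the lecture notes: the helper names `pad`, `ngrams` and `bigram_mle` are hypothetical, and padding the end with n-1 markers follows the convention stated in this comment. A single <end> is often treated as sufficient to terminate a sentence, so that choice should be read as an assumption.

```python
from collections import Counter


def pad(tokens, n, start="<start>", end="<end>"):
    """Pad a sentence with n-1 start and n-1 end markers (assumed convention)."""
    k = n - 1
    return [start] * k + list(tokens) + [end] * k


def ngrams(tokens, n):
    """Return every contiguous n-gram of the token list as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bigram_mle(corpus):
    """Maximum-likelihood bigram estimates: count(w1, w2) / count(w1 as context)."""
    bigram_counts = Counter()
    context_counts = Counter()
    for sentence in corpus:
        for w1, w2 in ngrams(pad(sentence, 2), 2):
            bigram_counts[(w1, w2)] += 1
            context_counts[w1] += 1
    return {
        bigram: count / context_counts[bigram[0]]
        for bigram, count in bigram_counts.items()
    }


# The two example sentences from the question.
corpus = [["I", "am", "Sam"], ["Sam", "I", "am"]]
probs = bigram_mle(corpus)

print(pad(corpus[0], 3))         # trigram padding: two markers on each side
print(probs[("<start>", "I")])   # P(I | <start>)
print(probs[("am", "<end>")])    # P(<end> | am)
```

The start markers give the first real word a full n-1 word context, which is why a trigram model needs two of them before the first word.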


1024er commented Mar 27, 2020

I don't quite follow. Could you point to the relevant section of the original lecture notes?

logan0czy reopened this Mar 27, 2020
@logan0czy

> I don't quite follow. Could you point to the relevant section of the original lecture notes?

It is the N-gram language model reading for the Thu Jan 23 lecture, link: https://web.stanford.edu/~jurafsky/slp3/3.pdf

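The relevant content is in two places: the very bottom of page 4 of the notes, and the "some practical issues" part on page 6, which also touches on it briefly.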
