Segmentation fails on text containing special characters #15

Open
chenying99 opened this issue Jul 25, 2017 · 0 comments

chenying99 commented Jul 25, 2017

首先,需要了解一些基本事实: 􀂄 中国的小麦依靠自给。

据香港媒体报导,嫩模Jeana(何佩瑜)四处惹是非,结果被其他𡃁模群起围攻,指她整容。

at java.util.Vector.get(Unknown Source)
at org.thunlp.thulac.cb.CBTaggingDecoder.segment(CBTaggingDecoder.java:276)

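It looks like the length of the `POCGraph graph` object does not match the length of the sentence.

A special character in the sentence counts as two units of length (in a Java `String` it is stored as two `char` values), while the `POCGraph graph` object ends up one element short.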
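To illustrate the mismatch, here is a minimal sketch that does not use any THULAC code. It only shows that `String.length()` counts UTF-16 code units while `codePointCount` counts characters. The class name `SupplementaryLengthDemo` and the stand-in code point U+20000 are my own choices for illustration; any character outside the Basic Multilingual Plane should behave the same way.

```java
public class SupplementaryLengthDemo {
    // Number of UTF-16 code units, which is what String.length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points, i.e. what a reader sees as characters.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+20000 is a stand-in supplementary character (an assumption,
        // not taken from the failing sentences in this issue).
        String supplementary = new String(Character.toChars(0x20000));
        String sentence = "a" + supplementary + "b";

        System.out.println(codeUnits(sentence));  // prints 4
        System.out.println(codePoints(sentence)); // prints 3
    }
}
```

If one structure is sized by code points and another is indexed by `char` position, an off-by-one of this kind per supplementary character would be expected, which seems consistent with the `Vector.get` failure in the trace.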

Addendum:

The later call `this.nGramFeature.putValues(sequence, len);` uses `sequence.charAt(i)` internally, so it will hit the same problem.
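As a possible direction, the sketch below walks a string by code point instead of by `charAt(i)`, so each supplementary character yields one element rather than two surrogate halves. This is not THULAC's code: `CodePointIteration` and `toCodePoints` are hypothetical names, and whether `putValues` could be adapted this way is an assumption I have not checked against the source.

```java
import java.util.ArrayList;
import java.util.List;

public class CodePointIteration {
    // Hypothetical helper: converts a string to a list of code points,
    // advancing by one or two chars depending on the character.
    static List<Integer> toCodePoints(String sequence) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < sequence.length(); ) {
            int cp = sequence.codePointAt(i);
            result.add(cp);
            // charCount is 2 for supplementary characters, 1 otherwise.
            i += Character.charCount(cp);
        }
        return result;
    }

    public static void main(String[] args) {
        // U+20000 is a stand-in supplementary character (an assumption).
        String sentence = "a" + new String(Character.toChars(0x20000)) + "b";
        List<Integer> cps = toCodePoints(sentence);
        System.out.println(cps.size());                      // prints 3
        System.out.println(Integer.toHexString(cps.get(1))); // prints 20000
    }
}
```

With a list like this, the index `i` means the same thing for the sentence and for any per-character structure built from it.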
