Bug: sensitive words made of consecutive repeated characters are not detected #3

Open
chongpengrao opened this issue Dec 13, 2019 · 2 comments

@chongpengrao

There is a bug: when a sensitive word consists of consecutive repeated characters, it is not detected.

@toolgood

You could try my project: https://github.com/toolgood/ToolGood.Words
Its sensitive-word filter supports C#, Java, Go, JS, and Python 3.

@mkyuangithub

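There is a bug: when a sensitive word consists of consecutive repeated characters, it is not detected.

The cause is variable pollution inside the author's loop. Both the doFilter method and the isContains method need to be changed: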

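  1. In the method public static final String doFilter(final String src), I commented out the three faulty lines. Look for the comments marked "commented out: variable pollution".
    k = i;
    //cpcurrc = currc; // copy of the current character - commented out: variable pollution
    for (; ++k < length;) {
    int temp = charConvert(chs[k]);
    //if (temp == cpcurrc) - commented out: variable pollution
    //continue; - commented out: variable pollution
    if (stopwdSet != null && stopwdSet.contains(temp))
  2. In the method public static final boolean isContains, I commented out the same three faulty lines. Look for the comments marked "commented out: variable pollution".
    // cpcurrc = currc; - commented out: variable pollution
    for (; ++k < length;) {
    int temp = charConvert(chs[k]);
    // if (temp == cpcurrc) - commented out: variable pollution
    // continue; - commented out: variable pollution
    if (stopwdSet != null && stopwdSet.contains(temp))

With these changes the output is as follows, and the bug where sensitive words with consecutive repeated characters could not be filtered is fixed.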

Input text: 习大&_大 flg
Characters parsed: 9
Parse time: 29092636ns
Parse time: 29ms

Contains sensitive word: true
Parse time: 28377ns
Parse time: 0ms
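
To illustrate why the removed check hides such words, here is a minimal sketch, not the project's actual code. It matches a single word with a plain loop rather than the library's trie, and it models only the three lines quoted above: a copy of the first matched character (the role of cpcurrc) and a `continue` whenever the next character equals that copy. The class name RepeatSkipDemo, the method contains, the skipRepeats flag, and the stop-character set are all hypothetical names introduced for this example.

```java
import java.util.Set;

public class RepeatSkipDemo {
    // Hypothetical stop characters that may sit between the characters of a word.
    static final Set<Character> STOP_CHARS = Set.of('&', '_', ' ');

    /**
     * Returns true if word occurs in text, ignoring stop characters in between.
     * When skipRepeats is true, every character equal to the first character of
     * the candidate match is skipped, which models the commented-out check.
     */
    static boolean contains(String text, String word, boolean skipRepeats) {
        char[] chs = text.toCharArray();
        for (int i = 0; i < chs.length; i++) {
            if (chs[i] != word.charAt(0)) continue;
            if (word.length() == 1) return true;
            char first = chs[i]; // plays the role of cpcurrc
            int matched = 1;     // number of word characters matched so far
            for (int k = i + 1; k < chs.length; k++) {
                char temp = chs[k];
                if (skipRepeats && temp == first) continue; // the faulty skip
                if (STOP_CHARS.contains(temp)) continue;    // skip stop characters
                if (temp != word.charAt(matched)) break;    // mismatch, try next start
                if (++matched == word.length()) return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(contains("xaab", "aab", true));    // false: second 'a' is skipped
        System.out.println(contains("xaab", "aab", false));   // true
        System.out.println(contains("xa&_ab", "aab", false)); // true: stop characters ignored
    }
}
```

With the skip enabled, the second 'a' of "aab" is consumed as a duplicate, so the matcher then compares 'b' against the expected 'a' and gives up. Removing the skip lets the repeated character advance the match, which is consistent with the fix described above.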
