Punctuation prediction: the second sub in _clean_text() is redundant #2475

Closed
yt605155624 opened this issue Sep 28, 2022 · 4 comments

Comments

@yt605155624
Collaborator

yt605155624 commented Sep 28, 2022

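The second sub in the punctuation-prediction _clean_text() function is redundant: the first sub has already filtered out all punctuation, so the function does not need punc_list as an input at all.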

Test code:

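import re

def clean_text(text, punc_list):
    text = text.lower()
    print("text0:", text)
    # First sub: drop everything except ASCII letters, digits, and CJK ideographs.
    text = re.sub('[^A-Za-z0-9\u4e00-\u9fa5]', '', text)
    print("text1:", text)
    # Second sub: drop the marks in punc_list (skipping its first entry).
    # Nothing is left for it to match.
    text = re.sub(f'[{"".join([p for p in punc_list][1:])}]', '', text)
    print("text2:", text)
    return text

text = "你好,我是飞桨?的程序员。你好吗!"
punc_list = [',', '。', '?']
print(clean_text(text, punc_list))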

output:

text0: 你好,我是飞桨?的程序员。你好吗!
text1: 你好我是飞桨的程序员你好吗
text2: 你好我是飞桨的程序员你好吗
你好我是飞桨的程序员你好吗
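
The reason text1 and text2 are identical can be checked directly: punctuation marks, whether ASCII or full-width, lie outside the ranges that the first pattern keeps. A small sketch of that check follows; the helper name `is_kept` and the sample marks are chosen here for illustration and do not come from the repo.

```python
import re

# Positive form of the character class used by the first sub:
# these are the characters that survive it.
KEPT = re.compile('[A-Za-z0-9\u4e00-\u9fa5]')

def is_kept(ch):
    """Return True if the first sub would leave `ch` in the text."""
    return KEPT.fullmatch(ch) is not None

# Illustrative sample: ASCII marks plus full-width comma, full stop,
# question mark, and exclamation mark.
marks = [',', '?', '!', '.', '\uff0c', '\u3002', '\uff1f', '\uff01']
for m in marks:
    print(f'U+{ord(m):04X}', is_kept(m))
```

None of these code points fall in A-Z, a-z, 0-9, or U+4E00 to U+9FA5, so the first sub strips them before the second one runs.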

Where it is used:

  1. def _clean_text(self, text):
  2. def _clean_text(text, punc_list):

Developers are encouraged to submit a PR to fix this.
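
A minimal sketch of what such a change might look like, assuming nothing else depends on the punc_list argument. The names `clean_text_original` and `clean_text_simplified` are hypothetical; the first only mirrors the test code in this issue (without the prints), not necessarily the current repo source.

```python
import re

def clean_text_original(text, punc_list):
    """Two-step version, as in the test code from this issue."""
    text = text.lower()
    text = re.sub('[^A-Za-z0-9\u4e00-\u9fa5]', '', text)
    text = re.sub(f'[{"".join([p for p in punc_list][1:])}]', '', text)
    return text

def clean_text_simplified(text):
    """Hypothetical one-step version: no punc_list parameter."""
    text = text.lower()
    return re.sub('[^A-Za-z0-9\u4e00-\u9fa5]', '', text)

text = "你好,我是飞桨?的程序员。你好吗!"
punc_list = [',', '。', '?']

# Both versions should agree on the sample input.
assert clean_text_original(text, punc_list) == clean_text_simplified(text)
print(clean_text_simplified(text))
```

Dropping the second sub also avoids building a character class from punc_list, which would raise re.error if the slice punc_list[1:] were empty.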

@stale

stale bot commented Nov 12, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Stale label Nov 12, 2022
@yt605155624 yt605155624 removed the Stale label Dec 16, 2022
@stale

stale bot commented Feb 2, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Stale label Feb 2, 2023
@yt605155624 yt605155624 removed the Stale label Feb 8, 2023
@stale

stale bot commented Mar 25, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Stale label Mar 25, 2023
@stale

stale bot commented May 20, 2023

This issue is closed. Please re-open if needed.

@stale stale bot closed this as completed May 20, 2023