Fix Chinese characters being split #715

Closed
faroasis wants to merge 1 commit

@faroasis faroasis commented Dec 4, 2023

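Fixes the problem of Chinese characters being split during decoding. The change collects consecutive unknown tokens and tries to convert them into recognizable Chinese characters. (For a higher success rate, you can take only the last 2-3 entries of unk_tokens, depending on the encoding.)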
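The idea described above can be illustrated with a minimal sketch. It is not the code from this PR: `toy_decode` and `stream_text` are hypothetical names, and the toy decoder assumes every token id is a single raw byte, which only loosely imitates a byte-level tokenizer. Tokens are held back while their decoded text still contains the replacement character, and are emitted once the buffered tokens decode cleanly.

```python
# U+FFFD, the same constant the diff in this PR defines.
CHAR_UNK = b"\xef\xbf\xbd".decode()


def toy_decode(token_ids):
    """Hypothetical stand-in for a tokenizer's decode: one token id = one byte."""
    return bytes(token_ids).decode("utf-8", errors="replace")


def stream_text(token_ids, decode=toy_decode):
    """Yield text pieces, holding back tokens that do not decode cleanly yet."""
    pending = []
    for token in token_ids:
        pending.append(token)
        piece = decode(pending)
        if CHAR_UNK in piece:
            # Probably an incomplete multi-byte character: wait for more tokens.
            continue
        yield piece
        pending = []
    if pending:
        # Flush whatever is left so no tokens are silently dropped.
        yield decode(pending)


tokens = list("中文ok".encode("utf-8"))

# Decoding token by token splits each Chinese character into U+FFFD pieces.
naive = [toy_decode([t]) for t in tokens]

# Buffering until the decode is clean recovers the full characters.
buffered = list(stream_text(tokens))
print(buffered)
```

One caveat of this sketch: if the model genuinely produces invalid bytes, the buffer keeps growing until the end of the stream, so a real implementation would likely cap the number of held-back tokens, in the spirit of the "last 2-3 entries" remark above.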

@XprobeBot XprobeBot added this to the v0.6.6 milestone Dec 4, 2023
@faroasis
Author

faroasis commented Dec 4, 2023

The `output_ids.append(token)` on line 211 should probably also be moved to line 223.

@@ -83,6 +83,7 @@ def prepare_logits_processor(
processor_list.append(TopKLogitsWarper(top_k))
return processor_list

CHAR_UNK = b"\xef\xbf\xbd".decode()
Contributor

I have not run into this problem with any other model. Where does this CHAR_UNK definition come from?
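For context on the question above: the bytes `EF BF BD` are the UTF-8 encoding of U+FFFD, the Unicode replacement character, which Python substitutes when it decodes invalid or incomplete UTF-8 with `errors="replace"`. The short check below is an illustration using only the standard library, not a reply from the PR author; the variable names other than `CHAR_UNK` are made up for the example.

```python
import unicodedata

# The constant from the diff above.
CHAR_UNK = b"\xef\xbf\xbd".decode()

# Look up which Unicode character those three bytes decode to.
name = unicodedata.name(CHAR_UNK)
codepoint = f"U+{ord(CHAR_UNK):04X}"

# Decoding only the first two of the three bytes of a Chinese character
# leaves an incomplete sequence, which is replaced by that same character.
partial = "中".encode("utf-8")[:2].decode("utf-8", errors="replace")

print(codepoint, name, CHAR_UNK in partial)
```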

@XprobeBot XprobeBot modified the milestones: v0.6.6, v0.7.0 Dec 8, 2023
@aresnow1
Contributor

This should be solved by #747. You can test it once we release a new version.

@XprobeBot XprobeBot modified the milestones: v0.7.1, v0.7.2, v0.7.3 Dec 12, 2023
@XprobeBot XprobeBot modified the milestones: v0.7.3, v0.7.4, Temp, v0.8.0 Dec 22, 2023
@XprobeBot XprobeBot modified the milestones: v0.7.5, v0.8.0, v0.8.1 Jan 5, 2024
@XprobeBot XprobeBot modified the milestones: v0.8.1, v0.8.2 Jan 19, 2024
@XprobeBot XprobeBot modified the milestones: v0.8.2, v0.8.4, v0.8.5 Feb 2, 2024
@XprobeBot XprobeBot added this to the v0.9.0 milestone Feb 6, 2024
@XprobeBot XprobeBot modified the milestones: v0.9.0, v0.9.1 Feb 22, 2024
@XprobeBot XprobeBot modified the milestones: v0.9.1, v0.9.2, v0.9.3 Mar 1, 2024
@XprobeBot XprobeBot modified the milestones: v0.9.3, v0.9.4, v0.9.5 Mar 15, 2024
@XprobeBot XprobeBot modified the milestones: v0.10.0, v0.10.1 Mar 29, 2024
@qinxuye
Contributor

qinxuye commented Apr 1, 2024

This PR has gone stale, so closing it for now.

@qinxuye qinxuye closed this Apr 1, 2024