Bug: message.extract_plain_text() returns inconsistent plain text for Chinese and English messages #47

Closed
NormanBB opened this issue Jul 8, 2023 · 4 comments

@NormanBB

NormanBB commented Jul 8, 2023

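Operating system

Windows

Python version

3.11.3

NoneBot version

2.0.0

Adapter

nonebot-adapter-telegram

Protocol client

V2

Problem description

Inside an on_command matcher, calling extract_plain_text gives two different results for a Chinese command and an English command. For the Chinese command, the function returns the characters that follow the command with the whitespace removed; for the English command, the function keeps the whitespace.
See the screenshots for the code and the results.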

Steps to reproduce

  1. Register a command with on_command and process the text with extract_plain_text.
from nonebot import on_command
from nonebot.adapters import Message
from nonebot.params import CommandArg

# Matcher for the Chinese command "原".
weather = on_command("原")

@weather.handle()
async def handle_function(args: Message = CommandArg()):
    # Print the extracted argument so any retained whitespace is visible.
    print(args.extract_plain_text())
    if location := args.extract_plain_text():
        await weather.finish(f"raw:{location}")

Expected result

The English command should behave like the Chinese command and return only the non-whitespace characters.

Screenshots or logs

image

image

@NormanBB NormanBB added the bug Something isn't working label Jul 8, 2023
@NormanBB NormanBB changed the title from "Bug: An exception occurred" to "Bug: message.extract_plain_text() returns inconsistent plain text for Chinese and English messages" Jul 8, 2023
@NormanBB NormanBB changed the title again (same wording: "Bug: message.extract_plain_text() returns inconsistent plain text for Chinese and English messages") Jul 8, 2023
@yanyongyu
Member

@nonebot/telegram

@j1g5awi j1g5awi transferred this issue from nonebot/nonebot2 Jul 8, 2023
@j1g5awi j1g5awi self-assigned this Jul 8, 2023
@NormanBB
Author

NormanBB commented Jul 8, 2023

@j1g5awi

The function, for reference:

    def extract_plain_text(self) -> str:
        """Extract the plain-text content of the message."""
        return "".join(str(seg) for seg in self if seg.is_text())
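
Because this method concatenates text segments verbatim, any leading whitespace left in a segment survives in the result. The following is a minimal sketch using a stand-in class (`FakeSegment` is hypothetical and is not the NoneBot class); it also shows a handler-side workaround of stripping the result, which is an assumption about what a caller might do rather than the fix adopted upstream:

```python
from dataclasses import dataclass


@dataclass
class FakeSegment:
    """Hypothetical stand-in for a message segment; not the NoneBot class."""
    type: str
    text: str

    def is_text(self) -> bool:
        return self.type == "text"

    def __str__(self) -> str:
        return self.text


def extract_plain_text(segments: list[FakeSegment]) -> str:
    # Same joining logic as the method quoted above.
    return "".join(str(seg) for seg in segments if seg.is_text())


# Command argument as it might arrive when the leading space was not stripped.
args = [FakeSegment("text", " hi")]
raw = extract_plain_text(args)   # whitespace is preserved by the join
location = raw.strip()           # workaround: strip in the handler
```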

@j1g5awi
Member

j1g5awi commented Jul 8, 2023

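This happens because TrieRule only matches against the first MessageSegment, while the Telegram Adapter treats each native Telegram MessageEntity as a separate MessageSegment.

In Telegram, /raw is a MessageEntity of type bot_command, whereas the " hi" that follows it (including the leading space) does not belong to any MessageEntity, so the Telegram Adapter places the two in different MessageSegments. During Matcher matching, TrieRule strips whitespace only from the MessageSegment that contains /raw and does nothing to the MessageSegment that contains hi.
/原, by contrast, has its whitespace stripped correctly because Telegram's support for Chinese is very poor and that text is not treated as a MessageEntity.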
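
The behaviour described above can be reproduced with a small simulation. This is only a sketch under assumptions: `split_by_entities` and `command_arg` are hypothetical helpers that imitate the adapter's segmentation and the rule's whitespace handling, and they do not reflect the actual NoneBot or adapter source code.

```python
def split_by_entities(text, entities):
    """Split text into (type, text) segments.

    entities: (offset, length, type) tuples, loosely modelled on Telegram's
    MessageEntity. Hypothetical helper; offsets are plain str indices here.
    """
    segments, pos = [], 0
    for offset, length, kind in sorted(entities):
        if offset > pos:
            segments.append(("text", text[pos:offset]))
        segments.append((kind, text[offset:offset + length]))
        pos = offset + length
    if pos < len(text):
        segments.append(("text", text[pos:]))
    return segments


def command_arg(segments, command):
    """Imitate a rule that removes the command from the first segment and
    strips whitespace in that segment only (hypothetical)."""
    _, first = segments[0]
    rest = first[len(command):].lstrip()
    # Later segments are passed through untouched, whitespace included.
    return rest + "".join(text for _, text in segments[1:])


# "/raw" is reported as a bot_command entity, so " hi" is a separate segment.
english = command_arg(split_by_entities("/raw hi", [(0, 4, "bot_command")]), "/raw")
# "/原" is not reported as an entity, so the whole message is one segment.
chinese = command_arg(split_by_entities("/原 hi", []), "/原")
```

In this sketch `english` keeps its leading space while `chinese` does not, which matches the difference reported in the issue.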

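In the next release I will substantially rework this adapter's message handling logic to resolve this issue.

An update to NoneBot2 has resolved this issue: nonebot/nonebot2#2419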

@NormanBB
Author

NormanBB commented Jul 8, 2023

Thanks for the explanation, and thank you for your contribution.

@NormanBB NormanBB closed this as completed Jul 8, 2023