Bug: message.extract_plain_text() returns inconsistent plain text for Chinese and English messages #47

NormanBB · 2023-07-08T08:54:10Z

Operating system

Windows

Python version

3.11.3

NoneBot version

2.0.0

Adapter

nonebot-adapter-telegram

Protocol client

V2

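Describe the problem

Inside an on_command matcher, calling extract_plain_text gives two different results for a Chinese command and an English command. For the Chinese command, the function returns the characters immediately following the command with the whitespace removed; for the English command, the function keeps the whitespace.
See the screenshots for the code and the results.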

Steps to reproduce

Register a command with on_command and process the text with extract_plain_text.

from nonebot import on_command
from nonebot.adapters import Message
from nonebot.params import CommandArg

weather = on_command("原")

@weather.handle()
async def handle_function(args: Message = CommandArg()):
    print(args.extract_plain_text())
    if location := args.extract_plain_text():
        await weather.finish(f"raw:{location}")

Expected result

The English command should behave like the Chinese command and return only the non-whitespace characters.

Screenshots or logs


yanyongyu · 2023-07-08T08:57:22Z

@nonebot/telegram

NormanBB · 2023-07-08T09:26:47Z

@j1g5awi

The function in question:

    def extract_plain_text(self) -> str:
        """Extract the plain-text content of the message."""
        return "".join(str(seg) for seg in self if seg.is_text())

j1g5awi · 2023-07-08T10:28:35Z

This happens because TrieRule matches only the first MessageSegment, while the Telegram Adapter treats each native Telegram MessageEntity as a separate MessageSegment.

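In Telegram, /raw is a MessageEntity of type bot_command, whereas the hi that follows, together with its leading space, does not belong to any MessageEntity, so the Telegram Adapter treats them as separate MessageSegments. During Matcher matching, TrieRule strips whitespace only from the MessageSegment containing /raw and does nothing to the MessageSegment containing hi.
/原, by contrast, has its whitespace stripped correctly because Telegram's support for Chinese is extremely poor: that text is not treated as a MessageEntity, so the whole message stays in a single MessageSegment.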
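
The behavior described above can be illustrated with a toy model. This is only a sketch: `Segment`, `strip_command`, and the standalone `extract_plain_text` are hypothetical stand-ins, not the actual NoneBot or Telegram Adapter code, and it assumes the command rule removes the prefix and leading whitespace from the first segment only.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical stand-in for a MessageSegment."""
    type: str  # e.g. "text" or "bot_command"
    text: str


def strip_command(segments, command):
    """Toy command rule: only the FIRST segment is inspected and trimmed."""
    first, rest = segments[0], segments[1:]
    if not first.text.startswith(command):
        return None
    remainder = first.text[len(command):].lstrip()
    head = [Segment("text", remainder)] if remainder else []
    return head + rest


def extract_plain_text(segments):
    """Join the text of all text segments, like the function quoted earlier."""
    return "".join(seg.text for seg in segments if seg.type == "text")


# "/raw hi": "/raw" is a bot_command entity, so " hi" ends up in its own segment.
english = [Segment("bot_command", "/raw"), Segment("text", " hi")]
# "/原 hi": no entity is produced, so the message is one text segment.
chinese = [Segment("text", "/原 hi")]

english_args = extract_plain_text(strip_command(english, "/raw"))
chinese_args = extract_plain_text(strip_command(chinese, "/原"))
print(repr(english_args))  # leading space survives
print(repr(chinese_args))  # leading space was stripped
```

In this model the leading space survives in the English case because it lives in the second segment, which the toy rule never touches.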

~~In the next release I will substantially rework this adapter's message handling logic to resolve this issue.~~

An update to NoneBot2 resolved this issue: nonebot/nonebot2#2419
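
On a NoneBot2 version without that change, a handler could normalize the text itself. The following is a minimal sketch, not something proposed in this thread: `normalized_args` is a hypothetical helper name, and it relies only on Python's built-in `str.strip`.

```python
def normalized_args(plain_text: str) -> str:
    """Strip surrounding whitespace so both command forms yield the same text."""
    return plain_text.strip()


# Inside the handler from the report, this would be used roughly as:
#     location = normalized_args(args.extract_plain_text())
print(repr(normalized_args(" hi")))
print(repr(normalized_args("hi")))
```

Both calls produce the same value, which is the consistency the report asks for.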

NormanBB · 2023-07-08T10:44:38Z

Thanks for the explanation, and thank you for your contribution.

NormanBB added the bug Something isn't working label Jul 8, 2023

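NormanBB changed the title ~~Bug: An exception occurred~~ Bug: message.extract_plain_text() returns inconsistent plain text for Chinese and English messages Jul 8, 2023

NormanBB changed the title ~~Bug: message.extract_plain_text() returns inconsistent plain text for Chinese and English messages~~ Bug: message.extract_plain_text() returns inconsistent plain text for Chinese and English messages Jul 8, 2023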

j1g5awi transferred this issue from nonebot/nonebot2 Jul 8, 2023

j1g5awi self-assigned this Jul 8, 2023

j1g5awi mentioned this issue Jul 8, 2023

Breaking: Design of Telegram message segments #6

Closed

NormanBB closed this as completed Jul 8, 2023
