Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] python客户端,多 master 无 slave,无压力,概率出现分钟级delay #9145

Open
3 tasks done
eigen2017 opened this issue Jan 17, 2025 · 13 comments
Open
3 tasks done

Comments

@eigen2017
Copy link

eigen2017 commented Jan 17, 2025

Before Creating the Bug Report

  • I found a bug, not just asking a question, which should be created in GitHub Discussions.

  • I have searched the GitHub Issues and GitHub Discussions of this repository and believe that this is not a duplicate.

  • I have confirmed that this bug belongs to the current repository, not other repositories of RocketMQ.

Runtime platform environment

ubuntu

RocketMQ version

5.3.1

JDK Version

Java8

Describe the Bug

2个master分摊消息,无slave
采用python客户端:https://github.com/apache/rocketmq-clients
topic:normal类型
1个消费者:循环调用simpleConsumer.receive来接收消息
1个生产者:调用producer.send来发送消息。
流量压力:无任何流量,仅手动测试。
master的flushDiskType已经都设成同步。

问题:概率的,生产者生产之消息之后,延迟半分钟到2分钟,消费者才消费到消息,
且在delay期间:消费者调用了receive函数多次无果。dashboard上查topic的consumer状态是delay 1条消息。
也就是说,master收到了消息,消费者一直尝试消费,但是消费者一直没有消费到。

规避:采用1个master,无此问题。
或者采用1个master,1个slave,无此问题。
多master,每master配备slave,有此问题。
多master,无slave,有此问题。
即仅在多个master分摊流量时,有此问题。

Steps to Reproduce

见bug描述

What Did You Expect to See?

mq的含义是实时消费,不管在任何组网下,单发测试都不能有分钟级的延迟。

期待多个master分摊消息,消费者组中消费者的数量任意,在此情况下,依然保持消息能实时消费。

What Did You See Instead?

另希望介绍一下normal类型topic的普通消费机制
猜测本问题,是否是因为:消费者某一时刻仅能监听1个master?如果有多个master,该消费者会在master之间切换队列,导致延迟?
如果是这样,请给出解决方案。

Additional Context

No response

@eigen2017
Copy link
Author

原因基本找到了,python的客户端每次调用recive只会看一个队列,即使该队列是空,也不立即切换队列,而是要等待invisible duration时长,而这个时长最低要求设置10秒,导致极限情况下,2master共16个队列,要160秒才轮询到那条有消息的队列。

@eigen2017
Copy link
Author

python客户端的轮询策略十分的怪异,消费到1条消息立即切换队列,然后没消费到消息就等待一个duration再切。。

@eigen2017
Copy link
Author

Image

delay的4条消息在队列a0123

@eigen2017
Copy link
Author

Image

现在队列只轮询到b4,b5,每次轮询等10秒以上

@eigen2017
Copy link
Author

终于轮询到a0123,才开始消费delay的消息

Image

@eigen2017
Copy link
Author

eigen2017 commented Jan 20, 2025

问题代码位置,轮询:

Image

@eigen2017
Copy link
Author

更正一下,不是受invisible duration影响,而是simple customer对象构造时传入的await duration,实测至少也要5秒,问题依旧在,在队列为空时应该直接切换broker,而不是再轮询一遍已经没消息的broker的其他队列

@eigen2017
Copy link
Author

轮询只需要在master broker之间切换即可,比如有2master,就ab的队列0之间切即可,没必要a01234567再b01234567

@eigen2017
Copy link
Author

因为a0就已经能把a01234567的消息一次全部收到,b0同理

@eigen2017
Copy link
Author

修改python.客户3个类可以搞定此问题

Image

我找时间提一下pr

@eigen2017
Copy link
Author

这样改消费者没有问题,发现生产的时候会只走a0b0这两个队列,不会走a1234567和b1234567,注意判定角色

@lizhimins
Copy link
Member

看下客户端的逻辑是不是在多个 topic 的时候,是轮询每个 topic 导致的延迟。用不同的多个 simple consumer 试试

@qianye1001
Copy link
Contributor

建议使用 simple consumer 时构建本地缓存,将消费与 receive 操作进行异步化,receive后将消息放到本地缓存中后立刻进行下一次receive,可以有效减少消息的端到端延迟时间

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants