Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the bug that healthy sentinel displays ERROR on the codis-fe #1730

Open
wants to merge 3 commits into
base: release3.2
Choose a base branch
from

Conversation

yz1509
Copy link

@yz1509 yz1509 commented Feb 9, 2021

现象

类似 #1345,查看 dashboard 状态,发现错误位于 InfomasterCommand 解析回包失败处:

错误信息在这两处切换变化

Inked001_LI
Inked002_LI

查看 sentinel 客户端列表发现,dashboard 和 sentinel 之间的连接一直未关闭,不符合代码中出现 error 就 close 连接的逻辑:

# age=1247007
id=3499285 addr=127.0.0.227:43720 fd=53 name= age=1247007 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=info

通过抓取 sentinel 收到的 Redis 请求发现显示 ERROR 的 sentinel 意外收到了两次 INFO 命令:

正常运行情况下应该是 INFO 后接 SENTINEL masters

1612775925.970175 [127.0.0.227:43720]  INFO
1612775925.970205 [127.0.0.227:43720]  SENTINEL masters
1612775926.970092 [127.0.0.227:43720]  INFO
1612775927.966160 [127.0.0.227:43720]  INFO
1612775927.966172 [127.0.0.227:43720]  SENTINEL masters
1612775928.978100 [127.0.0.227:43720]  INFO
1612775929.978165 [127.0.0.227:43720]  INFO
1612775929.978183 [127.0.0.227:43720]  SENTINEL masters
1612775930.986163 [127.0.0.227:43720]  INFO
1612775931.978156 [127.0.0.227:43720]  INFO
1612775931.978178 [127.0.0.227:43720]  SENTINEL masters
1612775932.982088 [127.0.0.227:43720]  INFO
1612775933.998090 [127.0.0.227:43720]  INFO
1612775933.998105 [127.0.0.227:43720]  SENTINEL masters
1612775934.998154 [127.0.0.227:43720]  INFO
1612775935.998092 [127.0.0.227:43720]  INFO
1612775935.998111 [127.0.0.227:43720]  SENTINEL masters
1612775937.  6160 [127.0.0.227:43720]  INFO
1612775938.  6160 [127.0.0.227:43720]  INFO
1612775938.  6173 [127.0.0.227:43720]  SENTINEL masters
1612775938.998154 [127.0.0.227:43720]  INFO
1612775940.  6157 [127.0.0.227:43720]  INFO
1612775940.  6180 [127.0.0.227:43720]  SENTINEL masters
1612775941. 14246 [127.0.0.227:43720]  INFO
1612775942.  6342 [127.0.0.227:43720]  INFO
1612775942.  6349 [127.0.0.227:43720]  SENTINEL masters
1612775943. 22168 [127.0.0.227:43720]  INFO
1612775944. 22161 [127.0.0.227:43720]  INFO
1612775944. 22178 [127.0.0.227:43720]  SENTINEL masters
1612775945. 18160 [127.0.0.227:43720]  INFO
1612775946. 22093 [127.0.0.227:43720]  INFO
1612775946. 22118 [127.0.0.227:43720]  SENTINEL masters
1612775947. 34166 [127.0.0.227:43720]  INFO
1612775948. 34163 [127.0.0.227:43720]  INFO
1612775948. 34183 [127.0.0.227:43720]  SENTINEL masters
1612775949. 46191 [127.0.0.227:43720]  INFO
1612775950. 42159 [127.0.0.227:43720]  INFO
1612775950. 42172 [127.0.0.227:43720]  SENTINEL masters

长时间观察发现 dashboard 所维护的 redis 客户端的 Recv-Q 一直不为 0:

tcp     1276      0 127.0.0.227:43720   127.0.0.63:20379      ESTABLISHED 55618/xxx_xxx_xxx

经过 tcpdump 抓包发现 Redis 客户端和 Sentinel 交互没有问题,问题出在 dashboard 从 TCP 缓冲区拉取数据时每次恰好都是拿到上一条请求的回包,从而导致解析回包错误:

Round Dashboard 请求 Dashboard 拉取的 Sentinel 回包 说明
a INFO SENTINEL masters 解析错误
a + 1 INFO INFO 拉取的其实是上个 INFO 请求的回包
SENTINEL masters INFO 解析错误
a + 2 INFO SENTINEL masters 解析错误
a + 3 INFO INFO 拉取的其实是上个 INFO 请求的回包
SENTINEL masters INFO 解析错误
a + 4 INFO SENTINEL masters 解析错误
... ... ... ...

发生此现象的原因猜测

newRedisStats 某次超时时,并未关闭该客户端,而是利用 defer 将未从输入缓冲区拉取请求回包的 redis 客户端添加至连接池导致的。

@hiqsociety
Copy link

can you push your request here?
https://github.com/cloudxaas/codis

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants