Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix ssl state check failed #766

Closed
wants to merge 1 commit into from

Conversation

cdjingit
Copy link
Contributor

@cdjingit cdjingit commented May 15, 2019

No description provided.

@cdjingit
Copy link
Contributor Author

#658

@cdjingit
Copy link
Contributor Author

@jamesge 帮忙看下这个fix

@jamesge
Copy link
Contributor

jamesge commented May 23, 2019

@cdjingit 这个改法我觉得解决的不是root cause

@old-bear
Copy link
Contributor

看了下代码,有可能触发路径是这样的:

  1. socket在成功connect后,后台启动KeepWrite准备往里写:

https://github.com/apache/incubator-brpc/blob/8fe8640e2f95a0e508f3070b086b7b25e84f5c88/src/brpc/socket.cpp#L1573-L1580

  1. 此时socket被SetFailedSetFailed里好像并不会管上面那个后台启动的线程
  2. 触发HealthCheck,然后一路调用把_ssl_state重置:

https://github.com/apache/incubator-brpc/blob/8fe8640e2f95a0e508f3070b086b7b25e84f5c88/src/brpc/socket.cpp#L705

  1. 这时KeepWrite被调度开始运行,KeepWrite里并不会尝试Address,最后一路到DoWrite

https://github.com/apache/incubator-brpc/blob/8fe8640e2f95a0e508f3070b086b7b25e84f5c88/src/brpc/socket.cpp#L1690

本地暂时没环境,还没尝试复现
这个问题在server端不会遇到,因为没开healthcheck
如果要临时修的话,可以按这个PR这么改。后面DoWrite继续尝试写,发现fd关了,最后返回-1

@old-bear
Copy link
Contributor

看了下代码,有可能触发路径是这样的:

  1. socket在成功connect后,后台启动KeepWrite准备往里写:

https://github.com/apache/incubator-brpc/blob/8fe8640e2f95a0e508f3070b086b7b25e84f5c88/src/brpc/socket.cpp#L1573-L1580

  1. 此时socket被SetFailedSetFailed里好像并不会管上面那个后台启动的线程
  2. 触发HealthCheck,然后一路调用把_ssl_state重置:

https://github.com/apache/incubator-brpc/blob/8fe8640e2f95a0e508f3070b086b7b25e84f5c88/src/brpc/socket.cpp#L705

  1. 这时KeepWrite被调度开始运行,KeepWrite里并不会尝试Address,最后一路到DoWrite

https://github.com/apache/incubator-brpc/blob/8fe8640e2f95a0e508f3070b086b7b25e84f5c88/src/brpc/socket.cpp#L1690

本地暂时没环境,还没尝试复现
这个问题在server端不会遇到,因为没开healthcheck
如果要临时修的话,可以按这个PR这么改。后面DoWrite继续尝试写,发现fd关了,最后返回-1

看错了,这个路径不可能,第1步里会占一个ref给KeepWrite
剩下可能是SocketMap里的ref被remove后,触发后续2-4

@cdjingit
Copy link
Contributor Author

@jamesge @old-bear 这个问题之前有个fix相似,就是启动health check前,认为已经没有socket write了,否则socket引用计数不是2(通过WaitAndReset(2))。但有一种可能是socket的从naming service删除了(socket 引用计数减一),这时如果socket上还有数据在write,引用计数也是2,WaitAndReseet成功并且重置了socket相关状态。严格来讲,是有问题的。

@GardianT
Copy link

GardianT commented Jun 4, 2019

问下这个fix是ok的么?预计什么时候会merge?

@wwbmmm
Copy link
Contributor

wwbmmm commented Aug 9, 2022

由于#1814 已解决此问题,所以这个RP close了

@wwbmmm wwbmmm closed this Aug 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants