Unexpected communication between different Raft clusters causes abnormal Raft state #1012
killme2008 added a commit that referenced this issue on Oct 20, 2023
killme2008 added a commit that referenced this issue on Oct 22, 2023
Describe the bug
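Our application uses JRaft as its persistence module and is deployed on K8S as a StatefulSet. Nodes in a Raft cluster communicate with each other through K8S headless service domain names. The application is multi-tenant, so several application clusters are deployed in the same K8S cluster.
When several application clusters are rolling-updated at the same time, Pod IPs change, and an IP that belonged to a node in cluster A may be assigned to a node in cluster B after the update.
Before the rolling update:
After the rolling update:
The IP 10.10.10.10, which belonged to app-cluster-a-0 before the update, is assigned to app-cluster-b-0 after the update.
Because DNS resolution is delayed at several layers, including the JVM, gRPC, and K8S CoreDNS, a node in cluster a (app-cluster-a-1 or app-cluster-a-2) that connects to app-cluster-a-0 by domain name after the update may actually connect to app-cluster-b-0. This causes two major problems:
The problem is somewhat similar to Issue #683, but PR #690 does not fully solve the problem we face.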
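The failure mode described above can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions: `StaleResolver`, `Node`, `send`, and the cluster IDs `cluster-a` / `cluster-b` are hypothetical names invented for illustration and are not part of the JRaft API, nor do they describe how PR #690 works. The sketch shows a cached hostname that still points at a reassigned IP, and a receiver that rejects a request whose sender belongs to a different cluster.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; none of these names belong to the JRaft API. */
final class ClusterIdentityGuardDemo {

    /** Simulates a stale name cache: hostname -> last resolved IP. */
    static final class StaleResolver {
        private final Map<String, String> cache = new HashMap<>();

        void put(String host, String ip) {
            cache.put(host, ip);
        }

        String resolve(String host) {
            return cache.get(host);
        }
    }

    /** A node that knows which cluster it belongs to. */
    static final class Node {
        final String clusterId;
        final String name;

        Node(String clusterId, String name) {
            this.clusterId = clusterId;
            this.name = name;
        }

        /** Accepts a request only if the sender is in the same cluster. */
        boolean accept(String senderClusterId) {
            return clusterId.equals(senderClusterId);
        }
    }

    /**
     * Resolves the target host through the (possibly stale) cache and
     * delivers the request to whichever node currently owns that IP.
     * Returns true only if a node was reached and it accepted the request.
     */
    static boolean send(StaleResolver dns, Map<String, Node> nodesByIp,
                        String targetHost, String senderClusterId) {
        String ip = dns.resolve(targetHost);
        Node receiver = nodesByIp.get(ip);
        return receiver != null && receiver.accept(senderClusterId);
    }

    public static void main(String[] args) {
        StaleResolver dns = new StaleResolver();
        // Cached before the rolling update and not yet refreshed.
        dns.put("app-cluster-a-0", "10.10.10.10");

        Map<String, Node> nodesByIp = new HashMap<>();
        // After the update, 10.10.10.10 belongs to app-cluster-b-0.
        nodesByIp.put("10.10.10.10", new Node("cluster-b", "app-cluster-b-0"));

        // A node in cluster a tries to reach app-cluster-a-0 by name.
        boolean accepted = send(dns, nodesByIp, "app-cluster-a-0", "cluster-a");
        System.out.println("accepted=" + accepted);
    }
}
```

Without the cluster check in `accept`, the request from cluster a would be processed by app-cluster-b-0, which is the unexpected cross-cluster communication this issue reports. Whether such a check belongs in JRaft itself or in the application layer is left open here.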
Expected behavior
We expect JRaft to handle the problems caused by DNS resolution delay:
Actual behavior
Unexpected communication between different Raft clusters causes abnormal Raft state.
Steps to reproduce
Minimal yet complete reproducer code (or GitHub URL to code)
Environment
JVM version (e.g. java -version):
OS version (e.g. uname -a):