Changefeed lag exceeded 10 minutes when a network partition was injected between the PD leader and the PD followers #9229
Comments
/remove-area dm
/severity major
/assign @asddongmen
@nongfushanquan: GitHub didn't allow me to assign the following users: asddongmen. Note that only pingcap members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Injecting a network partition between the TiCDC owner and all other pods caused TiCDC to restart. TiCDC logs:
@asddongmen will check whether this can be addressed by etcd-io/etcd#17465 (comment). If not, I suggest we address it in the long term.
After #10881 was merged, the checkpointTs lag in the pd-leader-io-hang cases dropped below 120s, which meets the requirement.
@Lily2025: Reopened this issue. In response to this:
/remove-type bug
closed
What did you do?
1. Run TPC-C with 10 threads and 1000 warehouses.
2. After 10 minutes, isolate the PD leader from all PD followers at the network level.
   Fault start time: 2023-06-13 09:01:47
3. After another 10 minutes, recover from the fault.
   Fault recovery time: 2023-06-13 09:11:48
What did you expect to see?
The changefeed lag stays below 30s.
What did you see instead?
TiCDC lag exceeded 10 minutes after the fault was injected.
The PD leader changed normally.
Versions of the cluster
git hash : 1e2f277
Current status of DM cluster (execute query-status <task-name> in dmctl)
No response