Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slave does not sync key updates during downtime after restart #2214

Closed
1 of 2 tasks
sam-hn opened this issue Apr 1, 2024 · 8 comments · Fixed by #2221
Closed
1 of 2 tasks

Slave does not sync key updates during downtime after restart #2214

sam-hn opened this issue Apr 1, 2024 · 8 comments · Fixed by #2221
Labels
bug type bug

Comments

@sam-hn
Copy link

sam-hn commented Apr 1, 2024

Search before asking

  • I had searched in the issues and found no similar issues.

Version

Kvrocks 2.7.0

Minimal reproduce step

  1. shutdown slave node
  2. update key test1 and create a new key test2 in master node
  3. bring up the slave node and check the keys in slave, test1 in slave is not updated and test2 is created

What did you expect to see?

after restart, slave should have the updated value of test1

What did you see instead?

test1 key in slave is not updated

Anything Else?

I also tried updating key test2 in master node to see if this could trigger the sync of key test1 between slave and master, but only to find test2 is updated in slave

Are you willing to submit a PR?

  • I'm willing to submit a PR!
@sam-hn sam-hn added the bug type bug label Apr 1, 2024
@git-hulk
Copy link
Member

git-hulk commented Apr 1, 2024

@sam-hn Can you check if the replication is correct while restarting via the command info replication. Did you add the slaveof {maseter_ip} {master_port} to your slave configuration file? if no, it won't reconnect the master after restarting.

@sam-hn
Copy link
Author

sam-hn commented Apr 1, 2024

@sam-hn Can you check if the replication is correct while restarting via the command info replication. Did you add the slaveof {maseter_ip} {master_port} to your slave configuration file? if no, it won't reconnect the master after restarting.

thanks for your feedback.
the replication is correct, info replication shows slave is connected, actually slave configuration is not the problem, because slave can sync test2-- the new key from master, it's just test1 key that has problem

@git-hulk
Copy link
Member

git-hulk commented Apr 1, 2024

@sam-hn I cannot reproduce this issue on my side, what I do is:

  1. start S1 and S2, and S2 slaveof S1
  2. SET a 100 to S1, then shutdown the S2
  3. SET a 101 and SET b 1 to S1, then restart S2 and S2 slaveof S1
  4. GET a and GET b from S2, and returns 101 and 1

Can you provide how do you reproduce this?

@sam-hn
Copy link
Author

sam-hn commented Apr 3, 2024

@git-hulk thanks for your test results. This is the exact steps I tried except I was testing with cluster mode.

Then I did another test with two new kvrocks nodes which had cluster mode disabled, and this time I could not reproduce the issue either. After that, I removed db dir and enabled cluster mode on the nodes, called controller API to create new cluster then did the test again, and the issue came back.

So I think the issue only happens with master-slave replication in a cluster. Can you please try the same and see if issue can reproduce this time?

@sam-hn
Copy link
Author

sam-hn commented Apr 3, 2024

Attached my test results for reference, below is the cluster info

10.70.171.222:6672> cluster nodes
qxpiIgvNeJN2DMLgwMwDmud8luIYw6uY7dIEuFtH 10.70.171.222:6673@16673 slave kcf0WSVWymTh6AjTuw1JxL2e4AMNLXgkK9XFSzm8 1712109613610 1712109613611 1 connected
kcf0WSVWymTh6AjTuw1JxL2e4AMNLXgkK9XFSzm8 10.70.171.222:6672@16672 myself,master - 1712109613610 1712109613611 1 connected 0-16383

during slave downtime, update a to 101 and create b

10.70.171.222:6672> info replication
# Replication
role:master
connected_slaves:0
master_repl_offset:3
10.70.171.222:6672> set a 101
OK
10.70.171.222:6672> set b 1
OK

after slave restarted, key a still keeps old value

10.70.171.222:6673> keys *
1) "b"
2) "a"
10.70.171.222:6673> get a
"100"
10.70.171.222:6673> get b
"1"

@git-hulk
Copy link
Member

git-hulk commented Apr 3, 2024

@sam-hn Thank you, I will have a try soon when I get time.

@sam-hn
Copy link
Author

sam-hn commented Apr 3, 2024

slave-config.txt
attached slave configs which has use-rsid-psync enabled

@git-hulk
Copy link
Member

git-hulk commented Apr 3, 2024

slave-config.txt attached slave configs which has use-rsid-psync enabled

@sam-hn Thanks for your input. I found the root cause now and will submit a PR to resolve this soon.

git-hulk added a commit to git-hulk/kvrocks that referenced this issue Apr 4, 2024
git-hulk added a commit that referenced this issue Apr 5, 2024
This closes #2214

The namespace mechanism is NOT allowed in cluster mode, so it's
unnecessary to rewrite while the cluster mode is enabled. This
config rewrite behavior will cause the replication issue 
mentioned in #2214 when starting the cluster node.

The root cause is that the server will try to rewrite the namespace
into the rocksdb if the option `repl-namespace-enabled` is enabled.
So it will increase the server's rocksdb sequence and replication will
start with the wrong offset. We have checked if the role is a slave 
before rewriting, but the cluster replication is NOT set at that time(master-replica is good).

The good news is it only affects the cluster users who enabled
the option `repl-namespace-enabled`, so I guess almost no user
will do this since the namespace replication is meaningless to the cluster mode.

```
=== RUN   TestClusterReplication/Cluster_replication_should_work_normally_after_restart
    replication_test.go:88: 
        	Error Trace:	/Users/hulk/code/cxx/kvrocks/tests/gocase/integration/replication/replication_test.go:88
        	Error:      	Not equal: 
        	            	expected: "v1"
        	            	actual  : "v0"
```

And it works well after this patch.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug type bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants