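Two slaves S1 and S2: S1 has finished full sync and S2 is halfway through; after a master-slave switch promotes S1 to the new master, S2 no longer performs a full sync against the new master #2436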

Closed
chengyu-l opened this issue Mar 1, 2024 · 4 comments · Fixed by #2766
Labels
☢️ Bug Something isn't working

Comments

@chengyu-l
Contributor

chengyu-l commented Mar 1, 2024

Is this a regression?

Yes

Description

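There are two slaves, S1 and S2. S1 has finished its full sync and S2 is halfway through its full sync when a master-slave switch promotes S1 to the new master. After S2 switches to the new master, it no longer performs a full sync.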

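Update 2024-06-14:
1. Whether this case really exists is still in doubt and needs to be re-verified, because even if the replication ID is the same, the offsets should not match.
2. A related problem does exist, however:
In a one-master, one-slave setup, if the master dies while the slave is halfway through a full sync, the slave holds no data (or only dirty data) and is not eligible to be promoted to master. The plan is to expose a flag in the output of the info command that tells whether the node hit an unexpected failure during full sync.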
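
To make the proposed info flag concrete, here is a minimal sketch of how a slave could track an interrupted full sync and report it. Everything in it is an assumption for illustration: the class `FullSyncTracker`, its method names, and the field name `full_sync_interrupted` are hypothetical and are not taken from Pika's code or from the eventual fix.

```cpp
#include <string>

// Hypothetical helper, not part of Pika. It remembers whether a full sync
// was cut short so that the node can report itself as unsafe to promote.
class FullSyncTracker {
 public:
  // The slave has started receiving a full data dump from its master.
  void OnFullSyncStart() { in_progress_ = true; }

  // The full sync finished cleanly, so the local data is usable again.
  void OnFullSyncDone() {
    in_progress_ = false;
    interrupted_ = false;
  }

  // The master became unreachable. If a full sync was running, the local
  // data is empty or dirty, so the interruption is recorded.
  void OnMasterLost() {
    if (in_progress_) {
      interrupted_ = true;
      in_progress_ = false;
    }
  }

  bool interrupted() const { return interrupted_; }

  // One line for the info output. The field name is an assumption.
  std::string InfoLine() const {
    return std::string("full_sync_interrupted:") +
           (interrupted_ ? "1" : "0") + "\r\n";
  }

 private:
  bool in_progress_ = false;
  bool interrupted_ = false;
};
```

A failover tool could read this line and refuse to promote a node that reports `1`. Note that the sketch keeps the flag only in memory; a real implementation would presumably also need to persist it so that it survives a restart.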

Please provide a link to a minimal reproduction of the bug

No response

Screenshots or videos

No response

Please provide the version you discovered this bug in (check about page for version information)

No response

Anything else?

No response

@chengyu-l chengyu-l added the ☢️ Bug Something isn't working label Mar 1, 2024
@chengyu-l
Contributor Author

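Root cause: S1 and S2 both already have replication_id set in their Pika configuration files, so after S2 switches to the new master the two replication_id values are identical. During MetaSync the slave therefore does not set force_full_sync_ to true, and it does not enter the TryDBSync logic. S2 then proceeds to the TrySync logic. When the new master S1 receives the TrySync request, it only checks whether the binlog file already exists; if it does, no full sync is performed.
Fix: when the new master S1 receives the TrySync request, in addition to checking whether the binlog file exists, it should also check whether filenum and offset are both 0. If both are 0, it sets the response status to kSyncPointBePurged, which requires S2 to issue a full sync request.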
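
The decision described in the fix can be sketched as follows. This is only an illustration under stated assumptions: `BinlogOffset`, `TrySyncReply`, and `DecideTrySync` are hypothetical stand-ins, not Pika's real types or handler, and the binlog existence check is reduced to a boolean parameter. Only the status name `kSyncPointBePurged` and the `filenum`/`offset` fields come from the comment above.

```cpp
#include <cstdint>

// Hypothetical stand-in for the sync point a slave sends in TrySync.
struct BinlogOffset {
  std::uint32_t filenum;
  std::uint64_t offset;
};

// Hypothetical reply codes; only kSyncPointBePurged is named in the issue.
enum class TrySyncReply {
  kOk,                 // incremental sync can continue from the sync point
  kSyncPointBePurged,  // the slave must fall back to a full sync
};

// Master-side decision for an incoming TrySync request.
// binlog_file_exists: whether the master still has the requested binlog file.
TrySyncReply DecideTrySync(const BinlogOffset& slave_point,
                           bool binlog_file_exists) {
  // Existing check: without the binlog file, incremental sync is impossible.
  if (!binlog_file_exists) {
    return TrySyncReply::kSyncPointBePurged;
  }
  // Proposed extra check: a sync point of (0, 0) suggests the slave never
  // completed a full sync, so it is asked to start one.
  if (slave_point.filenum == 0 && slave_point.offset == 0) {
    return TrySyncReply::kSyncPointBePurged;
  }
  return TrySyncReply::kOk;
}
```

With this shape, a slave that reports filenum 0 and offset 0 is sent back to full sync even when the master's binlog file exists, which is the case the original check let through.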

chengyu-l pushed a commit to chengyu-l/pika that referenced this issue Mar 4, 2024
AlexStocks pushed a commit that referenced this issue Mar 6, 2024
…d in its config. this must execute full sync(#2436) (#2444)

Co-authored-by: liuchengyu <liuchengyu@360.cn>
chengyu-l pushed a commit to chengyu-l/pika that referenced this issue Mar 7, 2024
…cation_id in its config. this must execute full sync(OpenAtomFoundation#2436) (OpenAtomFoundation#2444)"

This reverts commit ab9ed71.
AlexStocks pushed a commit that referenced this issue Mar 7, 2024
…cation_id in its config. this must execute full sync(#2436) (#2444)" (#2460)

This reverts commit ab9ed71.

Co-authored-by: liuchengyu <liuchengyu@360.cn>
@chengyu-l
Contributor Author

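When this problem occurs, operations staff need to step in. How to handle it is worked out between operations and the business side.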

@Issues-translate-bot

Bot detected the issue body's language is not English, translate it automatically.


When this problem occurs, operations staff need to intervene, and the handling is agreed between operations and the business side.

luky116 pushed a commit to luky116/pika that referenced this issue Mar 14, 2024
…d in its config. this must execute full sync(OpenAtomFoundation#2436) (OpenAtomFoundation#2444)

Co-authored-by: liuchengyu <liuchengyu@360.cn>
bigdaronlee163 pushed a commit to bigdaronlee163/pika that referenced this issue Jun 8, 2024
…d in its config. this must execute full sync(OpenAtomFoundation#2436) (OpenAtomFoundation#2444)

Co-authored-by: liuchengyu <liuchengyu@360.cn>
bigdaronlee163 pushed a commit to bigdaronlee163/pika that referenced this issue Jun 8, 2024
…cation_id in its config. this must execute full sync(OpenAtomFoundation#2436) (OpenAtomFoundation#2444)" (OpenAtomFoundation#2460)

This reverts commit ab9ed71.

Co-authored-by: liuchengyu <liuchengyu@360.cn>
@Issues-translate-bot

Bot detected the issue body's language is not English, translate it automatically.


Update: It is still in doubt whether this case really exists, and it needs to be verified again.
But a related problem does exist: with one master and one slave, if the master goes down while the slave is halfway through a full sync, the slave holds no data (or only dirty data) and is not eligible to be promoted to master. The plan is to expose a flag in the output of the info command that tells whether the node hit an unexpected failure during full sync.

@cheniujh cheniujh reopened this Jun 28, 2024
cheniujh pushed a commit to cheniujh/pika that referenced this issue Sep 24, 2024
…d in its config. this must execute full sync(OpenAtomFoundation#2436) (OpenAtomFoundation#2444)

Co-authored-by: liuchengyu <liuchengyu@360.cn>
cheniujh pushed a commit to cheniujh/pika that referenced this issue Sep 24, 2024
…cation_id in its config. this must execute full sync(OpenAtomFoundation#2436) (OpenAtomFoundation#2444)" (OpenAtomFoundation#2460)

This reverts commit 8693a76.

Co-authored-by: liuchengyu <liuchengyu@360.cn>