vrrp heartbeat blocked when harddisk hangout #2494

xxxmailk · 2024-11-05T01:37:12Z

Hi all.

Describe the bug

We run a set of environments on KVM virtual machines with Ceph block devices as the back-end storage. When a Ceph failover occurs, VM I/O pauses, and the keepalived VRRP heartbeat pauses for the same length of time. The slave nodes then stop receiving the master node's heartbeat, which leads to an incorrect failover.

To Reproduce

Hang disk I/O while keeping the network running. All the slave nodes will then transition to master.

Expected behavior

The VRRP heartbeat should continue while disk I/O is blocked, because the network is still running.

Keepalived version

1.3.5

Root cause

We use Prometheus-keepalived-exporter to monitor keepalived's status. The exporter tells keepalived to generate the /tmp/keepalived.stats file every minute, and collects the keepalived information by reading that file.
This seems to be fine.
But when hard disk I/O blocks, strace shows that keepalived is stuck in open/read/write on /tmp/keepalived.stats, and the VRRP heartbeat has stopped.

xxxmailk · 2024-11-05T01:45:29Z

Maybe we can move the /tmp/keepalived.stats file to a tmpfs directory? I don't think it needs to be persisted to the hard disk.
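
Before moving the file, it may help to confirm which filesystem it actually lands on. The following is a minimal sketch, not part of keepalived or the exporter: `fs_type_for` is a hypothetical helper that takes an absolute, normalized path plus the text of /proc/mounts and returns the type of the deepest matching mount. It assumes Linux and ignores the `\040` escaping that /proc/mounts uses for spaces in mount points.

```python
from typing import Optional


def fs_type_for(path: str, mounts_text: str) -> Optional[str]:
    """Return the filesystem type of the deepest mount point containing path."""
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip malformed lines
        mount_point, fs_type = fields[1], fields[2]
        # Match whole path components only, so /tmp does not match /tmpfoo.
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # ">=" lets a later mount on the same point win (it is the one on top).
            if len(mount_point) >= len(best_mount):
                best_mount, best_type = mount_point, fs_type
    return best_type


if __name__ == "__main__":
    with open("/proc/mounts") as mounts:
        print(fs_type_for("/tmp/keepalived.stats", mounts.read()))
```

If this prints anything other than `tmpfs`, writes to /tmp/keepalived.stats go to a disk-backed filesystem and can stall when the disk does.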

pqarmitage · 2024-11-05T19:00:10Z

I can certainly add a configurable path for the files to be written to; of course on most systems nowadays /tmp is a tmpfs. A better way, and the way keepalived was designed to work, is to obtain stats via snmp.

A further question is what other files are opened and read from/written to in the main vrrp or checker processes (there are none in the bfd process). Reading the configuration for a reload is handled by the parent process, and it doesn't matter if that is blocked for a while; the configuration is passed to the child process via a memfd so that shouldn't block.

What comes to mind are track_file in the VRRP process and CHECK_FILE in the checkers process. There are also the outputs caused by SIGUSR1 and SIGUSR2, as has been identified above. There is also the option to write log entries directly to a file rather than via syslog, but that is only available if a compile time option is specified, and is really only for debugging.

This has triggered some further, and completely unrelated, thinking. The track_file and CHECK_FILE implementations assume that the relevant file is updated atomically. If 1 were written to the file and flushed, and some time later 0\n were written, then I think keepalived would initially process a value of 1, and subsequently the value 10. We should probably not process the file contents until the \n has been written/read. There could still be a problem if the file previously contained 20\n, and then the first byte was updated to 0 followed by a delay before \n was written. These are really notes for me to check what happens in practice.
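
To make the partial-write concern concrete, here is a small sketch of both sides. It is not keepalived's code: `parse_track_value` and `write_atomically` are hypothetical names, and the thread does not say whether keepalived's file monitoring follows a rename, so the writer half only illustrates what an atomic update means.

```python
import os
import tempfile
from typing import Optional


def parse_track_value(data: bytes) -> Optional[int]:
    """Return the integer on the first line, or None if the line is incomplete."""
    newline = data.find(b"\n")
    if newline < 0:
        return None  # no terminating newline yet: the writer may not be done
    try:
        return int(data[:newline].strip())
    except ValueError:
        return None  # empty or non-numeric line


def write_atomically(path: str, value: int) -> None:
    """Write value to a temp file in the same directory, then rename over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as tmp_file:
            tmp_file.write(f"{value}\n")
        os.replace(tmp_path, path)  # readers see the old or the new file, never a mix
    except BaseException:
        os.unlink(tmp_path)
        raise
```

With this reader, a file containing only `1` yields `None` rather than 1, and `10\n` yields 10. Waiting for the newline does not help with the in-place overwrite of `20\n`; only a writer-side atomic replace avoids that case.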

Another thought is that when reading and writing files in the critical timing path, it might be that we should use io_uring (liburing). That leaves the problem of open(), and the stat() family if we use it (I think close() should not block). https://nullprogram.com/blog/2020/09/04/ is very interesting in this respect, and I will explore it further.
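
The general idea of keeping file I/O off the timing path can be sketched without io_uring. The example below uses a worker thread as a stand-in, which is a different technique from the one proposed above, and is not how keepalived is implemented. `run_heartbeats` and its `dump` callable are hypothetical; appending to `sent` stands in for sending one VRRP advert.

```python
import concurrent.futures
import time
from typing import Callable, List

# A single worker: dumps are serialized and never run on the heartbeat thread.
_io_pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)


def run_heartbeats(count: int, interval: float, dump: Callable[[], object]) -> List[int]:
    """Send `count` heartbeats while `dump` runs in the background."""
    pending = _io_pool.submit(dump)  # returns immediately, even if dump blocks
    sent: List[int] = []
    for seq in range(count):
        sent.append(seq)  # stand-in for sending one advert
        if pending is not None and pending.done():
            pending.result()  # surface any I/O error raised in the worker
            pending = None
        time.sleep(interval)
    return sent
```

The heartbeat loop only polls `done()`, so a dump stuck in open() or write() delays nothing but later dumps. One caveat of this sketch: a worker that never returns will also hold up interpreter shutdown.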
