Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix slowlog crash when redirection fails #129

Merged
merged 5 commits into from
Sep 11, 2017

Conversation

doyoubi
Copy link
Contributor

@doyoubi doyoubi commented Sep 4, 2017

Corvus may crash when it fails to connect to redis and try to create a sub-slowlog for multiple key command.

The Buggy Code

struct slowlog_entry *slowlog_create_sub_entry(struct command *cmd, int64_t total_latency)
{
    ...
    int64_t max_remote_latency = 0;
    struct command *c, *slowest_sub_cmd = NULL;
    STAILQ_FOREACH(c, &cmd->sub_cmds, sub_cmd_next) {
        // When this command fails because of not being able to connect to redis,
        // `remote_latency` will be zero (or maybe nagative) which result in a NULL `slowest_sub_cmd`.
        int64_t remote_latency = c->rep_time[1] - c->rep_time[0];
        if (remote_latency > max_remote_latency) {
            max_remote_latency = remote_latency;
            slowest_sub_cmd = c;
        }
    }
   // NULL `slowest_sub_cmd` will create a crash here.
    return slowlog_create_entry(slowest_sub_cmd, max_remote_latency / 1000, total_latency);
}

How To Reproduce

(1) Use the following Python 3 script to mock a redis cluster and let corvus connect to it.
Note that the slowlog-log-slower-than should be 0 and slowlog-max-len should be positive.

import time
import socket

server_address = ('localhost', 10000)
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(server_address)
sock.listen(1)

CLUSTER_NODES = b'*2\r\n$7\r\nCLUSTER\r\n$5\r\nNODES\r\n'
CLUSTER_NODES_RESP = 'ffffffffffffffffffff00000000490000000265 127.0.0.1:10000 myself,master - 0 0 1 connected 0-16383\n'
RESP = '${}\r\n{}\r\n'.format(len(CLUSTER_NODES_RESP), CLUSTER_NODES_RESP)

while True:
    connection, client_address = sock.accept()
    data = connection.recv(1000)
    if data == CLUSTER_NODES:
        connection.sendall(RESP.encode('utf-8'))
        raise Exception('Let it crash')

(2) Send a MGET SomeKey to corvus and trigger a crash.

Copy link
Contributor

@tevino tevino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

A test case would be great.

@doyoubi doyoubi force-pushed the fix/SlowlogCrash branch 5 times, most recently from 05cfcf1 to 6c2bd6f Compare September 11, 2017 02:44
@doyoubi doyoubi merged commit 59d1bcb into eleme:master Sep 11, 2017
@doyoubi doyoubi mentioned this pull request Sep 11, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants