sql,rpc/nodedialer: improve distsql node health checks #30987

Merged 2 commits on Oct 5, 2018

Commits on Oct 4, 2018

  1. rpc/nodedialer: reset conn breaker after successful connection

    `Dialer.DialInternalClient` does not check the circuit breaker but
    blindly attempts a connection and can succeed, leaving the system in a
    state where there is a healthy connection to a node, but the circuit
    breaker used for dialing is open. DistSQL checks for connection health
    when scheduling processors, but the connection health check does not
    examine the breaker. So DistSQL will proceed to schedule a processor on
    a node but then be unable to use the connection to that node because
    `Dialer.Dial` will return with a `breaker open` error. The code contains
    a TODO to reconcile the handling of circuit breakers in the various
    `Dialer` methods, but changing the handling is risky in the short
    term. As a stop-gap, we reset the breaker after a connection is
    successfully opened.
    
    Fixes cockroachdb#29149
    
    Release note: None
    petermattis committed Oct 4, 2018
    Commit 2e634d7
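
The failure mode and the stop-gap described in this commit can be sketched with a toy model. This is not the actual `nodedialer` code: the `breaker` and `dialer` types, the `connect` hook, and both error values are hypothetical stand-ins, and the real breaker has more states than open and closed. The sketch only shows why resetting the breaker on the unchecked dial path brings the two dial paths back into agreement.

```go
package main

import "errors"

// Hypothetical error values for this sketch.
var (
	errBreakerOpen = errors.New("breaker open")
	errConnRefused = errors.New("connection refused")
)

// breaker is a toy circuit breaker with only two states.
type breaker struct{ open bool }

func (b *breaker) trip()       { b.open = true }
func (b *breaker) reset()      { b.open = false }
func (b *breaker) ready() bool { return !b.open }

// dialer models two dial paths that share one breaker.
// connect is a hypothetical hook standing in for the real RPC dial.
type dialer struct {
	brk     *breaker
	connect func() error
}

// dial models the breaker-checked path: it refuses to connect while the
// breaker is open and trips the breaker when a connection attempt fails.
func (d *dialer) dial() error {
	if !d.brk.ready() {
		return errBreakerOpen
	}
	if err := d.connect(); err != nil {
		d.brk.trip()
		return err
	}
	return nil
}

// dialInternal models the unchecked path: it attempts the connection
// without consulting the breaker. The stop-gap is the reset on success,
// so a healthy connection never coexists with an open breaker.
// Leaving the breaker untouched on failure is a simplification here.
func (d *dialer) dialInternal() error {
	if err := d.connect(); err != nil {
		return err
	}
	d.brk.reset()
	return nil
}
```

In this model, a failed `dial` opens the breaker, and later `dial` calls return `breaker open` even after the node recovers. One successful `dialInternal` closes the breaker again, after which `dial` succeeds.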

Commits on Oct 5, 2018

  1. sql: consider conn circuit breakers in distsql planning

    Change `DistSQLPlanner.checkNodeHealth` so that it uses
    `nodedialer.Dialer.ConnHealth` instead of `rpc.Context.ConnHealth`. The
    former is the right method to be calling to check a node's connection
    health.
    
    Refactor `DistSQLPlanner.checkNodeHealth` into a `distSQLNodeHealth`
    struct. This removes the need for `DistSQLPlannerTestingKnobs`.
    
    Enhance `nodedialer.Dialer.ConnHealth` to mark connections as unhealthy
    if the circuit breaker is open. This prevents DistSQL from planning
    processors on such nodes.
    
    Release note: None
    petermattis committed Oct 5, 2018
    Commit 569aa8e
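
The health check described in this commit can be illustrated with a minimal sketch. It assumes a simplified model: `healthChecker`, `connHealth`, `healthyNodes`, and `nodeID` are hypothetical names, and the maps stand in for the per-node breaker and connection state that the real `nodedialer.Dialer` tracks. The point is only that an open breaker makes a node count as unhealthy, so planning skips it.

```go
package main

import "errors"

// errBreakerOpen is a hypothetical error value for this sketch.
var errBreakerOpen = errors.New("breaker open")

// nodeID is a stand-in for the real node identifier type.
type nodeID int

// healthChecker is a toy model of per-node connection state.
type healthChecker struct {
	breakerOpen map[nodeID]bool  // true if the node's breaker is open
	connErr     map[nodeID]error // non-nil if the connection is unhealthy
}

// connHealth reports a node as unhealthy when its breaker is open, even
// if the underlying connection looks fine. Otherwise it returns the
// connection's own health error, which is nil for a healthy connection.
func (h *healthChecker) connHealth(id nodeID) error {
	if h.breakerOpen[id] {
		return errBreakerOpen
	}
	return h.connErr[id]
}

// healthyNodes keeps only the candidates on which a planner could
// safely schedule work, preserving their original order.
func healthyNodes(h *healthChecker, candidates []nodeID) []nodeID {
	var out []nodeID
	for _, id := range candidates {
		if h.connHealth(id) == nil {
			out = append(out, id)
		}
	}
	return out
}
```

With this check, a node whose connection is healthy but whose breaker is open is filtered out before planning, rather than failing later when the connection is dialed.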