cherry-pick-1.1: storage: add permitLargeSnapshots flag to replica #20906

Commits on Dec 19, 2017

  1. storage: respect context deadline in idAllocator.Allocate

    Release note: None
    nvanbenschoten committed Dec 19, 2017
    6adf047

Commits on Dec 20, 2017

  1. storage: add permitLargeSnapshots flag to replica

    In a privately reported user issue, we've seen that [our attempts](cockroachdb#7788)
    at [preventing large snapshots](cockroachdb#7581)
    can result in replica unavailability. Our current approach to limiting
    large snapshots assumes that it's OK to block snapshots indefinitely
    while waiting for a range to first split. Unfortunately, this can create
    a dependency cycle where a range requires a snapshot to split (because it
    can't achieve an up-to-date quorum without it) but isn't allowed to perform
    a snapshot until its size is reduced below the threshold. This can result
    in unavailability even when a majority of replicas remain live.
    
    Currently, we still need this snapshot size limit because unbounded snapshots
    can result in OOM errors that crash entire nodes. However, once snapshots
    are streamed from disk to disk, never needing to buffer in-memory on the
    sending or receiving side, we should be able to remove any snapshot size
    limit (see cockroachdb#16954).
    
    As a stopgap, this change introduces a `permitLargeSnapshots` flag on a
    replica which is set when the replica is too large to snapshot but observes
    splits failing. When set, the flag allows snapshots to ignore the size
    limit until the snapshot goes through and splits are able to succeed
    again.
    
    Release note (bug fix): Fixed a scenario where a range that is too big
    to snapshot can lose availability even with a majority of nodes alive.
    nvanbenschoten committed Dec 20, 2017
    4f44c54
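
The flag behavior described in commit 4f44c54 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual CockroachDB code: the `replica` struct, its fields other than `permitLargeSnapshots`, and the methods `exceedsSnapshotLimit`, `canSnapshot`, and `onSplitResult` are all hypothetical names chosen for the example.

```go
package main

// NOTE: every name here except permitLargeSnapshots is hypothetical and
// exists only to illustrate the commit message above.

// replica models just the state needed for this sketch.
type replica struct {
	sizeBytes            int64 // current size of the range's data
	maxSnapshotBytes     int64 // assumed snapshot size limit
	permitLargeSnapshots bool  // set while splits fail on an oversized range
}

// exceedsSnapshotLimit reports whether the replica is too large to
// snapshot under the normal size limit.
func (r *replica) exceedsSnapshotLimit() bool {
	return r.sizeBytes > r.maxSnapshotBytes
}

// canSnapshot reports whether a snapshot may be generated. Without the
// flag, an oversized range is blocked until it splits; with the flag set,
// the size limit is ignored so the range can regain an up-to-date quorum.
func (r *replica) canSnapshot() bool {
	return r.permitLargeSnapshots || !r.exceedsSnapshotLimit()
}

// onSplitResult records the outcome of a split attempt. A failed split on
// an oversized range sets the flag, breaking the cycle in which the split
// needs a snapshot and the snapshot needs a split. A successful split
// updates the size and clears the flag so the limit applies again.
func (r *replica) onSplitResult(succeeded bool, newSizeBytes int64) {
	if succeeded {
		r.sizeBytes = newSizeBytes
		r.permitLargeSnapshots = false
		return
	}
	if r.exceedsSnapshotLimit() {
		r.permitLargeSnapshots = true
	}
}
```

In this sketch a replica of size 200 with a limit of 100 is initially refused a snapshot, is permitted one after a failed split, and returns to the normal limit once a split succeeds and its size drops below the threshold.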