Skip to content

Commit

Permalink
Browse files Browse the repository at this point in the history
55560: backupccl: avoid div-by-zero crash on failed node count r=dt a=dt

We've seen a report of a node that crashed due to a divide-by-zero
hit during metrics collection, specifically when computing the
throughput-per-node by dividing the backup size by node count.

Since this is only now used for that metric, make a failure to count
nodes a warning only for release builds (and fallback to 1), and make
any error while counting, or not counting to more than 0, a returned
error in non-release builds.

Release note (bug fix): avoid crashing when BACKUP is unable to count the total nodes in the cluster.

55809: roachtest: fix decommission/randomize r=irfansharif a=tbg

The test could end up using fully decommissioned nodes for cli commands,
which does not work as of #55286.

Fixes #55581.

Release note: None


56019: lexbase: pin `reserved_keywords.go` within Bazel r=irfansharif a=irfansharif

It's an auto-generated file that bazel doesn't yet know how to construct
within the sandbox. Before this PR `make bazel-generate` would show
spurious diffs on a clean checkout without this file present. Now it
no longer will.

Unfortunately it also means that successful bazel builds require
`reserved_keywords.go` being pre-generated ahead of time (it's not
checked-in into the repo). Once Bazel is taught to generate this file
however, this will no longer be the case. It was just something that I
missed in #55687.

Release note: None

Co-authored-by: David Taylor <tinystatemachine@gmail.com>
Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>
Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
  • Loading branch information
4 people committed Oct 27, 2020
4 parents b319a0b + 63e79f3 + a00ffe5 + 71550a6 commit bea5339
Show file tree
Hide file tree
Showing 4 changed files with 157 additions and 71 deletions.
17 changes: 15 additions & 2 deletions pkg/ccl/backupccl/backup_job.go
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@ import (
"strings"
"time"

"github.com/cockroachdb/cockroach/pkg/build"
"github.com/cockroachdb/cockroach/pkg/gossip"
"github.com/cockroachdb/cockroach/pkg/jobs"
"github.com/cockroachdb/cockroach/pkg/jobs/jobspb"
Expand Down Expand Up @@ -154,12 +155,20 @@ func clusterNodeCount(gw gossip.OptionalGossip) (int, error) {
return 0, err
}
var nodes int
_ = g.IterateInfos(
err = g.IterateInfos(
gossip.KeyNodeIDPrefix, func(_ string, _ gossip.Info) error {
nodes++
return nil
},
)
if err != nil {
return 0, err
}
// If we somehow got 0 and return it, a caller may panic if they divide by
// such a nonsensical nodecount.
if nodes == 0 {
return 1, errors.New("failed to count nodes")
}
return nodes, nil
}

Expand Down Expand Up @@ -490,7 +499,11 @@ func (b *backupResumer) Resume(

numClusterNodes, err := clusterNodeCount(p.ExecCfg().Gossip)
if err != nil {
return err
if !build.IsRelease() {
return err
}
log.Warningf(ctx, "unable to determine cluster node count: %v", err)
numClusterNodes = 1
}

statsCache := p.ExecCfg().TableStatsCache
Expand Down
7 changes: 6 additions & 1 deletion pkg/ccl/backupccl/restore_job.go
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@ import (
"fmt"
"math"

"github.com/cockroachdb/cockroach/pkg/build"
"github.com/cockroachdb/cockroach/pkg/ccl/storageccl"
"github.com/cockroachdb/cockroach/pkg/clusterversion"
"github.com/cockroachdb/cockroach/pkg/jobs"
Expand Down Expand Up @@ -1176,7 +1177,11 @@ func (r *restoreResumer) Resume(

numClusterNodes, err := clusterNodeCount(p.ExecCfg().Gossip)
if err != nil {
return err
if !build.IsRelease() {
return err
}
log.Warningf(ctx, "unable to determine cluster node count: %v", err)
numClusterNodes = 1
}

for _, tenant := range details.Tenants {
Expand Down
Loading

0 comments on commit bea5339

Please sign in to comment.