
storage: v19.2.2: nil pointer in interval_btree #51913

Closed
cockroach-teamcity opened this issue Jul 27, 2020 · 2 comments
Labels
C-bug (Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.) · O-sentry (Originated from an in-the-wild panic report.)

Comments

@cockroach-teamcity (Member)

This issue was autofiled by Sentry. It represents a crash or reported error on a live cluster with telemetry enabled.

Sentry link: https://sentry.io/organizations/cockroach-labs/issues/1809303514/?referrer=webhooks_plugin

Panic message:

panic.go:522: runtime.errorString: runtime error: invalid memory address or nil pointer dereference

Stacktrace (expand for inline code snippets):

/usr/local/go/src/runtime/panic.go#L521-L523 in runtime.gopanic

// again due to undefined state.
panic(r)
}
in pkg/storage.(*Store).Send.func1
/usr/local/go/src/runtime/panic.go#L521-L523 in runtime.gopanic
/usr/local/go/src/runtime/panic.go#L81-L83 in runtime.panicmem
/usr/local/go/src/runtime/signal_unix.go#L389-L391 in runtime.sigpanic
}
if n.children[i].count >= maxLatches {
splitLa, splitNode := mut(&n.children[i]).split(maxLatches / 2)
in pkg/storage/spanlatch.(*node).insert
}
if replaced, _ := mut(&t.root).insert(la); !replaced {
t.length++
in pkg/storage/spanlatch.(*btree).Set
// Add writes directly to the write tree.
sm.trees[spanset.SpanReadWrite].Set(latch)
default:
in pkg/storage/spanlatch.(*Manager).insertLocked
snap := m.snapshotLocked(spans)
m.insertLocked(lg)
m.mu.Unlock()
in pkg/storage/spanlatch.(*Manager).sequence
) (*Guard, error) {
lg, snap := m.sequence(spans, ts)
defer snap.close()
in pkg/storage/spanlatch.(*Manager).Acquire
var err error
lg, err = r.latchMgr.Acquire(ctx, spans, ba.Timestamp)
if err != nil {
in pkg/storage.(*Replica).beginCmds
var err error
ec, err = r.beginCmds(ctx, ba, spans)
if err != nil {
in pkg/storage.(*Replica).executeWriteBatch
log.Event(ctx, "read-write path")
br, pErr = r.executeWriteBatch(ctx, &ba)
} else if isReadOnly {
in pkg/storage.(*Replica).sendWithRangeID
) (*roachpb.BatchResponse, *roachpb.Error) {
return r.sendWithRangeID(ctx, r.RangeID, ba)
}
in pkg/storage.(*Replica).Send
}
br, pErr = repl.Send(ctx, ba)
if pErr == nil {
in pkg/storage.(*Store).Send
br, pErr := store.Send(ctx, ba)
if br != nil && br.Error != nil {
in pkg/storage.(*Stores).Send
var pErr *roachpb.Error
br, pErr = n.stores.Send(ctx, *args)
if pErr != nil {
in pkg/server.(*Node).batchInternal.func1
return f(ctx)
}
in pkg/util/stop.(*Stopper).RunTaskWithErr
var br *roachpb.BatchResponse
if err := n.stopper.RunTaskWithErr(ctx, "node.Node: batch", func(ctx context.Context) error {
var finishSpan func(*roachpb.BatchResponse)
in pkg/server.(*Node).batchInternal
br, err := n.batchInternal(ctx, args)
in pkg/server.(*Node).Batch
) (*roachpb.BatchResponse, error) {
return a.InternalServer.Batch(ctx, ba)
}
in pkg/rpc.internalClientAdapter.Batch
}
reply, err := iface.Batch(ctx, &ba)
// If we queried a remote node, perform extra validation and
in pkg/kv.(*grpcTransport).sendBatch
ba.Replica = client.replica
reply, err := gt.sendBatch(ctx, client.replica.NodeID, iface, ba)
in pkg/kv.(*grpcTransport).SendNext
}
br, err := transport.SendNext(ctx, ba)
// maxSeenLeaseSequence tracks the maximum LeaseSequence seen in a
in pkg/kv.(*DistSender).sendToReplicas
return ds.sendToReplicas(
ctx,
in pkg/kv.(*DistSender).sendRPC
class := rpc.ConnectionClassForKey(desc.RSpan().Key)
br, err := ds.sendRPC(ctx, ba, class, desc.RangeID, replicas, cachedLeaseHolder, withCommit)
if err != nil {
in pkg/kv.(*DistSender).sendSingleRange
reply, pErr = ds.sendSingleRange(ctx, ba, desc, withCommit)
in pkg/kv.(*DistSender).sendPartialBatch
ds.metrics.AsyncSentCount.Inc(1)
responseCh <- ds.sendPartialBatch(
ctx, ba, rs, desc, evictToken, withCommit, batchIdx, true, /* needsTruncate */
in pkg/kv.(*DistSender).sendPartialBatchAsync.func1
f(ctx)
}()
in pkg/util/stop.(*Stopper).RunLimitedAsyncTask.func1

/usr/local/go/src/runtime/panic.go in runtime.gopanic at line 522
pkg/storage/store_send.go in pkg/storage.(*Store).Send.func1 at line 112
/usr/local/go/src/runtime/panic.go in runtime.gopanic at line 522
/usr/local/go/src/runtime/panic.go in runtime.panicmem at line 82
/usr/local/go/src/runtime/signal_unix.go in runtime.sigpanic at line 390
pkg/storage/spanlatch/interval_btree.go in pkg/storage/spanlatch.(*node).insert at line 381
pkg/storage/spanlatch/interval_btree.go in pkg/storage/spanlatch.(*btree).Set at line 703
pkg/storage/spanlatch/manager.go in pkg/storage/spanlatch.(*Manager).insertLocked at line 290
pkg/storage/spanlatch/manager.go in pkg/storage/spanlatch.(*Manager).sequence at line 224
pkg/storage/spanlatch/manager.go in pkg/storage/spanlatch.(*Manager).Acquire at line 204
pkg/storage/replica.go in pkg/storage.(*Replica).beginCmds at line 1263
pkg/storage/replica_write.go in pkg/storage.(*Replica).executeWriteBatch at line 90
pkg/storage/replica.go in pkg/storage.(*Replica).sendWithRangeID at line 577
pkg/storage/replica.go in pkg/storage.(*Replica).Send at line 525
pkg/storage/store_send.go in pkg/storage.(*Store).Send at line 228
pkg/storage/stores.go in pkg/storage.(*Stores).Send at line 181
pkg/server/node.go in pkg/server.(*Node).batchInternal.func1 at line 925
pkg/util/stop/stopper.go in pkg/util/stop.(*Stopper).RunTaskWithErr at line 300
pkg/server/node.go in pkg/server.(*Node).batchInternal at line 913
pkg/server/node.go in pkg/server.(*Node).Batch at line 951
pkg/rpc/context.go in pkg/rpc.internalClientAdapter.Batch at line 542
pkg/kv/transport.go in pkg/kv.(*grpcTransport).sendBatch at line 199
pkg/kv/transport.go in pkg/kv.(*grpcTransport).SendNext at line 168
pkg/kv/dist_sender.go in pkg/kv.(*DistSender).sendToReplicas at line 1676
pkg/kv/dist_sender.go in pkg/kv.(*DistSender).sendRPC at line 436
pkg/kv/dist_sender.go in pkg/kv.(*DistSender).sendSingleRange at line 518
pkg/kv/dist_sender.go in pkg/kv.(*DistSender).sendPartialBatch at line 1447
pkg/kv/dist_sender.go in pkg/kv.(*DistSender).sendPartialBatchAsync.func1 at line 1365
pkg/util/stop/stopper.go in pkg/util/stop.(*Stopper).RunLimitedAsyncTask.func1 at line 381
| Tag | Value |
| --- | --- |
| Cockroach Release | v19.2.2 |
| Cockroach SHA | 3cbd056 |
| Platform | linux amd64 |
| Distribution | CCL |
| Environment | v19.2.2 |
| Command | server |
| Go Version | go1.12.12 |
| # of CPUs | 32 |
| # of Goroutines | 386 |
cockroach-teamcity added the C-bug and O-sentry labels on Jul 27, 2020
yuzefovich changed the title from "sentry: panic.go:522: runtime.errorString: runtime error: invalid memory address or nil pointer dereference" to "storage: v19.2.2: nil pointer in interval_btree" on Jul 27, 2020
@yuzefovich (Member)

cc @nvanbenschoten

@nvanbenschoten (Member)

This is coming from:

if n.children[i].count >= maxLatches {

We already dereferenced n earlier in the function and this isn't a slice bounds error, so we must be failing to dereference n.children[i]. This could be due to one of the following two reasons:

  1. a leaf node lost its leaf flag and is being incorrectly considered an inner node.
  2. an inner node's count is out of sync with its children array.

Given how few places manipulate leaf, I think the latter option is more likely.
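To make the two hypotheses concrete, here is a minimal sketch of the kind of invariant check that would catch either one. This is purely illustrative, with a simplified node layout, and is not the actual spanlatch btree code or its `Verify` implementation.

```go
// Purely illustrative sketch, not the actual spanlatch btree code: a
// simplified node layout plus an invariant check that would catch
// either of the two hypotheses above.
package main

import "fmt"

const maxLatches = 16

type node struct {
	count    int16                  // number of latches stored in this node
	leaf     bool                   // true if the node has no children
	children [maxLatches + 1]*node  // children[0..count] are populated on inner nodes
}

// verify checks that a leaf node has no children and that an inner
// node has a non-nil child at every slot up to and including count.
// A violation of either rule makes `n.children[i].count` a nil
// pointer dereference during insert.
func verify(n *node) error {
	if n.leaf {
		for i, c := range n.children {
			if c != nil {
				return fmt.Errorf("leaf node has child at %d", i)
			}
		}
		return nil
	}
	for i := int16(0); i <= n.count; i++ {
		if n.children[i] == nil {
			return fmt.Errorf("inner node: count=%d but children[%d] is nil", n.count, i)
		}
	}
	return nil
}

func main() {
	// Hypothesis 2: an inner node whose count is out of sync with its
	// children array.
	n := &node{count: 1, leaf: false}
	n.children[0] = &node{leaf: true}
	// children[1] was never set; dereferencing it, as insert does, would panic.
	fmt.Println(verify(n))
}
```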

nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Jul 27, 2020
In an effort to track down the bug that triggered cockroachdb#51913, this commit
ports the randomized interval btree benchmarks to also be unit tests.
This allows us to run invariant checks (see `btree.Verify`) on randomized
tree configurations.

Doing so revealed a violation of the `isUpperBoundCorrect` invariant.
This was determined to be a bug in `node.removeMax`. When removing an
item from a grandchild node, we were failing to adjust the upper bound
of the child node. It doesn't look like this could cause user-visible
effects because the upper bound of a subtree is only ever decreased on
removal, so at worst, this caused searches in the tree to do more work
than strictly necessary. Still, this is a good bug to fix and it's
encouraging that the new randomized testing using the existing invariant
validation caught it.
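As a rough illustration of the upper-bound bookkeeping described in the commit message, here is a minimal sketch under assumed, simplified types (integer end keys, a `max` field per node); it is not the real interval btree code, and the bug corresponds to omitting the bound adjustment after the recursive call.

```go
// Illustrative sketch of upper-bound maintenance during removeMax;
// types and field names are hypothetical, not the real implementation.
package main

import "fmt"

type node struct {
	max      int     // upper bound over all interval end keys in this subtree
	ends     []int   // interval end keys stored in this node
	children []*node // empty for leaf nodes
}

// recompute returns the correct upper bound for n from its own items
// and its children's bounds.
func (n *node) recompute() int {
	m := 0
	for _, e := range n.ends {
		if e > m {
			m = e
		}
	}
	for _, c := range n.children {
		if c.max > m {
			m = c.max
		}
	}
	return m
}

// removeMax removes the maximum item from the subtree rooted at n.
// The bug described above corresponds to skipping the n.max update
// after recursing into the rightmost child.
func (n *node) removeMax() int {
	if len(n.children) == 0 {
		last := len(n.ends) - 1
		out := n.ends[last]
		n.ends = n.ends[:last]
		n.max = n.recompute()
		return out
	}
	out := n.children[len(n.children)-1].removeMax()
	n.max = n.recompute() // the missing adjustment: keep this node's bound tight
	return out
}

func main() {
	child := &node{max: 9, ends: []int{5, 9}}
	root := &node{max: 9, ends: []int{3}, children: []*node{child}}
	root.removeMax()
	fmt.Println(root.max, child.max) // both bounds tightened to 5
}
```

Consistent with the commit message, a stale (too-large) bound here never breaks correctness of searches; it only makes them visit subtrees they could have skipped.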
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Aug 3, 2020
craig bot pushed a commit that referenced this issue Aug 3, 2020
51865: sql: use the descs.Collection to access types during distributed flows r=rohany a=rohany

This commit enables distributed queries to access user defined type
metadata during flow setup via the lease manager, so that accesses to
this metadata are cached and don't have to go through k/v on every
access.

This is achieved by giving the `FlowContext` a `descs.Collection` that is
used to access the descriptors through the lease manager.

Release note: None
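
The caching idea behind this change can be sketched generically; the snippet below uses hypothetical types and function names (`collection`, `getTypeDescriptor`, `fetchFromKV`) and does not reflect the real `descs.Collection` API.

```go
// Purely illustrative sketch of descriptor caching during flow setup:
// repeated lookups are served from an in-memory map instead of going
// back to the (simulated) k/v layer each time. Hypothetical types only.
package main

import "fmt"

type typeDescriptor struct{ name string }

// kvLookups counts how many times we fall through to the simulated k/v layer.
var kvLookups int

func fetchFromKV(id int) typeDescriptor {
	kvLookups++
	return typeDescriptor{name: fmt.Sprintf("type-%d", id)}
}

// collection caches descriptors so repeated accesses do not hit k/v.
type collection struct {
	cache map[int]typeDescriptor
}

func (c *collection) getTypeDescriptor(id int) typeDescriptor {
	if d, ok := c.cache[id]; ok {
		return d
	}
	d := fetchFromKV(id)
	c.cache[id] = d
	return d
}

func main() {
	c := &collection{cache: map[int]typeDescriptor{}}
	for i := 0; i < 3; i++ {
		c.getTypeDescriptor(42) // only the first call reaches k/v
	}
	fmt.Println("k/v lookups:", kvLookups) // prints 1
}
```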

51939: interval/generic: improve randomized testing, fix upper bound bug r=nvanbenschoten a=nvanbenschoten

In an effort to track down the bug that triggered #51913, this commit
ports the randomized interval btree benchmarks to also be unit tests.
This allows us to run invariant checks (see `btree.Verify`) on randomized
tree configurations.

Doing so revealed a violation of the `isUpperBoundCorrect` invariant.
This was determined to be a bug in `node.removeMax`. When removing an
item from a grandchild node, we were failing to adjust the upper bound
of the child node. It doesn't look like this could cause user-visible
effects because the upper bound of a subtree is only ever decreased on
removal, so at worst, this caused searches in the tree to do more work
than strictly necessary. Still, this is a good bug to fix and it's
encouraging that the new randomized testing using the existing invariant
validation caught it.

52090: sql: support ALTER TABLE SET SCHEMA command r=rohany a=RichardJCai

sql: support ALTER TABLE SET SCHEMA command

Release note (sql change): Added support for
ALTER TABLE/SEQUENCE/VIEW SET SCHEMA to set the schema of
a table, sequence, or view to the target schema.

One must have DROP privilege on the table and CREATE privilege
on the schema to perform the operation.

52230: bulkio: Implement `SHOW SCHEDULES` r=miretskiy a=miretskiy

Informs #51850
Informs #51600

Implement the `SHOW SCHEDULES` statement, which displays information
about scheduled jobs.

Display schedule information, optionally filtered
by schedule state (paused or not) and optionally restricted
to backup schedules only:

```
SHOW [RUNNING|PAUSED] SCHEDULES [FOR BACKUP]
```

In addition, it is possible to display information
for a specific schedule:

```
SHOW SCHEDULE 123
```

Release note (enterprise change): `SHOW SCHEDULES` displays
information about scheduled jobs.

Co-authored-by: Rohan Yadav <rohany@alumni.cmu.edu>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: richardjcai <caioftherichard@gmail.com>
Co-authored-by: Yevgeniy Miretskiy <yevgeniy@cockroachlabs.com>
tbg closed this as completed on Aug 26, 2020