
jobs: TestRegistryLifecycle failed #68315

Closed

cockroach-teamcity opened this issue Aug 2, 2021 · 8 comments · Fixed by #68374
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. T-disaster-recovery

Comments

@cockroach-teamcity (Member)

jobs.TestRegistryLifecycle failed with artifacts on master @ c995342ead51e08f8ed1155de4218d30a00d86d2:

Fatal error:

panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x516ba00]

Stack:

goroutine 74085 [running]:
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).Recover(0xc0021d6120, 0x8ea0ad0, 0xc0042069c0)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:231 +0x15d
panic(0x5b5e540, 0xad5a700)
	/usr/local/go/src/runtime/panic.go:971 +0x499
github.com/cockroachdb/cockroach/pkg/util/tracing/collector.(*Iterator).Valid(...)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/tracing/collector/collector.go:107
github.com/cockroachdb/cockroach/pkg/sql.glob..func75(0x8ea0ad0, 0xc004206540, 0x8fd05e8, 0xc0042f4568, 0xc00109e598, 0x0, 0x0, 0xc0019bab00, 0xc0045d0960, 0xc0020d6aa0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/crdb_internal.go:1228 +0x1e0
github.com/cockroachdb/cockroach/pkg/sql.(*virtualDefEntry).makeConstrainedRowsGenerator.func1(0x8e28660, 0xc00320f3e0, 0xc001902d38, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/virtual_schema.go:632 +0x5ae
github.com/cockroachdb/cockroach/pkg/sql.setupGenerator.func3(0x8ea0ad0, 0xc0042069c0)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/virtual_table.go:120 +0x1aa
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2(0xc0021d6120, 0x8ea0ad0, 0xc0042069c0, 0xc0027f4000, 0x0, 0xc002319e30)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:442 +0x162
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:434 +0x28e
Log preceding fatal error

=== RUN   TestRegistryLifecycle/dump_traces_on_cancel
    jobs_test.go:250: Starting resume
    jobs_test.go:263: Exiting resume
    jobs_test.go:284: Starting FailOrCancel
    jobs_test.go:284: Starting FailOrCancel
    jobs_test.go:299: Exiting OnFailOrCancel
E210802 06:30:21.191981 74082 jobs/adopt.go:327  [n1] 57  job 680969571288383489: adoption completed with error job canceled
E210802 06:30:21.200704 72865 jobs/registry.go:644  [-] 58  error claiming jobs: could not query jobs table: claim-jobs: context canceled

Reproduce

To reproduce, try:

make stressrace TESTS=TestRegistryLifecycle PKG=./pkg/jobs TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1

Parameters in this failure:

  • GOFLAGS=-race -parallel=4

/cc @cockroachdb/sql-schema spaskob


@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. labels Aug 2, 2021
@blathers-crl blathers-crl bot added the T-sql-schema-deprecated Use T-sql-foundations instead label Aug 2, 2021
@cockroach-teamcity (Member Author)

jobs.TestRegistryLifecycle failed with artifacts on master @ c995342ead51e08f8ed1155de4218d30a00d86d2:

Fatal error:

panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x68 pc=0x34a6dad]

Stack:

goroutine 68754 [running]:
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).Recover(0xc00692ac00, 0x55dded0, 0xc009a95040)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:231 +0x126
panic(0x3c94de0, 0x729c6a0)
	/usr/local/go/src/runtime/panic.go:965 +0x1b9
github.com/cockroachdb/cockroach/pkg/util/tracing/collector.(*Iterator).Valid(...)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/tracing/collector/collector.go:107
github.com/cockroachdb/cockroach/pkg/sql.glob..func75(0x55dded0, 0xc009a94d40, 0x570c878, 0xc0068917c8, 0xc006637590, 0x0, 0x0, 0xc007b73d00, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/crdb_internal.go:1228 +0x12d
github.com/cockroachdb/cockroach/pkg/sql.(*virtualDefEntry).makeConstrainedRowsGenerator.func1(0x5566780, 0xc00572db00, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/virtual_schema.go:632 +0x4a2
github.com/cockroachdb/cockroach/pkg/sql.setupGenerator.func3(0x55dded0, 0xc009a95040)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/virtual_table.go:120 +0x154
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2(0xc00692ac00, 0x55dded0, 0xc009a95040, 0xc004908180, 0x0, 0xc00950aa20)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:442 +0xf3
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:434 +0x22b
Log preceding fatal error

=== RUN   TestRegistryLifecycle/dump_traces_on_cancel
    jobs_test.go:250: Starting resume
    jobs_test.go:263: Exiting resume
    jobs_test.go:284: Starting FailOrCancel
    jobs_test.go:284: Starting FailOrCancel
    jobs_test.go:299: Exiting OnFailOrCancel
E210802 06:31:26.817377 68724 jobs/adopt.go:327  [n1] 38  job 680969787040137217: adoption completed with error job canceled

Reproduce

To reproduce, try:

make stressrace TESTS=TestRegistryLifecycle PKG=./pkg/jobs TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1

Parameters in this failure:

  • GOFLAGS=-parallel=4

/cc @cockroachdb/sql-schema spaskob


@ajwerner (Contributor) commented Aug 2, 2021

@adityamaru

@cockroach-teamcity (Member Author)

jobs.TestRegistryLifecycle failed with artifacts on master @ 65d81716a22a38873a679b4fd5f1e019280725d5:

Fatal error:

panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x68 pc=0x34ab76d]

Stack:

goroutine 68614 [running]:
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).Recover(0xc005ee4180, 0x55e4770, 0xc001da9b00)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:231 +0x126
panic(0x3c9a120, 0x72a66a0)
	/usr/local/go/src/runtime/panic.go:965 +0x1b9
github.com/cockroachdb/cockroach/pkg/util/tracing/collector.(*Iterator).Valid(...)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/tracing/collector/collector.go:107
github.com/cockroachdb/cockroach/pkg/sql.glob..func75(0x55e4770, 0xc001da9900, 0x5713118, 0xc005b3d4b8, 0xc002eeb590, 0x0, 0x0, 0xc005fe7180, 0x203001, 0x1, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/crdb_internal.go:1228 +0x12d
github.com/cockroachdb/cockroach/pkg/sql.(*virtualDefEntry).makeConstrainedRowsGenerator.func1(0x556cf60, 0xc003a74800, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/virtual_schema.go:632 +0x4a2
github.com/cockroachdb/cockroach/pkg/sql.setupGenerator.func3(0x55e4770, 0xc001da9b00)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/virtual_table.go:120 +0x154
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2(0xc005ee4180, 0x55e4770, 0xc001da9b00, 0xc00235d440, 0x0, 0xc0042339b0)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:442 +0xf3
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:434 +0x22b
Log preceding fatal error

=== RUN   TestRegistryLifecycle/dump_traces_on_cancel
    jobs_test.go:250: Starting resume
    jobs_test.go:263: Exiting resume
    jobs_test.go:284: Starting FailOrCancel
    jobs_test.go:299: Exiting OnFailOrCancel
E210803 06:38:02.376063 68589 jobs/adopt.go:327  [n1] 39  job 681254198670229505: adoption completed with error job canceled
E210803 06:38:02.378846 67538 jobs/registry.go:644  [-] 40  error claiming jobs: could not query jobs table: claim-jobs: context canceled

Reproduce

To reproduce, try:

make stressrace TESTS=TestRegistryLifecycle PKG=./pkg/jobs TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1

Parameters in this failure:

  • GOFLAGS=-parallel=4

/cc @cockroachdb/sql-schema spaskob


@cockroach-teamcity (Member Author)

jobs.TestRegistryLifecycle failed with artifacts on master @ 65d81716a22a38873a679b4fd5f1e019280725d5:

Fatal error:

panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x5172c00]

Stack:

goroutine 74716 [running]:
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).Recover(0xc002ee0ab0, 0x8eac710, 0xc002a57580)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:231 +0x15d
panic(0x5b668a0, 0xad6a700)
	/usr/local/go/src/runtime/panic.go:971 +0x499
github.com/cockroachdb/cockroach/pkg/util/tracing/collector.(*Iterator).Valid(...)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/tracing/collector/collector.go:107
github.com/cockroachdb/cockroach/pkg/sql.glob..func75(0x8eac710, 0xc002a572c0, 0x8fdc228, 0xc0027988e8, 0xc002cf9598, 0x0, 0x0, 0xc0037c1180, 0x0, 0xc0033eeb40, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/crdb_internal.go:1228 +0x1e0
github.com/cockroachdb/cockroach/pkg/sql.(*virtualDefEntry).makeConstrainedRowsGenerator.func1(0x8e341e0, 0xc00352a000, 0xc000660d38, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/virtual_schema.go:632 +0x5ae
github.com/cockroachdb/cockroach/pkg/sql.setupGenerator.func3(0x8eac710, 0xc002a57580)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/virtual_table.go:120 +0x1aa
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2(0xc002ee0ab0, 0x8eac710, 0xc002a57580, 0xc0040313c0, 0x0, 0xc00275be60)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:442 +0x162
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:434 +0x28e
Log preceding fatal error

=== RUN   TestRegistryLifecycle/dump_traces_on_cancel
    jobs_test.go:250: Starting resume
    jobs_test.go:263: Exiting resume
    jobs_test.go:284: Starting FailOrCancel
    jobs_test.go:299: Exiting OnFailOrCancel
E210803 06:43:23.584904 74593 jobs/adopt.go:327  [n1] 74  job 681255250468995073: adoption completed with error job canceled

Reproduce

To reproduce, try:

make stressrace TESTS=TestRegistryLifecycle PKG=./pkg/jobs TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1

Parameters in this failure:

  • GOFLAGS=-race -parallel=4

/cc @cockroachdb/sql-schema spaskob


@ajwerner (Contributor) commented Aug 3, 2021

This is #67386 I'm pretty certain.

@adityamaru (Contributor)

Yep, I have a fix in the works. Just trying to get the test passing under stress.

adityamaru added a commit to adityamaru/cockroach that referenced this issue Aug 3, 2021
Previously, TraceCollector.StartIter would return a nil object
if we failed to resolve node liveness during initialization. This
led to a nil pointer dereference (NPE).

This change now returns a TraceCollector instance with the error field
set to the appropriate error, so that the validity check on the iterator
can correctly handle this scenario.

This change also reworks the dump trace on job cancellation test.
Job cancellation semantics under stress are slightly nondeterministic in
terms of how many times execution of OnFailOrCancel is resumed. This makes
it hard to coordinate when to check, and how many trace files to expect.

Fixes: cockroachdb#68315

Release note: None
@cockroach-teamcity (Member Author)

jobs.TestRegistryLifecycle failed with artifacts on master @ eef03a46f2e43ff70485dadf7d9ad445db05cab4:

Fatal error:

panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x68 pc=0x34b458d]

Stack:

goroutine 68957 [running]:
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).Recover(0xc00833de00, 0x5601090, 0xc00bf1d240)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:231 +0x126
panic(0x3caaec0, 0x72cc6a0)
	/usr/local/go/src/runtime/panic.go:965 +0x1b9
github.com/cockroachdb/cockroach/pkg/util/tracing/collector.(*Iterator).Valid(...)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/tracing/collector/collector.go:107
github.com/cockroachdb/cockroach/pkg/sql.glob..func75(0x5601090, 0xc00bf1d040, 0x572fbb8, 0xc00a2692f8, 0xc005dd7590, 0x0, 0x0, 0xc009a5bf80, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/crdb_internal.go:1228 +0x12d
github.com/cockroachdb/cockroach/pkg/sql.(*virtualDefEntry).makeConstrainedRowsGenerator.func1(0x5589500, 0xc003a6b080, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/virtual_schema.go:632 +0x4a2
github.com/cockroachdb/cockroach/pkg/sql.setupGenerator.func3(0x5601090, 0xc00bf1d240)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/virtual_table.go:120 +0x154
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2(0xc00833de00, 0x5601090, 0xc00bf1d240, 0xc0033ec6c0, 0x0, 0xc00ab4b830)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:442 +0xf3
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:434 +0x22b
Log preceding fatal error

=== RUN   TestRegistryLifecycle/dump_traces_on_cancel
    jobs_test.go:250: Starting resume
    jobs_test.go:263: Exiting resume
    jobs_test.go:284: Starting FailOrCancel
    jobs_test.go:299: Exiting OnFailOrCancel
E210804 06:33:34.993109 68972 jobs/adopt.go:327  [n1] 50  job 681536438033186817: adoption completed with error job canceled

Reproduce

To reproduce, try:

make stressrace TESTS=TestRegistryLifecycle PKG=./pkg/jobs TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1

Parameters in this failure:

  • GOFLAGS=-parallel=4

/cc @cockroachdb/sql-schema spaskob


jbowens added a commit to jbowens/cockroach that referenced this issue Aug 4, 2021
Refs: cockroachdb#68315

Reason: flaky test

Generated by bin/skip-test.

Release justification: non-production code changes

Release note: None
craig bot pushed a commit that referenced this issue Aug 4, 2021
66362: kv,storage: use a BytesMonitor to track memory allocations for scans r=sumeerbhola a=sumeerbhola

The goal is to track memory allocations for:
- non-local SQL=>KV requests: this can happen with joins,
  multi-tenant clusters, and if ranges move between DistSQL
  planning and execution.
- local SQL=>KV requests for the first request by a fetcher:
  in this case the fetcher reserves a modest 1KB which can
  be significantly exceeded by KV allocations.

Only allocations in pebbleMVCCScanner for kv pairs and intents are
tracked. The memory is released when returning from
executeReadOnlyBatchWithServersideRefreshes since the chain
of returns will end up in gRPC response processing and we can't
hook into where that memory is released. This should still help
for some cases of OOMs, and give some signal of memory overload
that we can use elsewhere (e.g. admission control).

The BytesMonitor is used to construct a BoundAccount that is
wrapped in a narrower ScannerMemoryMonitor that is passed via the
EvalContext interface. The other alternative would be for the
engine to have a BytesMonitor at initialization time that it
can use to construct a BoundAccount for each MVCC scan, and
pass it back via MVCCScanResult. This would mean multiple
BoundAccounts for a batch (since we don't want to release memory
until all the requests in the batch are processed), and would
be harder to extend to track additional request types compared
to embedding in EvalContext.

The rootSQLMonitor is reused for this memory allocation tracking.
This tracking is always done for non-local requests, and for the
first request by a fetcher for a local request. This is to
avoid double-counting, the first request issued by a SQL fetcher
only reserves 1KB, but subsequent ones have already reserved
what was returned in the first response. So there is room to
tighten this if we knew what had been reserved on the local
client (there are complications because the batch may have
been split to send to different nodes, only one of which was
local).

The AdmissionHeader.SourceLocation field is used to mark local
requests and is set in rpc.internalClientAdapter. The first
request is marked using the
AdmissionHeader.NoMemoryReservedAtSource bit.

Informs #19721

Release note (ops change): The memory pool used for SQL is now
also used to cover KV memory used for scans.
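The reserve-then-release flow described above can be sketched with a tiny bound-account abstraction. The names below are hypothetical stand-ins for the BytesMonitor/BoundAccount pattern, not the real `pkg/util/mon` API:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// monitor is a hypothetical stand-in for a BytesMonitor: a
// shared pool with a hard limit that many accounts draw from.
type monitor struct {
	mu    sync.Mutex
	limit int64
	used  int64
}

// account is a hypothetical BoundAccount: it tracks what one
// scan reserved so everything can be released in one place,
// after the whole batch response has been handed off.
type account struct {
	mon      *monitor
	reserved int64
}

func (m *monitor) open() *account { return &account{mon: m} }

// grow reserves n more bytes against the shared pool, failing
// when the pool is exhausted -- the overload signal mentioned
// above (e.g. for admission control).
func (a *account) grow(n int64) error {
	a.mon.mu.Lock()
	defer a.mon.mu.Unlock()
	if a.mon.used+n > a.mon.limit {
		return errors.New("memory budget exceeded")
	}
	a.mon.used += n
	a.reserved += n
	return nil
}

// close releases everything the scan reserved, in one step.
func (a *account) close() {
	a.mon.mu.Lock()
	defer a.mon.mu.Unlock()
	a.mon.used -= a.reserved
	a.reserved = 0
}

func main() {
	m := &monitor{limit: 1 << 10}
	acc := m.open()
	defer acc.close() // released only once the "batch" is done
	for i := 0; i < 4; i++ {
		if err := acc.grow(256); err != nil {
			fmt.Println("scan aborted:", err)
			return
		}
	}
	fmt.Println("reserved:", acc.reserved) // prints: reserved: 1024
}
```

Deferring the release until the end of the batch mirrors the PR's choice of a single release point in `executeReadOnlyBatchWithServersideRefreshes`, rather than one account per scan.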

67451: colexec: optimize Bytes sorting with abbreviated values r=mgartner a=mgartner

#### colexec: add benchmark for sorting UUIDs

Release note: None

#### colexec: optimize Bytes sorting with abbreviated values

This commit introduces an optimization for sorting `Bytes` vectors.
While sorting, a significant portion of time is spent performing
comparisons between two datums. These comparisons are optimized by
creating an abbreviated `uint64` value for each `[]byte` that represents
up to the first eight bytes in the slice. The abbreviated value is used
for a comparison fast path that avoids comparing all bytes in a slice in
some cases. This technique is used by Postgres[1][2] and Pebble[3].

Abbreviating values to a `uint64` helps optimize comparisons for datums
that are larger than the size of the system's native pointer (64 bits).
Given datums `a` and `b`, the following properties must hold true for
the respective abbreviated values `abbrA` and `abbrB`:

  - `abbrA > abbrB` iff `a > b`
  - `abbrA < abbrB` iff `a < b`
  - If `abbrA == abbrB`, it is unknown if `a` is greater than, less
    than, or equal to `b`. A full comparison of `a` and `b` is required.

Comparing the abbreviated values first improves comparison performance
for two reasons. First, comparing two `uint64`s is fast. It is a single
CPU instruction. Any datum larger than 64 bits requires multiple
instructions for comparison. For example, comparing two `[]byte`s
requires iterating over each byte until non-equal bytes are found.
Second, CPU caches are more efficiently used because the abbreviated
values of a vector are packed into a `[]uint64` of contiguous memory.
This increases the chance that two datums can be compared by only
fetching information from CPU caches rather than main memory.

In the benchmarks below, `rows` is the number of rows being sorted,
`cols` is the number of sort columns, and `constAbbrPct` is the
percentage of rows for which the abbreviated values are the same. The
benchmarks show that in the best case, when the abbreviated values are
all unique, sort throughput has increased by up to ~45%. In the worst
case, where all values share the same abbreviated value, throughput has
decreased by as much as ~8.5%. This is due to the extra overhead of
creating abbreviated values and comparing them, only to have to fall
back to a full comparison every time.

    name                                             old speed      new speed      delta
    SortUUID/rows=2048/cols=1/constAbbrPct=0-16      38.0MB/s ± 2%  55.3MB/s ± 3%  +45.30%
    SortUUID/rows=2048/cols=1/constAbbrPct=50-16     36.2MB/s ± 3%  43.0MB/s ± 4%  +18.82%
    SortUUID/rows=2048/cols=1/constAbbrPct=75-16     36.0MB/s ± 4%  37.1MB/s ± 4%     ~
    SortUUID/rows=2048/cols=1/constAbbrPct=90-16     36.8MB/s ± 1%  36.2MB/s ± 3%     ~
    SortUUID/rows=2048/cols=1/constAbbrPct=100-16    37.0MB/s ± 1%  33.8MB/s ± 5%   -8.66%
    SortUUID/rows=2048/cols=2/constAbbrPct=0-16      60.3MB/s ± 1%  74.0MB/s ± 3%  +22.85%
    SortUUID/rows=2048/cols=2/constAbbrPct=50-16     58.7MB/s ± 1%  63.7MB/s ± 1%   +8.54%
    SortUUID/rows=2048/cols=2/constAbbrPct=75-16     58.3MB/s ± 1%  56.6MB/s ± 6%     ~
    SortUUID/rows=2048/cols=2/constAbbrPct=90-16     58.7MB/s ± 1%  54.2MB/s ± 3%   -7.69%
    SortUUID/rows=2048/cols=2/constAbbrPct=100-16    59.3MB/s ± 1%  56.1MB/s ± 4%   -5.30%
    SortUUID/rows=16384/cols=1/constAbbrPct=0-16     30.7MB/s ± 2%  41.9MB/s ± 3%  +36.24%
    SortUUID/rows=16384/cols=1/constAbbrPct=50-16    30.6MB/s ± 1%  32.0MB/s ± 8%     ~
    SortUUID/rows=16384/cols=1/constAbbrPct=75-16    30.2MB/s ± 2%  30.9MB/s ± 6%     ~
    SortUUID/rows=16384/cols=1/constAbbrPct=90-16    30.4MB/s ± 2%  27.6MB/s ± 3%   -9.29%
    SortUUID/rows=16384/cols=1/constAbbrPct=100-16   30.6MB/s ± 2%  28.5MB/s ± 5%   -6.98%
    SortUUID/rows=16384/cols=2/constAbbrPct=0-16     49.5MB/s ± 2%  59.0MB/s ± 2%  +19.13%
    SortUUID/rows=16384/cols=2/constAbbrPct=50-16    48.6MB/s ± 1%  49.5MB/s ± 3%   +1.84%
    SortUUID/rows=16384/cols=2/constAbbrPct=75-16    47.3MB/s ± 2%  47.3MB/s ± 2%     ~
    SortUUID/rows=16384/cols=2/constAbbrPct=90-16    48.5MB/s ± 2%  45.2MB/s ± 1%   -6.65%
    SortUUID/rows=16384/cols=2/constAbbrPct=100-16   48.9MB/s ± 1%  44.4MB/s ± 2%   -9.13%
    SortUUID/rows=262144/cols=1/constAbbrPct=0-16    25.4MB/s ± 2%  35.7MB/s ± 3%  +40.53%
    SortUUID/rows=262144/cols=1/constAbbrPct=50-16   24.9MB/s ± 3%  28.8MB/s ± 3%  +15.44%
    SortUUID/rows=262144/cols=1/constAbbrPct=75-16   25.4MB/s ± 2%  25.7MB/s ± 5%     ~
    SortUUID/rows=262144/cols=1/constAbbrPct=90-16   25.4MB/s ± 3%  24.7MB/s ± 2%   -3.00%
    SortUUID/rows=262144/cols=1/constAbbrPct=100-16  25.1MB/s ± 3%  23.7MB/s ± 2%   -5.73%
    SortUUID/rows=262144/cols=2/constAbbrPct=0-16    37.5MB/s ± 2%  43.7MB/s ± 5%  +16.65%
    SortUUID/rows=262144/cols=2/constAbbrPct=50-16   37.5MB/s ± 1%  37.8MB/s ± 7%     ~
    SortUUID/rows=262144/cols=2/constAbbrPct=75-16   37.1MB/s ± 4%  36.4MB/s ± 1%     ~
    SortUUID/rows=262144/cols=2/constAbbrPct=90-16   36.9MB/s ± 5%  34.6MB/s ± 7%   -6.26%
    SortUUID/rows=262144/cols=2/constAbbrPct=100-16  37.2MB/s ± 3%  34.0MB/s ± 2%   -8.53%

I experimented with several mitigations to the regressions, but did not
have much luck. Postgres uses HyperLogLog while building abbreviated
values to track their cardinality and abort the optimization if the
cardinality is too low. I attempted this, but the overhead of
HyperLogLog negated the benefits of the optimization entirely.

Using a `map[uint64]struct{}` to track the cardinality of abbreviated
values was the most promising. It was able to reduce the worst
regressions at the expense of reducing the speedup of the best case
benchmarks.

These mechanisms for aborting the optimization when abbreviated value
cardinality is low are essentially a trade-off between maximizing sort
performance when their cardinality is high and minimizing the overhead
of generating and comparing abbreviated values when their cardinality is
low. It's worth future investigation to try to find a low-overhead
method of aborting, and to be more thoughtful about the trade-offs.

[1] https://brandur.org/sortsupport
[2] http://pgeoghegan.blogspot.com/2015/01/abbreviated-keys-exploiting-locality-to.html
[3] https://github.com/cockroachdb/pebble/blob/ea60b4722cca21fa0d3c2749c788e1ac76981f7d/internal/base/comparer.go#L28-L36

Release note (performance improvement): Sort performance has been
improved when sorting columns of type STRING, BYTES, or UUID.


67606: sql: add sql shell telemetry counter r=rafiss,knz,arulajmani a=e-mbrown

Fixes #62208

This change allows us to keep track of
connections made using our internal SQL shell.
Release note: None

68427: jobs: skip TestRegistryLifecycle r=adityamaru a=jbowens

Refs: #68315

Reason: flaky test

Generated by bin/skip-test.

Release justification: non-production code changes

Release note: None

Co-authored-by: sumeerbhola <sumeer@cockroachlabs.com>
Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com>
Co-authored-by: e-mbrown <ebsonari@gmail.com>
Co-authored-by: Jackson Owens <jackson@cockroachlabs.com>
@mwang1026 mwang1026 removed the T-sql-schema-deprecated Use T-sql-foundations instead label Aug 9, 2021
@ajwerner (Contributor)

@adityamaru it's pretty bad that this test remains skipped. Can you please re-enable it or let me know if you're not going to.

adityamaru added a commit to adityamaru/cockroach that referenced this issue Aug 27, 2021
ajwerner pushed a commit to adityamaru/cockroach that referenced this issue Aug 27, 2021
ajwerner pushed a commit to adityamaru/cockroach that referenced this issue Aug 27, 2021
adityamaru added a commit to adityamaru/cockroach that referenced this issue Aug 27, 2021
adityamaru added a commit to adityamaru/cockroach that referenced this issue Aug 27, 2021
craig bot pushed a commit that referenced this issue Aug 29, 2021
68374: jobs: fix race which permitted concurrent execution r=pbardea,ajwerner a=adityamaru

The race was that we'd unregister a job from the `adoptedJobs` map while
processing the pause or cancel requests. This is problematic because the
job also unregisters itself in such cases. However, because the job was
already unregistered, the entry which the now canceled job removes may in
fact correspond to a new execution. We remove this hazard by centralizing
the location where the entry is actually removed. Instead, we have the
cancellation/pause processing only cancel the context of the relevant
execution.

Fixes #68593

Release note: None

Release justification: bug fixes and low-risk updates to new functionality

Previously, TraceCollector.StartIter would return a nil object
if we failed to resolve node liveness during initialization. This
led to a nil pointer dereference (NPE).

This change now returns a TraceCollector instance with the error field
set to the appropriate error, so that the validity check on the iterator
can correctly handle this scenario.

This change also reworks the dump trace on job cancellation test.
Job cancellation semantics under stress are slightly nondeterministic in
terms of how many times the execution of OnFailOrCancel is resumed. This makes
it hard to coordinate when to check and how many trace files to expect.

Fixes: #68315

Release note: None

Release justification: low risk, high benefit changes to existing functionality

Co-authored-by: Andrew Werner <awerner32@gmail.com>
Co-authored-by: Aditya Maru <adityamaru@gmail.com>
@craig craig bot closed this as completed in 9b0e2ef Aug 29, 2021