ccl/importccl: package timed out under stress #31126

Closed
cockroach-teamcity opened this issue Oct 9, 2018 · 9 comments

@cockroach-teamcity
Member

SHA: https://github.com/cockroachdb/cockroach/commits/35b30e8d95cf0fd111fc98326d92ba4339a9945c

Parameters:

TAGS=
GOFLAGS=-race

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stressrace instead of stress and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
make stress TESTS=(unknown) PKG=github.com/cockroachdb/cockroach/pkg/ccl/importccl TESTTIMEOUT=5m STRESSFLAGS='-stderr=false -maxtime 20m -timeout 10m'

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=953256&tab=buildLog

Slow failing tests:
TestImportMysql/all_from_multi_bzip - 0.04s

Slow passing tests:
TestImportCSVStmt - 186.16s
TestImportData - 94.37s
TestImportLivenessWithLeniency - 47.05s
TestImportLivenessWithRestart - 37.94s
TestImportMVCCChecksums - 4.49s

@cockroach-teamcity added this to the 2.2 milestone on Oct 9, 2018
@cockroach-teamcity added the C-test-failure (Broken test, automatically or manually discovered) and O-robot (Originated from a bot) labels on Oct 9, 2018
@tbg assigned maddyblue and unassigned andreimatei on Oct 9, 2018
@maddyblue
Contributor

I ran a local stress test with TESTS=TestImport for 22 minutes with no failures. It's not obvious what's going on.

@maddyblue
Contributor

A goroutine has been stuck on DROP TABLE IF EXISTS simple, second, third, everything CASCADE in import_stmt_test.go for 32 minutes, so I'm guessing that's the problem. Looking deeper, lease.go is calling sqlbase.GetTableDescFromID and has been blocked for 30 minutes, with a stack trace that ends in grpc. It's unclear to me whether grpc itself has a problem or whether it's retrying a call because the remote side has a problem, but this doesn't appear to be an import problem.

@maddyblue assigned vivekmenezes and unassigned maddyblue on Oct 9, 2018
@tbg
Member

tbg commented Oct 9, 2018

What exactly did you run, @mjibson? Did you run stressrace? One failure mode for these tests is that the test is too heavyweight to run under stressrace, so it never makes any progress.

@maddyblue
Contributor

make stress TESTS=TestImport PKG=./pkg/ccl/importccl ran for 22 minutes without failure on my laptop.

@tbg
Member

tbg commented Oct 9, 2018

stressrace makes a huge difference for these things (the original test ran under stress). To simulate what runs on TC, you can run roachprod create mjibson-stress -n 5 --gce-machine-type=n1-standard-8 --local-ssd=false and then make roachprod-stress CLUSTER=mjibson-stress PKG=./pkg/ccl/importccl.

Unfortunately we're not that good yet at telling when the test failure is a result of simply overloading the machine, which I suspect might be the case here.

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/310a04983cda8ab8d67cd401814341b9b7f8ce79

Parameters:

TAGS=
GOFLAGS=-race

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stressrace instead of stress and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
make stress TESTS=(unknown) PKG=github.com/cockroachdb/cockroach/pkg/ccl/importccl TESTTIMEOUT=5m STRESSFLAGS='-stderr=false -maxtime 20m -timeout 10m'

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=973367&tab=buildLog

Slow failing tests:
TestImportMysql/all_from_multi_gzip - 0.04s

Slow passing tests:
TestImportCSVStmt - 316.94s
TestImportData - 91.61s
TestImportLivenessWithLeniency - 44.03s
TestImportLivenessWithRestart - 41.71s
TestImportMVCCChecksums - 3.60s

@vivekmenezes
Contributor

goroutine 20070 [select, 7 minutes]:
github.com/cockroachdb/cockroach/vendor/github.com/marusama/semaphore.(*semaphore).Acquire(0xc4212b8990, 0x2fcf340, 0xc423aff740, 0x1, 0x2fcf340, 0xc423aff740)
	/go/src/github.com/cockroachdb/cockroach/vendor/github.com/marusama/semaphore/semaphore.go:114 +0x189
github.com/cockroachdb/cockroach/pkg/util/limit.(*ConcurrentRequestLimiter).Begin(0xc4206e4400, 0x2fcf340, 0xc423aff740, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/limit/limiter.go:47 +0xfc
github.com/cockroachdb/cockroach/pkg/ccl/storageccl.evalImport(0x2fcf340, 0xc423aff740, 0x30049e0, 0xc42125b500, 0x155ecf651034fd08, 0x0, 0x100000001, 0x1, 0xd1, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/ccl/storageccl/import.go:239 +0x191
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).executeAdminBatch(0xc42125b500, 0x2fcf340, 0xc423aff740, 0x155ecf651034fd08, 0x0, 0x100000001, 0x1, 0xd1, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica.go:2754 +0xbf4
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).sendWithRangeID(0xc42125b500, 0x2fcf340, 0xc423aff740, 0xd1, 0x155ecf651034fd08, 0x0, 0x100000001, 0x1, 0xd1, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica.go:2022 +0x671
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).Send(0xc42125b500, 0x2fcf340, 0xc423aff710, 0x155ecf651034fd08, 0x0, 0x100000001, 0x1, 0xd1, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica.go:1964 +0x90
github.com/cockroachdb/cockroach/pkg/storage.(*Store).Send(0xc4206e4100, 0x2fcf340, 0xc423aff710, 0x155ecf651034fd08, 0x0, 0x100000001, 0x1, 0xd1, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:3022 +0x60c
github.com/cockroachdb/cockroach/pkg/storage.(*Stores).Send(0xc42227f680, 0x2fcf340, 0xc423aff6b0, 0x0, 0x0, 0x100000001, 0x1, 0xd1, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/stores.go:185 +0xdb
github.com/cockroachdb/cockroach/pkg/server.(*Node).batchInternal.func1(0x2fcf340, 0xc423aff6b0, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/server/node.go:988 +0x1c1
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunTaskWithErr(0xc4211aa750, 0x2fcf340, 0xc423aff6b0, 0x2a9bde7, 0x10, 0xc4237f0508, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:303 +0xed
github.com/cockroachdb/cockroach/pkg/server.(*Node).batchInternal(0xc421721c00, 0x2fcf340, 0xc423aff6b0, 0xc424835200, 0xc423aff6b0, 0x290e760, 0x1)
	/go/src/github.com/cockroachdb/cockroach/pkg/server/node.go:975 +0x165
github.com/cockroachdb/cockroach/pkg/server.(*Node).Batch(0xc421721c00, 0x2fcf340, 0xc423aff6b0, 0xc424835200, 0x911c99, 0x2bcf4e8, 0xc420fff740)
	/go/src/github.com/cockroachdb/cockroach/pkg/server/node.go:1016 +0x9c
github.com/cockroachdb/cockroach/pkg/rpc.internalClientAdapter.Batch(0x2faa5c0, 0xc421721c00, 0x2fcf340, 0xc423aff650, 0xc424835200, 0x0, 0x0, 0x0, 0xc420babb40, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/rpc/context.go:431 +0x4f
github.com/cockroachdb/cockroach/pkg/kv.(*grpcTransport).sendBatch(0xc423aff620, 0x2fcf340, 0xc423aff650, 0x2faf8c0, 0xc420450ad0, 0x0, 0x0, 0x100000001, 0x1, 0xd1, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/transport.go:199 +0x138
github.com/cockroachdb/cockroach/pkg/kv.(*grpcTransport).SendNext(0xc423aff620, 0x2fcf280, 0xc420babb40, 0x0, 0x0, 0x100000001, 0x1, 0xd1, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/transport.go:169 +0x138
github.com/cockroachdb/cockroach/pkg/kv.(*DistSender).sendToReplicas(0xc421676e00, 0x2fcf280, 0xc420babb40, 0xc421676e50, 0xd1, 0xc423f1ff90, 0x3, 0x3, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/dist_sender.go:1325 +0x30a
github.com/cockroachdb/cockroach/pkg/kv.(*DistSender).sendRPC(0xc421676e00, 0x2fcf280, 0xc420babb40, 0xd1, 0xc423f1ff90, 0x3, 0x3, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/dist_sender.go:392 +0x27c
github.com/cockroachdb/cockroach/pkg/kv.(*DistSender).sendSingleRange(0xc421676e00, 0x2fcf280, 0xc420babb40, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/dist_sender.go:470 +0x231
github.com/cockroachdb/cockroach/pkg/kv.(*DistSender).sendPartialBatch(0xc421676e00, 0x2fcf280, 0xc420babb40, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/dist_sender.go:1101 +0x322
github.com/cockroachdb/cockroach/pkg/kv.(*DistSender).divideAndSendBatchToRanges(0xc421676e00, 0x2fcf280, 0xc420babb40, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/dist_sender.go:772 +0x1364
github.com/cockroachdb/cockroach/pkg/kv.(*DistSender).Send(0xc421676e00, 0x2fcf280, 0xc420babb40, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/dist_sender.go:684 +0x4e3
github.com/cockroachdb/cockroach/pkg/internal/client.(*CrossRangeTxnWrapperSender).Send(0xc42017ece0, 0x2fcf280, 0xc420babb40, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/internal/client/db.go:213 +0xc1
github.com/cockroachdb/cockroach/pkg/internal/client.SendWrappedWith(0x2fcf280, 0xc420babb40, 0x2f9f520, 0xc42017ece0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/internal/client/sender.go:436 +0x13f
github.com/cockroachdb/cockroach/pkg/internal/client.SendWrapped(0x2fcf280, 0xc420babb40, 0x2f9f520, 0xc42017ece0, 0x2ff46a0, 0xc424b11290, 0x2ff82a0, 0xc420499d40, 0x20)
	/go/src/github.com/cockroachdb/cockroach/pkg/internal/client/sender.go:453 +0xa6
github.com/cockroachdb/cockroach/pkg/ccl/backupccl.restore.func4(0x2fcf280, 0xc420babb40, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/ccl/backupccl/restore.go:1136 +0x1bc
github.com/cockroachdb/cockroach/pkg/util/ctxgroup.Group.GoCtx.func1(0x8, 0x2bcf508)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/ctxgroup/ctxgroup.go:184 +0x3a
github.com/cockroachdb/cockroach/vendor/golang.org/x/sync/errgroup.(*Group).Go.func1(0xc420babb80, 0xc423aff140)
	/go/src/github.com/cockroachdb/cockroach/vendor/golang.org/x/sync/errgroup/errgroup.go:58 +0x57
created by github.com/cockroachdb/cockroach/vendor/golang.org/x/sync/errgroup.(*Group).Go
	/go/src/github.com/cockroachdb/cockroach/vendor/golang.org/x/sync/errgroup/errgroup.go:55 +0x66

@mjibson I took a look at https://teamcity.cockroachdb.com/viewLog.html?buildId=974531&tab=buildResultsDiv&buildTypeId=Cockroach_UnitTests_Test
and it does appear that a couple of goroutines are stuck acquiring a semaphore. One example is above.

@maddyblue
Contributor

I'm guessing this is #31689. Closing on that assumption.
