libroach: try to ingest SSTs without write-stalls #34800

Merged 1 commit on Feb 25, 2019

Member

@dt dt commented Feb 11, 2019

This attempts to ingest without allowing a memtable flush that could
cause write-stalls. If that fails, it then does its own, no-stall flush
and waits for it before retrying the ingest. On the re-attempt, the
ingest is allowed to do a blocking flush if it needs to, but the hope is
that the explicit flush means it will not have to.

Release note (performance improvement): reduce impact of bulk data ingestion on foreground traffic by controlling RocksDB flushes.

Release note: None
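
The flow described above can be sketched roughly as follows. This is a minimal model, not the code from this PR: `Status`, `FakeEngine`, and `IngestWithoutStall` are hypothetical stand-ins invented for illustration. In RocksDB the corresponding knobs are, as far as I know, `IngestExternalFileOptions::allow_blocking_flush` and `FlushOptions::allow_write_stall`, but the sketch deliberately does not call the real library.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for illustration only; these are not the RocksDB or
// libroach types.
struct Status {
  bool ok;
  bool invalid_argument;
  std::string message;
  static Status OK() { return Status{true, false, ""}; }
};

struct FakeEngine {
  bool memtable_overlaps = true;  // memtable holds keys that overlap the SST
  int stalling_flushes = 0;       // flushes that were allowed to stall writers
  int no_stall_flushes = 0;       // flushes that were not allowed to stall

  // Flush the memtable; the flag only affects which counter is bumped here.
  Status Flush(bool allow_write_stall) {
    if (allow_write_stall) {
      ++stalling_flushes;
    } else {
      ++no_stall_flushes;
    }
    memtable_overlaps = false;
    return Status::OK();
  }

  // Ingest an SST. If a flush is needed but a blocking flush is not allowed,
  // fail with the same message seen in the logs in this thread.
  Status Ingest(bool allow_blocking_flush) {
    if (memtable_overlaps) {
      if (!allow_blocking_flush) {
        return Status{false, true,
                      "Invalid argument: External file requires flush"};
      }
      Flush(/*allow_write_stall=*/true);
    }
    return Status::OK();
  }
};

// Try without a blocking flush; on failure do an explicit no-stall flush,
// then retry with the blocking flush permitted as a fallback.
Status IngestWithoutStall(FakeEngine& db) {
  Status status = db.Ingest(/*allow_blocking_flush=*/false);
  if (status.ok || !status.invalid_argument) {
    return status;  // success, or an error a flush would not fix
  }
  status = db.Flush(/*allow_write_stall=*/false);
  if (!status.ok) {
    return status;
  }
  return db.Ingest(/*allow_blocking_flush=*/true);
}
```

In this model a single call ends with one no-stall flush and zero stalling flushes, which is the outcome the description hopes for in the common case.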

@dt dt requested review from nvanbenschoten, petermattis and a team February 11, 2019 21:56
@cockroach-teamcity
Member

This change is Reviewable

@dt
Member Author

dt commented Feb 11, 2019

I haven't tested this yet on a loaded roachprod cluster but I wanted to see what #34212 thinks of it.

@vivekmenezes
Contributor

saw this

F190212 01:23:14.777736 149 storage/replica_proposal.go:461  [n1,s1,r21/1:/{Table/53/1/3…-Max}] while ingesting c5c4654c: Invalid argument: External file requires flush
goroutine 149 [running]:
github.com/cockroachdb/cockroach/pkg/util/log.getStacks(0x0, 0xc00005db60, 0x8216101, 0x1b)
	/Users/vivek/go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:1016 +0xd4
github.com/cockroachdb/cockroach/pkg/util/log.(*loggingT).outputLogEntry(0x85b47c0, 0xc000000004, 0x82161f9, 0x1b, 0x1cd, 0xc00162cfc0, 0x6d)
	/Users/vivek/go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:872 +0x951
github.com/cockroachdb/cockroach/pkg/util/log.addStructured(0x6df3c00, 0xc00416fe90, 0x4, 0x2, 0x66a5e54, 0x16, 0xc0044c0990, 0x2, 0x2)
	/Users/vivek/go/src/github.com/cockroachdb/cockroach/pkg/util/log/structured.go:85 +0x2d5
github.com/cockroachdb/cockroach/pkg/util/log.logDepth(0x6df3c00, 0xc00416fe90, 0x1, 0xc000000004, 0x66a5e54, 0x16, 0xc0044c0990, 0x2, 0x2)
	/Users/vivek/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:71 +0x8c
github.com/cockroachdb/cockroach/pkg/util/log.Fatalf(0x6df3c00, 0xc00416fe90, 0x66a5e54, 0x16, 0xc0044c0990, 0x2, 0x2)
	/Users/vivek/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:182 +0x7e
github.com/cockroachdb/cockroach/pkg/storage.addSSTablePreApply(0x6df3c00, 0xc00416fe90, 0xc0004bb300, 0x6e383a0, 0xc0003c9080, 0x6e114a0, 0xc0031e9e80, 0x7, 0xfd, 0xc002007c00, ...)
	/Users/vivek/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_proposal.go:461 +0x39f
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).processRaftCommand(0xc0008b9b00, 0x6df3c00, 0xc00416fe90, 0xc0002cf1f0, 0x8, 0x7, 0xfd, 0x500000005, 0x2, 0x2, ...)
	/Users/vivek/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_raft.go:1864 +0x1569
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).handleRaftReadyRaftMuLocked(0xc0008b9b00, 0x6df3c00, 0xc00416fe90, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/Users/vivek/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_raft.go:785 +0x1265
github.com/cockroachdb/cockroach/pkg/storage.(*Store).processRequestQueue.func1(0x6df3c00, 0xc00416fe90, 0xc0008b9b00, 0x6df3c00)
	/Users/vivek/go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:3806 +0x12c
github.com/cockroachdb/cockroach/pkg/storage.(*Store).withReplicaForRequest(0xc000b3a600, 0x6df3c00, 0xc00416fe90, 0xc0013a5a40, 0xc0044c1ed0, 0x0)
	/Users/vivek/go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:3453 +0x135
github.com/cockroachdb/cockroach/pkg/storage.(*Store).processRequestQueue(0xc000b3a600, 0x6df3c00, 0xc0005b16b0, 0x15)
	/Users/vivek/go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:3794 +0x21b
github.com/cockroachdb/cockroach/pkg/storage.(*raftScheduler).worker(0xc000525500, 0x6df3c00, 0xc0005b16b0)
	/Users/vivek/go/src/github.com/cockroachdb/cockroach/pkg/storage/scheduler.go:225 +0x21a
github.com/cockroachdb/cockroach/pkg/storage.(*raftScheduler).Start.func2(0x6df3c00, 0xc0005b16b0)
	/Users/vivek/go/src/github.com/cockroachdb/cockroach/pkg/storage/scheduler.go:165 +0x3e
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker.func1(0xc000266420, 0xc00047a5a0, 0xc000266410)
	/Users/vivek/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:198 +0xe1
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker
	/Users/vivek/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:191 +0xa8

@dt
Member Author

dt commented Feb 12, 2019

whoops, yep, accidentally got the bool check backwards -- fixed.

@vivekmenezes
Contributor

F190212 12:55:32.697900 1145 storage/replica_proposal.go:461  [n5,s5,r21/2:/{Table/53/1/3…-Max}] while ingesting 93c1d382: Invalid argument: External file requires flush
goroutine 1145 [running]:
github.com/cockroachdb/cockroach/pkg/util/log.getStacks(0xc000320f00, 0xc000320f60, 0x8216100, 0x1b)
	/Users/vivek/go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:1016 +0xd4
github.com/cockroachdb/cockroach/pkg/util/log.(*loggingT).outputLogEntry(0x85b47c0, 0xc000000004, 0x82161f9, 0x1b, 0x1cd, 0xc00434cc40, 0x6d)
	/Users/vivek/go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:872 +0x951
github.com/cockroachdb/cockroach/pkg/util/log.addStructured(0x6df3c00, 0xc0063fa570, 0x4, 0x2, 0x66a5e54, 0x16, 0xc004156990, 0x2, 0x2)
	/Users/vivek/go/src/github.com/cockroachdb/cockroach/pkg/util/log/structured.go:85 +0x2d5
github.com/cockroachdb/cockroach/pkg/util/log.logDepth(0x6df3c00, 0xc0063fa570, 0x1, 0xc000000004, 0x66a5e54, 0x16, 0xc004156990, 0x2, 0x2)
	/Users/vivek/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:71 +0x8c
github.com/cockroachdb/cockroach/pkg/util/log.Fatalf(0x6df3c00, 0xc0063fa570, 0x66a5e54, 0x16, 0xc004156990, 0x2, 0x2)
	/Users/vivek/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:182 +0x7e
github.com/cockroachdb/cockroach/pkg/storage.addSSTablePreApply(0x6df3c00, 0xc0063fa570, 0xc002f14000, 0x6e383a0, 0xc003152300, 0x6e114a0, 0xc0046405c0, 0x7, 0xf2, 0xc003296e00, ...)
	/Users/vivek/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_proposal.go:461 +0x39f
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).processRaftCommand(0xc004a8cd80, 0x6df3c00, 0xc0063fa570, 0xc004262598, 0x8, 0x7, 0xf2, 0x500000005, 0x2, 0x2, ...)
	/Users/vivek/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_raft.go:1864 +0x1569
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).handleRaftReadyRaftMuLocked(0xc004a8cd80, 0x6df3c00, 0xc0041c4480, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/Users/vivek/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_raft.go:785 +0x1265
github.com/cockroachdb/cockroach/pkg/storage.(*Store).processRequestQueue.func1(0x6df3c00, 0xc0041c4480, 0xc004a8cd80, 0x6df3c00)
	/Users/vivek/go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:3806 +0x12c
github.com/cockroachdb/cockroach/pkg/storage.(*Store).withReplicaForRequest(0xc000b16000, 0x6df3c00, 0xc0041c4480, 0xc003d0afb0, 0xc004157ed0, 0x0)
	/Users/vivek/go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:3453 +0x135
github.com/cockroachdb/cockroach/pkg/storage.(*Store).processRequestQueue(0xc000b16000, 0x6df3c00, 0xc0013c8690, 0x15)
	/Users/vivek/go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:3794 +0x21b
github.com/cockroachdb/cockroach/pkg/storage.(*raftScheduler).worker(0xc001075700, 0x6df3c00, 0xc0013c8690)
	/Users/vivek/go/src/github.com/cockroachdb/cockroach/pkg/storage/scheduler.go:225 +0x21a
github.com/cockroachdb/cockroach/pkg/storage.(*raftScheduler).Start.func2(0x6df3c00, 0xc0013c8690)
	/Users/vivek/go/src/github.com/cockroachdb/cockroach/pkg/storage/scheduler.go:165 +0x3e
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker.func1(0xc0016b5a20, 0xc00129db90, 0xc0016b5a10)
	/Users/vivek/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:198 +0xe1
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker
	/Users/vivek/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:191 +0xa8



same issue. Looks like you cannot set the boolean to true

@vivekmenezes
Contributor

ah ha I see you define another status var here. Modified locally and running

@vivekmenezes
Contributor

okay, this works. I do see another error:

 distsql_physical_planner_test.go:87: target store 4 not yet in range descriptor r22:/Table/53/1/{2400-3200} [(n1,s1):1, next=2, gen=0]

I'm pretty sure that's not related.

Is there a concern with a race in which data enters the memtable between the flush and the second ingest?

@vivekmenezes
Contributor

I ran it for another 10 minutes without hitting any problems.

@dt
Member Author

dt commented Feb 12, 2019

whoops, yeah, fixed the shadowed status too now.
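
For readers unfamiliar with the pitfall, here is a minimal sketch of how a shadowed status variable can hide a successful retry. It is an assumption about the general shape of the bug, not the actual code from this PR; `Status`, `Step`, `RetryShadowed`, and `RetryFixed` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical, simplified status type for illustration.
struct Status {
  bool ok;
};

// Pretend operation that succeeds or fails on request.
Status Step(bool succeed) { return Status{succeed}; }

// Buggy shape: the inner declaration creates a second variable that shadows
// the outer one, so the result of the retry is thrown away.
Status RetryShadowed() {
  Status status = Step(false);
  if (!status.ok) {
    Status status = Step(true);  // new variable, not an assignment
    (void)status;
  }
  return status;  // still the failed first attempt
}

// Fixed shape: assign to the existing variable instead of redeclaring it.
Status RetryFixed() {
  Status status = Step(false);
  if (!status.ok) {
    status = Step(true);
  }
  return status;
}
```

With the shadowed version the caller keeps seeing the first failure even though the retry succeeded, which would be consistent with the repeated "External file requires flush" fatal above. Compiling with `-Wshadow` should flag the inner declaration.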

@dt
Member Author

dt commented Feb 12, 2019

re: race. If a write that would require another flush makes it in, the retry does allow a write_stall, so it should just go ahead and get it done that time. Hopefully a) that is unlikely to happen at all and b) if it does happen, since we just did a flush already, this one should be a very small, very quick blip at worst.
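
The argument above can be illustrated with a toy model in which a write lands between the explicit flush and the retry. Everything here is hypothetical (`FakeEngine`, `IngestWithRetry`, the byte counts), and it simplifies the real behavior by treating any memtable data as requiring a flush.

```cpp
#include <cassert>
#include <functional>

// Toy engine: tracks how much memtable data is pending and how much of it
// had to be flushed by a flush that is allowed to stall writers.
struct FakeEngine {
  int memtable_bytes = 0;
  int stalled_bytes = 0;

  void Write(int bytes) { memtable_bytes += bytes; }

  // Explicit flush that is not allowed to stall foreground writes.
  void FlushNoStall() { memtable_bytes = 0; }

  // Ingest fails if a flush is needed but a blocking flush is not allowed.
  bool Ingest(bool allow_blocking_flush) {
    if (memtable_bytes > 0) {
      if (!allow_blocking_flush) {
        return false;
      }
      stalled_bytes += memtable_bytes;  // blocking flush of what is left
      memtable_bytes = 0;
    }
    return true;
  }
};

// `between` simulates whatever concurrent writers do after the explicit
// flush and before the second ingest attempt.
bool IngestWithRetry(FakeEngine& db, const std::function<void()>& between) {
  if (db.Ingest(/*allow_blocking_flush=*/false)) {
    return true;
  }
  db.FlushNoStall();
  between();
  return db.Ingest(/*allow_blocking_flush=*/true);
}
```

If 1000 bytes are pending before the ingest and an 8-byte write sneaks in during the race window, the model charges only those 8 bytes to the blocking flush, matching the "very small, very quick blip" expectation.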


// It is possible we failed because the memtable wanted to flush but we did
// not allow a blocking flush on the first try. Do a manual, non-blocking
// flush and wait for it, then try again.
Collaborator

It is confusing to say that the flush is non-blocking and that we'll wait for it. Perhaps something like: Perform a manual flush which will not stall other writes.

Member Author

Re-worded.

c-deps/libroach/db.cc (two resolved review threads)
Collaborator

@petermattis petermattis left a comment


LGTM

@dt
Member Author

dt commented Feb 25, 2019

bors r+

@craig
Contributor

craig bot commented Feb 25, 2019

👎 Rejected by PR status

@dt
Member Author

dt commented Feb 25, 2019

manually setting cla status

bors r+

@craig
Contributor

craig bot commented Feb 25, 2019

Build failed (retrying...)

@craig
Contributor

craig bot commented Feb 25, 2019

Build failed (retrying...)

craig bot pushed a commit that referenced this pull request Feb 25, 2019
34800: libroach: try to ingest SSTs without write-stalls r=dt a=dt

This attempts to ingest without allowing a memtable flush that could
cause write-stalls. If that fails, it then does its own, no-stall flush
and waits for it before retrying the ingest. On the re-attempt, the
ingest is allowed to do a blocking flush if it needs to, but the hope is
that the explicit flush means it will not have to.

Release note (performance improvement): reduce impact of bulk data ingestion on foreground traffic by controlling RocksDB flushes.

Release note: None

Co-authored-by: David Taylor <tinystatemachine@gmail.com>
@craig
Contributor

craig bot commented Feb 25, 2019

Build succeeded

@craig craig bot merged commit dfd9725 into cockroachdb:master Feb 25, 2019
@dt dt deleted the ingest-flush branch February 25, 2019 22:24

4 participants