backup: OOM during TPC-H scalefactor=10 restore running rc2 #15681
Comments
Arjun mentioned offline that the memprofiles and logs from right after the crash are on navy 1 in ~/restore.rc2.oom.logs.tgz. Looking at
Here's the call stack for that Unmarshal (is there an easier way to get this in pprof?) cc @petermattis |
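A minimal sketch of one low-tech way to read allocation call stacks without the interactive pprof UI, assuming the process can be made to dump its own heap profile: writing the profile in the legacy text format (debug=1) translates addresses to function names, so the stacks behind an allocation site like Unmarshal are readable directly. This is an illustration, not the workflow used in this thread.

```go
package main

import (
	"os"
	"runtime"
	"runtime/pprof"
)

func main() {
	// Run a GC first so the in-use numbers in the heap profile are current.
	runtime.GC()
	// debug=1 emits the legacy text format with addresses translated to
	// function names and line numbers, so each allocation record carries a
	// readable call stack without needing `go tool pprof`.
	if err := pprof.Lookup("heap").WriteTo(os.Stdout, 1); err != nil {
		panic(err)
	}
}
```

Inside an interactive `go tool pprof` session on a saved profile, the `peek` command (with a regexp such as Unmarshal) prints the callers and callees of matching functions, which answers much the same question.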
I wonder if it has something to do with Also cc @a-robinson |
It's possible that we could work around this for 1.0 by reducing the |
We cache entries returned from |
I'm not sure if there is a memory leak at all. This is the Go alloc value from the last 20 |
10 seconds later, |
Interesting. So it's not leaking, just peaking too high? |
Something like that. I'm very surprised the GC isn't doing a better job here. I think we should save away the logs and profiles and then try another restore on |
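To make the leak-vs-peak question concrete, here is a rough sketch (illustrative only, not what was running on navy): sampling runtime.MemStats on the same 10-second cadence as the elided excerpt above shows whether HeapAlloc falls back after each collection, and lowering the GC percentage (the GOGC knob) trades CPU for a lower heap peak.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
	"time"
)

func main() {
	// Equivalent to running with GOGC=50: collect once the heap grows 50%
	// past the live set instead of the default 100%, lowering the peak at
	// the cost of more frequent GC cycles.
	debug.SetGCPercent(50)

	var m runtime.MemStats
	for range time.Tick(10 * time.Second) {
		runtime.ReadMemStats(&m)
		// If HeapAlloc keeps returning to roughly the same baseline after
		// each cycle, the heap is peaking, not leaking.
		fmt.Printf("HeapAlloc=%dMB HeapSys=%dMB NextGC=%dMB NumGC=%d\n",
			m.HeapAlloc>>20, m.HeapSys>>20, m.NextGC>>20, m.NumGC)
	}
}
```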
sgtm. @justinj mentioned that he wants some production experience, so I'll let him run it (and handhold as necessary) |
This is running now |
There doesn't seem to be a large benefit to running these WriteBatch requests in parallel, since they just end up contending for disk. May as well rate limit them to smooth out the disk usage a bit. Maybe this will help with cockroachdb#15681? Dunno.

name              old time/op    new time/op    delta
ClusterRestore-8   6.46µs ±10%    7.53µs ± 5%  +16.49%  (p=0.000 n=11+5)

name              old speed      new speed      delta
ClusterRestore-8  19.1MB/s ± 9%  16.4MB/s ± 5%  -14.32%  (p=0.000 n=11+5)
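The rate-limiting idea above has roughly the following shape. This is an illustrative sketch only, not the actual change; sendWriteBatch is a hypothetical stand-in for issuing a single WriteBatch request.

```go
package main

import (
	"context"

	"golang.org/x/time/rate"
)

// sendWriteBatch is a hypothetical stand-in for issuing one WriteBatch
// request; it is not a real CockroachDB function.
func sendWriteBatch(ctx context.Context, batch []byte) error {
	return nil
}

// restoreBatches sends batches one at a time, paced by size, so writes are
// spread out instead of all contending for the disk at once. It assumes no
// single batch exceeds bytesPerSec (WaitN requires n <= burst).
func restoreBatches(ctx context.Context, batches [][]byte, bytesPerSec int) error {
	lim := rate.NewLimiter(rate.Limit(bytesPerSec), bytesPerSec)
	for _, b := range batches {
		if err := lim.WaitN(ctx, len(b)); err != nil {
			return err
		}
		if err := sendWriteBatch(ctx, b); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	// Pace a single 1 MB batch at ~8 MB/s, purely as a usage example.
	_ = restoreBatches(context.Background(), [][]byte{make([]byte, 1<<20)}, 8<<20)
}
```

Serializing the sends is what shows up in the benchmark above as roughly 16% more time per op, in exchange for smoother disk usage.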
Forgot to update this yesterday. The issue reproduced readily when running a RESTORE while DROPing a large database. Running

Also noticed that node 3 (the one that OOM'd) has /mnt mounted as fuseblk, which seems like it could explain some stuff. @a-robinson is looking at the logs to see if this is the same issue as #15702. Assigning to him while that happens. In the meantime, we should consider documenting as a known limitation that running RESTORE and other exceptionally disk-heavy commands (DROP) should be avoided if possible. Possibly also we should document that having one node with a much slower disk is a bad idea. |
@danhhz Did you see this assertion failure on |
Both |
Ah yes. Forgot to include that in my notes. navy 5 hung yesterday (I couldn't even ssh in) and it had to be restarted in the Azure portal. So anything wonky with it could be related to that. |
I wonder if we're hitting some sort of election death spiral. If reading the unapplied tail of the log takes close to or longer than the Raft election timeout, we could get into a situation where we're constantly calling elections, becoming the leader but then never maintaining the leadership because another follower times out, calls an election and tries to become leader itself. |
Yeah, that's exactly what I was calling out in #15702 (comment) |
Ah, should have read the closer. At least we're independently thinking along the same lines. |
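As a back-of-the-envelope illustration of the spiral described above (made-up numbers, not CockroachDB code): as long as replaying the unapplied log tail keeps the new leader silent for at least a follower's election timeout, every term ends with another election.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Hypothetical durations, for illustration only.
	electionTimeout := 3 * time.Second // a follower campaigns after this much silence
	logReplayTime := 4 * time.Second   // time the new leader spends reading its unapplied log tail

	for term, stable := 1, false; !stable && term <= 5; term++ {
		// The freshly elected leader sends no heartbeats while it replays
		// the unapplied tail of its log.
		if logReplayTime >= electionTimeout {
			fmt.Printf("term %d: replay (%v) outlasts the election timeout (%v); a follower campaigns again\n",
				term, logReplayTime, electionTimeout)
			continue
		}
		stable = true
		fmt.Printf("term %d: heartbeats arrive before followers time out; leadership holds\n", term)
	}
}
```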
@a-robinson is there anything to be done here for 1.1? This was the same as #15702 which has since been fixed (though also not closed), right? |
Fine by me! duplicate of #15702 |
Cluster navy was running a TPC-H scalefactor=10 restore job on a cluster that was rolling-upgraded to rc2. There was an existing TPC-H scalefactor=1 database on navy, so I ran the following commands:

DROP DATABASE tpch; (ctrl-c-ed out)
CREATE DATABASE tpch;
RESTORE tpch.* FROM azure://... (ctrl-c-ed out)

Everything was proceeding fine for a couple of hours, when navy-0001 OOMed. The restore continued after the OOM on node 1 until it got to 0.998769998550415, at which point progress stopped increasing. The logs on navy-0001 have been preserved on the machine, and it has not been restarted, in order to facilitate debugging.