CockroachDB gets locked up #47678
Comments
Hi @bra-fsn, thanks for the report. It's suspicious that the HTTP endpoints don't even respond. If you stop the test load, does it recover? Assuming it doesn't, can you let it sit for a while (without the load) and then |
Sorry, I've already stopped the server. I'll try to reproduce this again.
I could reproduce this a second time without any trouble, so I hope I can provide data to help debug it.
When you send SIGQUIT, it also produces a goroutine dump in the log file. Can you check the last log file in the log directory?
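For readers unfamiliar with goroutine dumps: by default the Go runtime prints the stacks of all goroutines and exits when the process receives SIGQUIT. The sketch below produces the same kind of dump on demand from inside a program. It is not CockroachDB's code; `dumpGoroutines` is a hypothetical helper built on the standard `runtime.Stack` call.

```go
package main

import (
	"fmt"
	"runtime"
)

// dumpGoroutines returns the stack traces of all goroutines in this process.
// Hypothetical helper for illustration; not taken from CockroachDB.
func dumpGoroutines() string {
	buf := make([]byte, 64<<10)
	for {
		// The second argument asks for every goroutine, not just the caller.
		n := runtime.Stack(buf, true)
		if n < len(buf) {
			return string(buf[:n])
		}
		// The buffer was filled, so the trace may be truncated: retry with more room.
		buf = make([]byte, 2*len(buf))
	}
}

func main() {
	fmt.Print(dumpGoroutines())
}
```

Each entry in the output starts with a `goroutine N [state]:` header followed by the call stack, which is the part worth looking at when a process appears to spin.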
New entries from the log stop after the process starts eating CPU on two threads.
I'm a bit stumped by the issue you're encountering.
I strongly suspect this bug in the Go runtime, or maybe this one; both were fixed in 1.14.2. FWIW, we only support Go 1.13 for building CockroachDB 20.1 (although I personally use 1.14.2 and it seems fine). Could you retry building with Go 1.14.2 and report your findings? A side note (unrelated to your issue): you can let CockroachDB collect system stats and avoid the log spam by mounting /compat/linux/proc, e.g. by adding this to your /etc/fstab:
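On the Go version question raised in this comment: a Go program can report the toolchain it was built with through `runtime.Version()`, and `go version <file>` prints the same information for an existing binary. The following is a minimal sketch, assuming release-style version strings such as `go1.14.2`; `parseGoVersion` and `atLeast` are hypothetical helper names, and the 1.14.2 threshold is simply the release mentioned in this thread.

```go
package main

import (
	"fmt"
	"runtime"
	"strconv"
	"strings"
)

// parseGoVersion turns "go1.14.2" into [1 14 2]; missing components are zero.
// ok is false for anything that is not a plain release version.
// Hypothetical helper for illustration.
func parseGoVersion(v string) (parts [3]int, ok bool) {
	if !strings.HasPrefix(v, "go") {
		return parts, false
	}
	v = strings.TrimPrefix(v, "go")
	// Ignore anything after the first space (extra build information).
	if i := strings.IndexByte(v, ' '); i >= 0 {
		v = v[:i]
	}
	fields := strings.Split(v, ".")
	if len(fields) > 3 {
		return parts, false
	}
	for i, f := range fields {
		n, err := strconv.Atoi(f)
		if err != nil {
			return parts, false
		}
		parts[i] = n
	}
	return parts, true
}

// atLeast reports whether have >= want, comparing major, minor, patch in order.
func atLeast(have, want [3]int) bool {
	for i := range have {
		if have[i] != want[i] {
			return have[i] > want[i]
		}
	}
	return true
}

func main() {
	v := runtime.Version()
	parts, ok := parseGoVersion(v)
	if !ok {
		fmt.Printf("built with %s (not a plain release version)\n", v)
		return
	}
	fmt.Printf("built with %s; at least 1.14.2: %v\n", v, atLeast(parts, [3]int{1, 14, 2}))
}
```

Note that this reports the toolchain of the program that runs it, so it only says something about a CockroachDB build if compiled with the same toolchain.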
Good catch! At least I could run the test load for 10 minutes, while before it died after 1 or 2. I'll let it run for a day, but I think you've found the root cause. I'm running a native FreeBSD binary, so having |
I wrote the FreeBSD support in gosigar. It's silly: it requires both the FreeBSD /proc and the Linux /compat/linux/proc to be mounted side by side. Obviously this should be better, but I didn't have time to make it more elegant.
That's also why I knew what you needed based on your logs. If you had been running a Linux binary I wouldn't have needed to tell you about proc because nearly all Linux programs need it.
Oh, indeed! I didn't know that! I've tried to mount Thanks a lot! |
It runs fine since then, so closing the issue. Thanks @knz!
👍 Happy hacking
Describe the problem
Starting with an empty database and putting a test load on a single-node CockroachDB instance, the node locks up after running for a few minutes.
No queries are answered and it does no disk I/O, but it fully consumes two cores:
To Reproduce
./cockroach start --insecure "--store=path=/data/cockroachtest,rocksdb=compaction_readahead_size=4M" --listen-addr=$ip --http-addr=localhost:8080 --log-dir=/data/crdb/log
The node doesn't even respond to HTTP commands, so I can't get any info from there.
Additional data / screenshots
Tracing the process shows that it seems to do these endlessly:
The two threads eating the CPUs are:
I took a core dump and started gdb to see what those threads are:
Switching to thread 15 and running a backtrace gives:
The other one is:
The thread (LWP) ids seem to be constant, so these two threads appear to be stuck in an endless loop.
Goroutines:
Environment:
I'm not doing anything new; this workload could run for days previously. I upgraded the server yesterday, so that may be the cause.