roachtest: tpccbench/nodes=3/cpu=16 failed #39013
Comments
Node 3 was killed by the OOM killer, which is also what we see in #37163. I'm running three versions of that roachtest now to see if anything obvious jumps out.
Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1399004&tab=buildLog
This last failure was #39022.
Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1411161&tab=buildLog
Very likely #39022.
Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1447036&tab=buildLog
We seem to be hitting issues with the roachtest coordinator node running out of disk space. I think this is because we no longer delete the logs from passing tests. I opened https://github.com/scaledata/rksql/pull/4 to address the biggest offender in this test log bloat, but I think the real fix is to go back to deleting the test logs themselves.
I think #39810 is also the cause of, or at least a contributor to, this problem.
Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1494403&tab=artifacts#/tpccbench/nodes=3/cpu=16
Node 3 crashed due to a nil pointer dereference:
@jordanlewis could someone on SQL execution take a look at this? It looks like an issue in |
This looks like #39350 (see #36570 (comment)), but it's not the same.
Thanks. Will take a look.
Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1495889&tab=artifacts#/tpccbench/nodes=3/cpu=16
Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1496043&tab=artifacts#/tpccbench/nodes=3/cpu=16
Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1496041&tab=artifacts#/tpccbench/nodes=3/cpu=16
@yuzefovich could this be related to the refactor we did recently? Maybe not, but it would be good to double-check if you have a moment.
Hm, I think it is possible but unlikely. I'll take a look tomorrow.
Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1496387&tab=artifacts#/tpccbench/nodes=3/cpu=16
Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1496385&tab=artifacts#/tpccbench/nodes=3/cpu=16
I'm adding this to the release blocker list. Thanks @nvanbenschoten for uncovering this issue.
Yes, I'm at fault here (although it was not the epic distsqlrun refactor, but rather the CFetcher refactor). It should be an easy fix.
SHA: https://github.com/cockroachdb/cockroach/commits/7dab0dcfd37c389af357c302c073b9611b5ada25
Parameters:
To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1398203&tab=buildLog