roachtest: backupTPCC failed #84240
Comments
The test failed mid-restore because the cluster lost its connection to node 1 when node 1's VM killed the local cockroach process. The cockroach logs do not indicate why, though. I don't think an OOM occurred, given the latest line in its health log (there seems to be surplus memory, and there are no heap profiles):
1.journactl.txt indicates that the VM killed the cockroach binary:
One hint in the logs indicates that node 1 was not able to scatter a request to other nodes before the process got killed, which could suggest that the VM killed the binary due to networking issues? Maybe?
aha! found a panic in This panic occurs while restore span restoreSpanEntry is being iterated through in
The panic might be swallowing the error returned, though it's curious that
roachtest.backupTPCC failed with artifacts on master @ 687171ac6c2cd9992486bb3b8c9d252ac95ca1cd:
Parameters:
roachtest.backupTPCC failed with artifacts on master @ 88d3253301457ac57820e0f4a4fab8f74bf9f38b:
Parameters:
roachtest.backupTPCC failed with artifacts on master @ e9ee21860458d997a8155734dc608cfcd050ef24:
Parameters:
@jbowens @erikgrinaker The restore job failed because the I see the following in the Go SDK, which suggests that EOF implies a graceful end to reading a file:
@jbowens Seems like we should interpret EOF like an exhausted iterator. Probably better to handle this at the Pebble layer, since it handles lower-level IO?
Oops, missed the mention until now. An If I think we need to track down where the |
This PR refactors all call sites of ExternalSSTReader() to support using the new PebbleIterator, which has baked-in range key support. Most notably, this PR replaces the multiIterator used in the restore data processor with the PebbleSSTIterator. This patch is part of a larger effort to teach backup and restore about MVCC bulk operations. Next, the readAsOfIterator will need to learn how to deal with range keys.

Informs cockroachdb#71155

This PR addresses a bug created in cockroachdb#83984: loop variables in ExternalSSTReader were captured by reference, leading to roachtest failures (cockroachdb#84240, cockroachdb#84162).

Informs #71155

Fixes: cockroachdb#84240, cockroachdb#84162, cockroachdb#84181

Release note: none
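The "loop variables captured by reference" bug class mentioned above can be sketched as follows. This is an illustration only, not the actual ExternalSSTReader code: `openersShared`, `openersCopied`, and the file names are hypothetical. The shared variable is declared outside the loop so the behavior is the same on every Go version; before Go 1.22, a plain `for _, p := range paths` loop shared its variable across iterations in the same way.

```go
package main

import "fmt"

// openersShared mimics the buggy pattern: every closure captures the same
// variable, so all of them observe the value it held when the loop ended.
func openersShared(paths []string) []func() string {
	var out []func() string
	var p string // one variable shared by all iterations
	for i := range paths {
		p = paths[i]
		out = append(out, func() string { return p })
	}
	return out
}

// openersCopied gives each closure its own copy of the path.
func openersCopied(paths []string) []func() string {
	var out []func() string
	for i := range paths {
		p := paths[i] // fresh variable per iteration
		out = append(out, func() string { return p })
	}
	return out
}

func main() {
	paths := []string{"a.sst", "b.sst", "c.sst"}
	for _, open := range openersShared(paths) {
		fmt.Println("shared:", open()) // prints c.sst three times
	}
	for _, open := range openersCopied(paths) {
		fmt.Println("copied:", open()) // prints a.sst, b.sst, c.sst
	}
}
```

When closures like these are invoked lazily, for example to open files on demand, the shared version ends up opening the last file repeatedly, which is the kind of failure that only shows up at run time.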
84452: sql: add troubleshooting mode session variable r=THardy98 a=THardy98

Resolves: #84429

This change introduces a `troubleshooting_mode_enabled` session variable. When enabled, this session variable is intended to be used as a way to avoid performing additional work on queries, particularly when the cluster is experiencing issues/unavailability/failure. By default, this session variable is disabled. Currently, this session variable is only used to avoid collecting/emitting telemetry data.

Release note (sql change): Introduce new `troubleshooting_mode_enabled` session variable, to avoid doing additional work on queries when possible (i.e. collecting telemetry data). By default, this session variable is disabled.

84666: storageccl: use the new PebbleIterator in ExternalSSTReader r=erikgrinaker a=msbutler

This PR refactors all call sites of ExternalSSTReader() to support using the new PebbleIterator, which has baked-in range key support. Most notably, this PR replaces the multiIterator used in the restore data processor with the PebbleSSTIterator. This patch is part of a larger effort to teach backup and restore about MVCC bulk operations. Next, the readAsOfIterator will need to learn how to deal with range keys.

Informs #71155

This PR addresses a bug created in #83984: loop variables in ExternalSSTReader were captured by reference, leading to roachtest failures (#84240, #84162).

Informs #71155

Fixes: #84240, #84162, #84181

Release note: none

Co-authored-by: Thomas Hardy <thardy@cockroachlabs.com>
Co-authored-by: Michael Butler <butler@cockroachlabs.com>
roachtest.backupTPCC failed with artifacts on master @ 571bfa3afb3858ae84d8a8fcdbb0a38e058402a5:
Parameters: ROACHTEST_cloud=gce, ROACHTEST_cpu=4, ROACHTEST_ssd=0
Help
See: roachtest README
See: How To Investigate (internal)
This test on roachdash | Improve this report!
Jira issue: CRDB-17540