roachtest: disk-stalled/fuse/log=true,data=true failed #97013

Closed
cockroach-teamcity opened this issue Feb 12, 2023 · 3 comments
Labels
branch-release-22.1, C-test-failure, O-roachtest, O-robot, T-storage

cockroach-teamcity commented Feb 12, 2023

roachtest.disk-stalled/fuse/log=true,data=true failed with artifacts on release-22.1 @ 96481c6193816b18cfb148eb21b4018f463d8565:

test artifacts and logs in: /artifacts/disk-stalled/fuse/log=true_data=true/run_1
(disk_stall.go:223).runDiskStalledDetection: post-stall TPS 774.37 is less than 50% of pre-stall TPS 1615.79
(cluster.go:1937).Run: cluster.RunE: context canceled
(cluster.go:1937).Run: cluster.RunE: context canceled
(cluster.go:1937).Run: output in run_094218.152765113_n4_cockroach_workload_run_kv: ./cockroach workload run kv --read-percent 50 --duration 10m --concurrency 256 --max-rate 2048 --tolerate-errors  --min-block-bytes=512 --max-block-bytes=512 {pgurl:1-3} returned: context canceled

Parameters: ROACHTEST_cloud=gce, ROACHTEST_cpu=4, ROACHTEST_encrypted=true, ROACHTEST_fs=ext4, ROACHTEST_localSSD=true, ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

/cc @cockroachdb/storage

Jira issue: CRDB-24464

Epic CRDB-20293

@cockroach-teamcity added the branch-release-22.1, C-test-failure, O-roachtest, O-robot, and release-blocker labels on Feb 12, 2023
@cockroach-teamcity added this to the 22.1 milestone on Feb 12, 2023
@blathers-crl bot added the T-storage label on Feb 12, 2023
@nicktrav
Failed due to:

post-stall TPS 774.37 is less than 50% of pre-stall TPS 1615.79
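
The arithmetic behind that message can be checked directly. This is only a sketch: the two TPS values are copied from the error, and the 0.5 threshold is an assumption read off the "less than 50%" wording, not taken from `disk_stall.go` itself.

```python
# Values copied from the failure message above.
pre_stall_tps = 1615.79
post_stall_tps = 774.37

# Assumed threshold, inferred from the "less than 50%" wording.
threshold = 0.5

ratio = post_stall_tps / pre_stall_tps
check_fails = ratio < threshold

print(f"post/pre ratio = {ratio:.3f}")  # post/pre ratio = 0.479
print(f"check fails: {check_fails}")
```

The ratio lands just under the threshold, so the reported numbers are at least internally consistent with the failure.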

@nicktrav
Looking at n1, it did crash as expected.

Looking at the workload stats, the math doesn't seem to line up with the error message. I'm looking more at the read and write rates, rather than the absolute numbers:

Pre-stall - appears to be ~2k ops/sec (reads plus writes):

_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
    1.0s        0          928.8         1014.9      1.0      5.5     52.4     71.3 read
    1.0s        0          941.6         1028.9      3.4     21.0     62.9     83.9 write
    2.0s        0         1035.1         1025.0      1.0      1.6      5.8     11.0 read
    2.0s        0         1014.1         1021.5      3.1      6.3     11.0     17.8 write
    3.0s        0         1010.5         1020.1      1.0      1.5      2.6      7.3 read
    3.0s        0         1036.5         1026.4      3.1      6.3      9.4     14.7 write
    4.0s        0         1040.9         1025.3      1.0      1.4      2.4      5.8 read
    4.0s        0         1006.9         1021.6      3.1      6.0      8.9     14.7 write
    5.0s        0         1021.0         1024.5      1.0      1.3      2.1      7.6 read
    5.0s        0         1028.0         1022.9      3.1      6.6     11.0     28.3 write
    6.0s        0         1026.4         1024.8      1.0      1.5      2.5     11.5 read
    6.0s        0         1022.4         1022.8      3.1      6.3     13.1     23.1 write
    7.0s        0         1059.7         1029.8      1.0      1.6      6.8     21.0 read
    7.0s        0          982.7         1017.1      3.1      6.8     14.7     23.1 write
    8.0s        0         1002.3         1026.3      1.0      1.3      1.7      2.5 read
    8.0s        0         1051.3         1021.3      3.1      5.8      7.3     12.1 write
    9.0s        0         1038.2         1027.6      1.0      1.4      8.1     18.9 read
    9.0s        0         1009.2         1020.0      3.0      5.5     11.0     28.3 write
   10.0s        0         1019.7         1026.9      1.0      1.4      2.2      5.5 read
   10.0s        0         1025.7         1020.6      3.1      5.8      8.4     12.6 write

Post-stall - appears to be ~1.6k ops/sec (reads plus writes):

  |   | stdout:
  |   | <... some data truncated by circular buffer; go to artifacts for details ...>
  |   |  173325          681.0          830.5      1.0      1.6      1.9      4.5 read
  |   |   577.0s   173325         1020.0          983.6      2.6      5.0      6.6      9.4 write
  |   |   578.0s   174008          694.1          830.3      1.0      1.7      9.4     18.9 read
  |   |   578.0s   174008         1019.1          983.7      2.6      5.5      9.4     23.1 write
  |   |   579.0s   174690          682.9          830.0      1.0      1.6      2.6     11.0 read
  |   |   579.0s   174690         1035.8          983.8      2.8      5.8      9.4     15.2 write
  |   |   580.0s   175373          680.9          829.8      1.0      1.4      1.8      2.4 read
  |   |   580.0s   175373          999.8          983.8      2.8      5.2      7.1      9.4 write
  |   | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
  |   |   581.0s   176056          681.7          829.5      1.0      1.4      2.0      3.0 read
  |   |   581.0s   176056         1008.6          983.8      2.6      5.2      6.0      9.4 write
  |   |   582.0s   176738          660.4          829.2      1.0      1.8      3.4     12.6 read
  |   |   582.0s   176738         1054.6          984.0      2.6      6.3     12.1     17.8 write
  |   |   583.0s   177421          655.9          828.9      1.0      1.4      2.2      4.7 read
  |   |   583.0s   177421         1052.8          984.1      2.6      5.5      7.1      8.9 write
  |   |   584.0s   178104          654.2          828.6      1.0      1.4      1.6      2.2 read
  |   |   584.0s   178104         1072.4          984.2      2.6      5.2      6.6     11.5 write
  |   |   585.0s   178786          691.0          828.4      1.0      1.4      1.9      2.2 read
  |   |   585.0s   178786          995.1          984.3      2.6      5.2      6.0      8.9 write
  |   |   586.0s   179469          667.9          828.1      1.0      1.5      2.5     12.6 read
  |   |   586.0s   179469         1031.8          984.3      2.6      5.5      7.1     13.6 write
  |   |   587.0s   180152          680.0          827.9      1.0      1.4      2.1      8.9 read
  |   |   587.0s   180152         1025.1          984.4      2.6      5.2      6.8     17.8 write
  |   |   588.0s   180834          666.9          827.6      1.0      1.3      1.6      2.0 read
  |   |   588.0s   180834         1067.9          984.5      2.5      4.7      5.8      6.6 write
  |   |   589.0s   181517          718.0          827.4      0.9      1.3      1.7     10.0 read
  |   |   589.0s   181517          984.9          984.5      2.6      5.2      8.1     16.3 write
  |   |   590.0s   182200          662.7          827.1      0.9      1.3      1.7      2.4 read
  |   |   590.0s   182200         1035.6          984.6      2.6      5.5      6.6      8.9 write
  |   | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
  |   |   591.0s   182882          686.2          826.9      1.0      2.0      8.1     19.9 read
  |   |   591.0s   182882         1021.4          984.7      2.6      6.6     16.8     28.3 write
  |   |   592.0s   183565          658.8          826.6      1.0      1.2      1.8      2.4 read
  |   |   592.0s   183565         1050.7          984.8      2.6      5.2      6.6     10.5 write
  |   |   593.0s   184248          691.1          826.4      0.9      1.3      1.6      2.4 read
  |   |   593.0s   184248         1033.2          984.9      2.5      5.0      6.3     10.5 write
  |   |   594.0s   184931          663.3          826.1      1.0      1.3      1.8      2.4 read
  |   |   594.0s   184931         1051.5          985.0      2.5      5.0      6.3      9.4 write
  |   |   595.0s   185613          692.5          825.9      1.0      1.4      5.8     14.7 read
  |   |   595.0s   185613         1010.2          985.0      2.6      5.2      8.9     23.1 write
  |   |   596.0s   186296          663.0          825.6      0.9      1.4      1.9      3.1 read
  |   |   596.0s   186296         1027.0          985.1      2.6      5.5      6.6     10.5 write
  |   |   597.0s   186979          674.5          825.4      1.0      1.2      1.8      3.1 read
  |   |   597.0s   186979         1042.8          985.2      2.5      4.5      5.5      6.8 write
  |   |   598.0s   187661          676.7          825.1      1.0      1.2      1.6      2.2 read
  |   |   598.0s   187661         1032.6          985.3      2.5      5.0      6.0      8.9 write
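
A rough sketch of the comparison described above: sum the instantaneous read and write rates for each tick of the `workload run kv` output, then compare a post-stall tick to a pre-stall tick. The regex and the `combined_inst_rate` helper are hypothetical (they assume the column layout shown in the logs above) and are not part of the roachtest. The sample rows are the last pre-stall and post-stall ticks copied from those logs.

```python
import re

# One stats row: elapsed, errors, ops/sec(inst), ops/sec(cum), four latency columns, op type.
# Assumed layout, based on the log excerpts in this issue.
ROW = re.compile(
    r"(?P<elapsed>\d+(?:\.\d+)?)s\s+(?P<errors>\d+)\s+(?P<inst>\d+(?:\.\d+)?)\s+"
    r"(?P<cum>\d+(?:\.\d+)?)(?:\s+\d+(?:\.\d+)?){4}\s+(?P<op>read|write)\s*$"
)


def combined_inst_rate(lines):
    """Return {elapsed_seconds: read + write instantaneous ops/sec}."""
    totals = {}
    for line in lines:
        m = ROW.search(line)
        if m is None:
            continue  # header rows and truncated rows are skipped
        t = float(m.group("elapsed"))
        totals[t] = totals.get(t, 0.0) + float(m.group("inst"))
    return totals


pre_lines = [
    "   10.0s        0         1019.7         1026.9      1.0      1.4      2.2      5.5 read",
    "   10.0s        0         1025.7         1020.6      3.1      5.8      8.4     12.6 write",
]
post_lines = [
    "  |   | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)",
    "  |   |   598.0s   187661          676.7          825.1      1.0      1.2      1.6      2.2 read",
    "  |   |   598.0s   187661         1032.6          985.3      2.5      5.0      6.0      8.9 write",
]

pre = combined_inst_rate(pre_lines)[10.0]
post = combined_inst_rate(post_lines)[598.0]
print(f"pre={pre:.1f} post={post:.1f} ratio={post / pre:.2f}")  # pre=2045.4 post=1709.3 ratio=0.84
```

On these two ticks the combined post-stall rate is about 84% of the pre-stall rate, which is why the logged rates do not look like a drop below 50%.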

@jbowens removed the release-blocker label on Feb 13, 2023
@nicktrav self-assigned this on Feb 25, 2023
@nicktrav
Closed by #97667.
