-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
roachtest: import/tpch/nodes=4 failed #31740
Comments
No obvious smoking gun here. There aren't any known-terrible health alerts (a bit of replicaGC and consistency queue failures [failures - not found divergence]). The disk usage log actually shows the disk usage increasing until the test times out. Perhaps it just needs more time 🤷♂️ |
Some of the nodes have tons of slow commits (hundreds basically going back-to-back). |
Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=982644&tab=buildLog
|
Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=983962&tab=buildLog
|
Last crash is a random logging bug fixed in #31822. Ignore. |
Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=985074&tab=buildLog
|
We were accounting for sideloaded payloads (SSTs) when adding them to the log, but were omitting them during truncations. As a result, the tracked raft log size would permanently skyrocket which in turn would lead to extremely aggressive truncations and resulted in pathological amounts of Raft snapshots. I'm still concerned about this logic as we're now relying on numbers obtained from the file system to match exactly a prior in-mem computation, and there may be other bugs that cause a divergence. But this fixes the blatant obvious one, so it's a step in the right direction. The added tests highlight a likely omission in the sideloaded storage code which doesn't access the file system through the RocksDB env as it seems like it should, filed as cockroachdb#31913. At this point it's unclear whether it fixes the below issues, but at the very least it seems to address a major problem they encountered: Touches cockroachdb#31732. Touches cockroachdb#31740. Touches cockroachdb#30261. Touches cockroachdb#31768. Touches cockroachdb#31745. Release note (bug fix): avoid a performance degradation related to overly aggressive Raft log truncations that could occur during RESTORE or IMPORT operations.
Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=987340&tab=buildLog
|
We were accounting for sideloaded payloads (SSTs) when adding them to the log, but were omitting them during truncations. As a result, the tracked raft log size would permanently skyrocket which in turn would lead to extremely aggressive truncations and resulted in pathological amounts of Raft snapshots. I'm still concerned about this logic as we're now relying on numbers obtained from the file system to match exactly a prior in-mem computation, and there may be other bugs that cause a divergence. But this fixes the blatant obvious one, so it's a step in the right direction. The added tests highlight a likely omission in the sideloaded storage code which doesn't access the file system through the RocksDB env as it seems like it should, filed as cockroachdb#31913. At this point it's unclear whether it fixes the below issues, but at the very least it seems to address a major problem they encountered: Touches cockroachdb#31732. Touches cockroachdb#31740. Touches cockroachdb#30261. Touches cockroachdb#31768. Touches cockroachdb#31745. Release note (bug fix): avoid a performance degradation related to overly aggressive Raft log truncations that could occur during RESTORE or IMPORT operations.
31914: storage: fix Raft log size accounting r=petermattis a=tschottdorf We were accounting for sideloaded payloads (SSTs) when adding them to the log, but were omitting them during truncations. As a result, the tracked raft log size would permanently skyrocket which in turn would lead to extremely aggressive truncations and resulted in pathological amounts of Raft snapshots. I'm still concerned about this logic as we're now relying on numbers obtained from the file system to match exactly a prior in-mem computation, and there may be other bugs that cause a divergence. But this fixes the blatant obvious one, so it's a step in the right direction. The added tests highlight a likely omission in the sideloaded storage code which doesn't access the file system through the RocksDB env as it seems like it should, filed as #31913. At this point it's unclear whether it fixes the below issues, but at the very least it seems to address a major problem they encountered: Touches #31732. Touches #31740. Touches #30261. Touches #31768. Touches #31745. Release note (bug fix): avoid a performance degradation related to overly aggressive Raft log truncations that could occur during RESTORE or IMPORT operations. Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=991714&tab=buildLog
|
Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=993605&tab=buildLog
|
Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=995412&tab=buildLog
|
Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=997430&tab=buildLog
|
Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=999222&tab=buildLog
|
Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1000678&tab=buildLog
|
Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1002679&tab=buildLog
|
Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1003388&tab=buildLog
|
Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1004113&tab=buildLog
|
Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1007980&tab=buildLog
|
Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1008034&tab=buildLog
|
^- slow Raft requests all the way through. I'm gonna run this test with #31875 to see if this is fixed in it. |
Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1008356&tab=buildLog
|
Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1010736&tab=buildLog
|
Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1012425&tab=buildLog
|
We were accounting for sideloaded payloads (SSTs) when adding them to the log, but were omitting them during truncations. As a result, the tracked raft log size would permanently skyrocket which in turn would lead to extremely aggressive truncations and resulted in pathological amounts of Raft snapshots. I'm still concerned about this logic as we're now relying on numbers obtained from the file system to match exactly a prior in-mem computation, and there may be other bugs that cause a divergence. But this fixes the blatant obvious one, so it's a step in the right direction. The added tests highlight a likely omission in the sideloaded storage code which doesn't access the file system through the RocksDB env as it seems like it should, filed as cockroachdb#31913. At this point it's unclear whether it fixes the below issues, but at the very least it seems to address a major problem they encountered: Touches cockroachdb#31732. Touches cockroachdb#31740. Touches cockroachdb#30261. Touches cockroachdb#31768. Touches cockroachdb#31745. Release note (bug fix): avoid a performance degradation related to overly aggressive Raft log truncations that could occur during RESTORE or IMPORT operations.
We were accounting for sideloaded payloads (SSTs) when adding them to the log, but were omitting them during truncations. As a result, the tracked raft log size would permanently skyrocket which in turn would lead to extremely aggressive truncations and resulted in pathological amounts of Raft snapshots. I'm still concerned about this logic as we're now relying on numbers obtained from the file system to match exactly a prior in-mem computation, and there may be other bugs that cause a divergence. But this fixes the blatant obvious one, so it's a step in the right direction. The added tests highlight a likely omission in the sideloaded storage code which doesn't access the file system through the RocksDB env as it seems like it should, filed as cockroachdb#31913. At this point it's unclear whether it fixes the below issues, but at the very least it seems to address a major problem they encountered: Touches cockroachdb#31732. Touches cockroachdb#31740. Touches cockroachdb#30261. Touches cockroachdb#31768. Touches cockroachdb#31745. Release note (bug fix): avoid a performance degradation related to overly aggressive Raft log truncations that could occur during RESTORE or IMPORT operations.
Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1019281&tab=buildLog
|
Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1028047&tab=buildLog
|
Oops, I disabled the range merge check for restore2TB, but not for the import tests. Will fix. |
Hopefully fixed for good by #32594. |
SHA: https://github.com/cockroachdb/cockroach/commits/d3f39a5cb60eb9cea64635f78e25f67a435057a5
Parameters:
To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=980026&tab=buildLog
The text was updated successfully, but these errors were encountered: