roachtest: import/tpch/nodes=8 failed on master #30261
Parameters:
To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=970685&tab=buildLog
@andreimatei same story as with #31184.
Parameters:
To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=972821&tab=buildLog
Parameters:
To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=974325&tab=buildLog
Thought this might be related to #31618, but the failure above is on release-2.1, which doesn't have the new Raft-based limiting of the uncommitted log.
The most recent failure has a bunch of stuck Raft proposals and rejected Raft snapshots. Merges are on, so I don't think it's worth spending much time on this today: the known problems behind these symptoms are about to be fixed in #31570. On master we instead (or rather, additionally) have #31618, which poses a problem for this test.
Parameters:
To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=974812&tab=buildLog
Parameters:
To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=977057&tab=buildLog
Parameters:
To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=978508&tab=buildLog
Parameters:
To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=980026&tab=buildLog
This has
Parameters:
To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=982644&tab=buildLog
31914: storage: fix Raft log size accounting r=petermattis a=tschottdorf

We were accounting for sideloaded payloads (SSTs) when adding them to the log, but were omitting them during truncations. As a result, the tracked Raft log size would permanently skyrocket, which in turn would lead to extremely aggressive truncations and result in pathological amounts of Raft snapshots.

I'm still concerned about this logic, as we're now relying on numbers obtained from the file system to exactly match a prior in-memory computation, and there may be other bugs that cause a divergence. But this fixes the blatantly obvious one, so it's a step in the right direction. The added tests highlight a likely omission in the sideloaded storage code, which doesn't access the file system through the RocksDB env as it seems like it should; filed as #31913.

At this point it's unclear whether it fixes the issues below, but at the very least it seems to address a major problem they encountered:

Touches #31732. Touches #31740. Touches #30261. Touches #31768. Touches #31745.

Release note (bug fix): avoid a performance degradation related to overly aggressive Raft log truncations that could occur during RESTORE or IMPORT operations.

Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
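To make the accounting asymmetry concrete, here is a minimal Go sketch (not the actual CockroachDB code; the types and names such as `logSizeTracker` and `sideloadedBytes` are invented for illustration) of how counting sideloaded payloads on append but not on truncation makes the tracked log size drift upward forever:

```go
package main

import "fmt"

// entry is a stand-in for a Raft log entry; sideloadedBytes is nonzero for
// entries whose payload (an SST) lives outside the log in sideloaded storage.
type entry struct {
	inlineBytes     int64
	sideloadedBytes int64
}

// logSizeTracker mimics the in-memory accounting described above.
type logSizeTracker struct {
	size int64
}

// addEntry accounts for the full footprint, including the sideloaded payload.
func (t *logSizeTracker) addEntry(e entry) {
	t.size += e.inlineBytes + e.sideloadedBytes
}

// truncateBuggy subtracts only the inline bytes, so every sideloaded SST
// leaks into the tracked size forever -- the bug described in the PR.
func (t *logSizeTracker) truncateBuggy(truncated []entry) {
	for _, e := range truncated {
		t.size -= e.inlineBytes
	}
}

// truncateFixed also subtracts the sideloaded payload sizes, keeping the
// tracked size in sync with what is actually on disk.
func (t *logSizeTracker) truncateFixed(truncated []entry) {
	for _, e := range truncated {
		t.size -= e.inlineBytes + e.sideloadedBytes
	}
}

func main() {
	entries := []entry{{inlineBytes: 100, sideloadedBytes: 32 << 20}} // one ~32MB sideloaded SST

	var buggy, fixed logSizeTracker
	for _, e := range entries {
		buggy.addEntry(e)
		fixed.addEntry(e)
	}
	buggy.truncateBuggy(entries)
	fixed.truncateFixed(entries)
	fmt.Println("buggy tracked size after truncation:", buggy.size) // never shrinks back
	fmt.Println("fixed tracked size after truncation:", fixed.size) // 0
}
```

The fix corresponds to the `truncateFixed` variant: the same bytes that were added on append are subtracted again on truncation, so the tracked size no longer races toward the aggressive-truncation threshold.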
FWIW, I still saw this soft-fail with #31914 in. There was a stuck Raft proposal and a slow lease (probably the same thing) for quite a while, and then it went away. This is likely because of the race in #31875, perhaps exacerbated by #31732 (comment). Going to keep pulling on these threads until these tests reliably pass without requiring any Raft snapshots. There is a lot of dysfunction in these parts of the code.
Picked one of the handful of stuck ranges at random; here's the log. We see that n7 tries to send a Raft snapshot to a follower twice, but doesn't manage to within the 1min deadline (probably blocked on the semaphore on the receiver, since other snapshots are in flight). It takes ~15 minutes until a snapshot goes through, totaling ~23m of outage:
The problem in #31732 (comment) probably also plays a role. The log is full of "intersecting range" failed snapshots, though not referring to this range:
Note how this spikes sharply while the request we're looking at is stuck. Pretty sure some of these "unsnapshottable" ranges are crowding out the ranges that actually need a snapshot to unblock this mess.
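As a rough illustration of the receive-side bottleneck (a sketch only; the real semaphore limit, deadline, and API differ from what is assumed here), a single-slot semaphore on the receiver plus a per-sender deadline is enough to make most concurrent snapshot attempts time out:

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// receiver models the receive-side limit: a buffered channel used as a
// semaphore that admits only one snapshot application at a time.
type receiver struct {
	sem chan struct{}
}

func newReceiver() *receiver { return &receiver{sem: make(chan struct{}, 1)} }

// acceptSnapshot blocks until a slot frees up or the sender's context expires.
func (r *receiver) acceptSnapshot(ctx context.Context, apply time.Duration) error {
	select {
	case r.sem <- struct{}{}:
	case <-ctx.Done():
		return ctx.Err() // sender gave up while queued behind other snapshots
	}
	defer func() { <-r.sem }()
	time.Sleep(apply) // stand-in for streaming and applying the snapshot
	return nil
}

func main() {
	r := newReceiver()
	var wg sync.WaitGroup

	// Ten senders target the same receiver concurrently; each gives up after
	// 100ms (standing in for the 1min send deadline), but a single snapshot
	// takes 90ms to apply, so most senders time out while queued on the
	// semaphore rather than while transferring data.
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
			defer cancel()
			err := r.acceptSnapshot(ctx, 90*time.Millisecond)
			fmt.Printf("snapshot %d: err=%v\n", i, err)
		}(i)
	}
	wg.Wait()
}
```

In the real cluster the numbers are larger (a 1min deadline, multi-GB snapshots, many competing senders), but the failure shape is the same: whoever doesn't get the slot in time gives up and has to be requeued.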
Parameters:
To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=990073&tab=buildLog
Parameters:
To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=991714&tab=buildLog
Parameters:
To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=993605&tab=buildLog
^- also had #31988.
Have this reproing again. Ready for some (familiar) badness? Merges are off and I'm running off master with a few debuggability PRs. I have a few stuck Raft proposals. One is a
Getting stuck in the middle of a
This range is stuck in the middle of a replication change. It has four replicas (which the "healthy" replicas agree on), but two of them never even got initialized, i.e. they're basically just Raft groups with no data. This must mean that the preemptive snapshots sent out during upreplication got GC'ed away, making the Raft snapshots necessary. Thanks to the same kind of badness as in (the comment burst starting at) #31409 (comment), getting a Raft snapshot isn't easy. Interestingly, in this run n8 is the only node that reports a constant stream of Raft snapshot failures (again of the "intersecting existing range" flavor). And we're again seeing that the snapshots take ages (I have written about this elsewhere; this is 99% because of the semaphore we use at the receiver, which limits the total number of incoming snapshots but can really back up when there are 10 other nodes that all want to send snapshots).
So here the main problems are a) that the preemptive snapshots were removed (GC'ed) before the upreplication had completed (semi-tracked in #31875; I'll have to fix this problem before that PR merges), b) the Raft snapshot queue being mostly LIFO, and c) that the three-member Raft group with one replica requiring a snapshot decided it was a good idea to add a fourth replica (at which point it loses quorum).
and d) the semaphore on the receiver, which can back up all too easily.
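For the quorum part of c), here is a tiny sketch of the arithmetic (standard Raft majorities; not code from the repo) showing why adding the fourth replica before the laggard caught up stalls the range:

```go
package main

import "fmt"

// quorum returns the number of up-to-date voters needed for the Raft group
// to commit new entries.
func quorum(replicas int) int { return replicas/2 + 1 }

func main() {
	// Three replicas, one of which is behind and waiting on a Raft snapshot:
	// two caught-up voters still meet the quorum of two, so the range works.
	fmt.Println("3 replicas, 2 caught up, quorum =", quorum(3)) // 2: still available

	// Upreplicating to four before the laggard catches up raises the quorum
	// to three, but only the same two replicas are caught up (the new fourth
	// replica also needs a snapshot), so the range stalls until a snapshot
	// finally gets through.
	fmt.Println("4 replicas, 2 caught up, quorum =", quorum(4)) // 3: stuck
}
```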
Parameters:
To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=995412&tab=buildLog
Another run, similar problems (my branch already has the fix for the Raft snapshot queue starvation described in #31947). We (again) have a range with four replicas, two of which need a snapshot. It looks like the main problem is that the most recently added replica had its preemptive snapshot GC'ed before it had a chance to catch up, and it isn't receiving a Raft snapshot because $UNKNOWN_REASONS. It's interesting to note that n1 is uninitialized. Its replicaID is 2, which means it hasn't been mucked with since the range was created (i.e. split off from its left neighbor). The fact that it's uninitialized means that the left neighbor jumped past the split trigger via a snapshot (perhaps something that would be avoided with #31875), which implicitly wipes the data the RHS could otherwise pick up, leaving it needing a snapshot. Ah, I'm seeing from the logs that it's "the other" scenario that can lead to this outcome: our LHS was moved off n1 right after the split, giving the replica no time to execute the split trigger:
There's definitely still lots to fix here. I wonder if something changed in import/restore that rebalances/scatters much more aggressively, because it definitely feels like these problems have gotten worse (though there's no doubt they've existed for a long time). But perhaps that's just a subjective feeling.
What the? Logspying on the Raft snapshot queue on n5 (which is backed up with ~200 replicas), I'm seeing r3064 get processed (with a failure), then re-added and immediately processed again, despite running with #32053. I thought for a bit that this was because of the "requeue" mechanism, but that too removes and re-adds to the heap, so it should do the right thing.
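For intuition, a tiny hypothetical Go sketch (the `seq`-based ordering and all names here are assumptions, not the real queue implementation) of what "removes and re-adds to the heap" should buy us: a failed replica that gets requeued lands behind the replicas that were already waiting instead of being processed again immediately.

```go
package main

import (
	"container/heap"
	"fmt"
)

// item is a queued replica keyed by the sequence at which it was (re)added;
// a smaller seq means it was added earlier and is processed first.
type item struct {
	rangeID int
	seq     int
}

type pq []item

func (q pq) Len() int            { return len(q) }
func (q pq) Less(i, j int) bool  { return q[i].seq < q[j].seq }
func (q pq) Swap(i, j int)       { q[i], q[j] = q[j], q[i] }
func (q *pq) Push(x interface{}) { *q = append(*q, x.(item)) }
func (q *pq) Pop() interface{} {
	old := *q
	n := len(old)
	it := old[n-1]
	*q = old[:n-1]
	return it
}

func main() {
	q := &pq{}
	heap.Init(q)
	seq := 0
	add := func(rangeID int) {
		heap.Push(q, item{rangeID: rangeID, seq: seq})
		seq++
	}

	// The replica we're watching, followed by a backlog of waiting replicas.
	add(3064)
	for id := 1; id <= 3; id++ {
		add(1000 + id)
	}

	// Process the front (r3064), have it "fail", and re-add it: it gets a new,
	// larger seq, so it lands behind the replicas that were already waiting
	// rather than being processed again right away.
	front := heap.Pop(q).(item)
	fmt.Println("processing r", front.rangeID)
	add(front.rangeID)
	fmt.Println("next up: r", (*q)[0].rangeID) // r1001, not r3064
}
```

This is the behavior the comment above expects; the immediate reprocessing seen in the logs turned out to have a different explanation (see the next comment).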
Ha. User error. The difference between
Wait, what?! Does that imply all the slowness is a result of compiling RocksDB w/o optimizations?
No, I was using a binary that had fewer fixes than I thought.
Oh,
Exactly. Luckily this only happened in today's runs, and as a bonus, this last comment actually pretty definitively proves that #32053 will make the badness we're seeing a lot less bad. Curious what will be left over; my repros are running again.
Parameters:
To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=997430&tab=buildLog
SHA: https://github.com/cockroachdb/cockroach/commits/5a566d893d5b25100f2dd77d3c1ac8dcc674f677
Parameters:
To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=906668&tab=buildLog