Skip to content

Commit

Permalink
Flakiness fix for Upgrade Downgrade Backup Manual test (vitessio#9957)
Browse files Browse the repository at this point in the history
* feat: wait for the replica to finish restoring before calling InitShardPrimary to ensure it sets semi-sync correctly

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: have a common timeout for all the tablets instead of 300 per tablet

Signed-off-by: Manan Gupta <manan@planetscale.com>
  • Loading branch information
GuptaManan100 authored and DAlperin committed Mar 25, 2022
1 parent a6fdb1c commit 2638d1d
Showing 1 changed file with 27 additions and 5 deletions.
32 changes: 27 additions & 5 deletions examples/local/backups/restart_tablets.sh
Original file line number Diff line number Diff line change
Expand Up @@ -23,19 +23,41 @@ for i in 100 101 102; do
CELL=zone1 TABLET_UID=$i ./scripts/mysqlctl-up.sh
CELL=zone1 KEYSPACE=commerce TABLET_UID=$i ./scripts/vttablet-up.sh
done
sleep 5
vtctldclient InitShardPrimary --force commerce/0 zone1-100

for i in 200 201 202; do
CELL=zone1 TABLET_UID=$i ./scripts/mysqlctl-up.sh
SHARD=-80 CELL=zone1 KEYSPACE=customer TABLET_UID=$i ./scripts/vttablet-up.sh
done
sleep 5
vtctldclient InitShardPrimary --force customer/-80 zone1-200

for i in 300 301 302; do
CELL=zone1 TABLET_UID=$i ./scripts/mysqlctl-up.sh
SHARD=80- CELL=zone1 KEYSPACE=customer TABLET_UID=$i ./scripts/vttablet-up.sh
done
sleep 5
vtctldclient InitShardPrimary --force customer/80- zone1-300

# Wait for all the replica tablets to be in the serving state before initiating
# InitShardPrimary. This is essential, since we want the RESTORE phase to be
# complete before we start InitShardPrimary, otherwise we end up reading the
# tablet type to RESTORE and do not set semi-sync, which leads to the primary
# hanging on writes.
totalTime=300
for i in 101 201 301; do
while [ $totalTime -gt 0 ]; do
status=$(curl "http://$hostname:15$i/debug/status_details")
echo "$status" | grep "REPLICA: Serving" && break
totalTime=$((totalTime-1))
sleep 0.1
done
done

# Check that all the replica tablets have reached REPLICA: Serving state
for i in 101 201 301; do
status=$(curl "http://$hostname:15$i/debug/status_details")
echo "$status" | grep "REPLICA: Serving" && continue
echo "tablet-$i did not reach REPLICA: Serving state. Exiting due to failure."
exit 1
done

vtctldclient InitShardPrimary --force commerce/0 zone1-100
vtctldclient InitShardPrimary --force customer/-80 zone1-200
vtctldclient InitShardPrimary --force customer/80- zone1-300

0 comments on commit 2638d1d

Please sign in to comment.