Backport of Fix k8s service registration case where Vault fails to unlabel itself as a leader into release/1.13.x #21885

Conversation

hc-github-team-secure-vault-core
Collaborator

Backport

This PR is auto-generated from #21642 to be assessed for backporting due to the inclusion of the label backport/1.13.x.

The text below is copied from the body of the original PR.


This bug only ever occurs on Enterprise, as we only ever call sealInternalWithOptions with keepHALock set to true in Enterprise, and so keepHALockOnStepDown is always 0 in OSS. When we step down as leader but keep the HA lock, we should still unlabel ourselves as leader in k8s, but that unlabelling happens in clearLeader, so before this fix, keeping the HA lock meant we would never unlabel ourselves. Essentially, this change ensures we more closely track the core's standby state variable in that case.
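
For reference, a quick way to observe the symptom manually (a sketch, assuming the vault-secondary pod names from the repro script below): after a step-down that keeps the HA lock, more than one pod can keep the active label even though only one node reports standby=false.

# Pods still labelled as the active node
kubectl get pods -l vault-active=true
# What the node itself reports for its standby state
kubectl exec -it vault-secondary-0 -- curl -s http://localhost:8200/v1/sys/health | jq .standby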

I'm not sure about automated testing yet. I've been using the following script for reproducing the issue locally.

Repro script

#!/usr/bin/env bash
# 27 Jul 2022 - Sean Ellefson
# https://hashicorp.zendesk.com/agent/tickets/79606
#
# This script attempts to reproduce an issue with Kubernetes service
# registration where the 'vault-active' label sometimes doesn't get updated,
# resulting in more than one pod with the label 'vault-active=true'.  It
# doesn't always work, but it seems to occur more reliably than other methods
# when hitting the 'update-primary' endpoint.
#
# This assumes you have Helm installed and configured with the HashiCorp
# repository, a local Kubernetes environment (built with minikube), and `jq`.
# You'll also need your Vault Enterprise license created as a k8s secret with
# the key name "vault.hclic".
#
# The script deploys a Vault dev server, configures the Transit secrets engine,
# and enables DR primary replication.  It then deploys a Vault Raft cluster
# that uses the dev server for Transit auto-unseal, enables DR secondary
# replication, and hits the 'update-primary' endpoint.  A loop at the end of
# the script lets you repeatedly submit secondary tokens to the
# 'update-primary' endpoint until the issue occurs.
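#
# Example setup (names assumed here, not from the original PR; adjust to your
# environment):
#   minikube start
#   helm repo add hashicorp https://helm.releases.hashicorp.com
#   kubectl create secret generic vault-license \
#     --from-file=vault.hclic=/path/to/vault.hclic
#   ./repro-79606.sh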

# Arbitrary number of iterations to generate activity on the primary cluster;
# seems to help the issue recur
WAL_ITERATIONS=50 

# Colors because the world is a colorful place 🌎
TXTBLU="$(tput setaf 4)"
TXTCYA="$(tput setaf 6)"
TXTGRN="$(tput setaf 2)"
TXTMGT="$(tput setaf 5)"
TXTRED="$(tput setaf 1)"
TXTYLW="$(tput setaf 3)"
TXTWHT="$(tput setaf 7)"
TXTRST="$(tput sgr0)"

msg() {
    MSGSRC="[repro-79606]"
    MSGTYPE="$1"
    MSGTXT="$2"
    case "${MSGTYPE}" in
        greeting)
            printf "%s%s [=] %s %s\\n" "$TXTBLU" "$MSGSRC" "$MSGTXT" "$TXTRST"
            ;;
        info)
            printf "%s%s [i] %s %s\\n" "$TXTCYA" "$MSGSRC" "$MSGTXT" "$TXTRST"
            ;;
        success)
            printf "%s%s [+] %s %s\\n" "$TXTGRN" "$MSGSRC" "$MSGTXT" "$TXTRST"
            ;;
        complete)
            printf "%s%s [^] %s %s\\n" "$TXTGRN" "$MSGSRC" "$MSGTXT" "$TXTRST"
            ;;
        boom)
            printf "%s%s [*] %s %s\\n" "$TXTMGT" "$MSGSRC" "$MSGTXT" "$TXTRST"
            ;;
        notice)
            printf "%s%s [?] %s %s\\n" "$TXTYLW" "$MSGSRC" "$MSGTXT" "$TXTRST"
            ;;
        alert)
            >&2 printf "%s%s [!] %s %s\\n" "$TXTRED" "$MSGSRC" "$MSGTXT" "$TXTRST"
            ;;
        *)
            >&2 printf "%s%s [@] %s %s\\n" "$TXTCYA" "$MSGSRC" "$MSGTXT" "$TXTRST"
            ;;
    esac
}

trap cleanup SIGINT

cleanup() {
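  # Tear down both Helm releases, the secondary's PVCs, and the background log followers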
  msg alert "Caught interrupt!  Cleaning up..."
  helm uninstall vault-primary vault-secondary
  kubectl delete pvc data-vault-secondary-{0..2}
  if ps -p $SECONDARY_0_LOG_PID > /dev/null 2>&1 ; then kill -9 $SECONDARY_0_LOG_PID ; fi
  if ps -p $SECONDARY_1_LOG_PID > /dev/null 2>&1 ; then kill -9 $SECONDARY_1_LOG_PID ; fi
  if ps -p $SECONDARY_2_LOG_PID > /dev/null 2>&1 ; then kill -9 $SECONDARY_2_LOG_PID ; fi
  msg notice "Exiting..."
  exit
}


# Capture logs from secondary pods
msg info "Creating directory './repro-79606-logs'"
mkdir -p ./repro-79606-logs

# Dev server to be used as Transit auto-unseal target and replication primary
msg info "Deploy primary server "
helm install vault-primary hashicorp/vault \
  --set=server.dev.enabled=true \
  --set=server.dev.devRootToken=root \
  --set=server.standalone.enabled=true \
  --set=server.image.repository=hashicorp/vault-enterprise \
  --set=server.image.tag=1.14.0-ent \
  --set=server.enterpriseLicense.secretName=vault-license \
  --set=server.enterpriseLicense.secretKey=vault.hclic \
  --set=server.extraArgs="-dev-ha -dev-transactional" \
  --set=injector.enabled=false \
  --set=global.tlsDisable=true > /dev/null 

msg info "Wait until pod is ready"
until [ $(sleep 1 ; kubectl get pod vault-primary-0 -o json | jq .status.containerStatuses[].ready) == "true" ] 2> /dev/null ; do 
  sleep 2
done

msg info "Enable DR primary replication, prepare transit auto-unseal"
kubectl exec -it vault-primary-0 -- vault login root 

kubectl exec -it vault-primary-0 -- vault secrets enable transit
kubectl exec -it vault-primary-0 -- vault write -f transit/keys/autounseal
kubectl exec -it vault-primary-0 -- sh -c 'vault policy write autounseal - << EOF
path "transit/encrypt/autounseal" {
   capabilities = [ "update" ]
 }

 path "transit/decrypt/autounseal" {
    capabilities = [ "update" ]
  }
EOF'
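# Token scoped to the autounseal policy; the secondary cluster picks it up as
# VAULT_TOKEN for its transit seal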
TRANSIT_TOKEN=$(kubectl exec -it vault-primary-0 -- vault token create -format=json -policy="autounseal" | jq -r .auth.client_token)

kubectl exec -it vault-primary-0 -- sh -c 'vault policy write dr-secondary-promotion - <<EOF
path "sys/replication/dr/secondary/promote" {
  capabilities = [ "update" ]
}

path "sys/replication/dr/secondary/update-primary" {
    capabilities = [ "update" ]
  }

path "sys/storage/raft/autopilot/state" {
    capabilities = [ "update" , "read" ]
  }

path "sys/storage/raft/configuration" {
    capabilities = [ "read" ]
  }
EOF'
kubectl exec -it vault-primary-0 -- vault write auth/token/roles/failover-handler \
    allowed_policies=dr-secondary-promotion \
    orphan=true \
    renewable=false \
    token_type=batch
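# Batch token tied to the dr-secondary-promotion policy; used later as
# dr_operation_token when hitting update-primary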
DR_TOKEN=$(kubectl exec -it vault-primary-0 -- vault token create --format=json -role=failover-handler -ttl=8h | jq -r .auth.client_token)

kubectl exec -it vault-primary-0 -- vault write -f sys/replication/dr/primary/enable 

# Raft secondary cluster required for reproducing issue
msg info "Deploy secondary cluster "
helm install vault-secondary hashicorp/vault \
  --set=server.affinity='' \
  --set=server.ha.enabled=true \
  --set=server.ha.raft.enabled=true \
  --set=server.ha.raft.replicas=3 \
  --set=server.image.repository=hashicorp/vault-enterprise \
  --set=server.image.tag=1.14.0-ent \
  --set=server.enterpriseLicense.secretName=vault-license \
  --set=server.enterpriseLicense.secretKey=vault.hclic \
  --set=server.logLevel=trace \
  --set=injector.enabled=false \
  --set=global.tlsDisable=true \
  --set=server.extraEnvironmentVars.VAULT_TOKEN=$TRANSIT_TOKEN \
  --set-string='server.ha.raft.config=
ui = true

service_registration "kubernetes" {}

listener "tcp" {
  address = ":8200"
  cluster_address = ":8201"
  tls_disable = 1
  telemetry {
    unauthenticated_metrics_access = true
  }
}

telemetry {
  prometheus_retention_time = "24h"
  disable_hostname = true
}

seal "transit" {
  address = "http://vault-primary-0.vault-primary-internal:8200"
  key_name = "autounseal"
  mount_path = "transit"
}

storage "raft" {
  path = "/vault/data"
  retry_join {
    leader_api_addr = "http://vault-secondary-0.vault-secondary-internal:8200"
  }
  retry_join {
    leader_api_addr = "http://vault-secondary-1.vault-secondary-internal:8200"
  }
  retry_join {
    leader_api_addr = "http://vault-secondary-2.vault-secondary-internal:8200"
  }
}
' > /dev/null

msg info "Wait until cluster has started"
sleep 3
until [ $(sleep 2 ; kubectl get pod vault-secondary-0 -o json | jq .status.containerStatuses[].started) == "true" ] 2> /dev/null ; do 
  sleep 1
done
kubectl logs vault-secondary-0 -f > ./repro-79606-logs/vault-secondary-0.log & SECONDARY_0_LOG_PID=$!
until [ $(sleep 2 ; kubectl get pod vault-secondary-1 -o json | jq .status.containerStatuses[].started) == "true" ] 2> /dev/null ; do 
  sleep 1
done
kubectl logs vault-secondary-1 -f > ./repro-79606-logs/vault-secondary-1.log & SECONDARY_1_LOG_PID=$!
until [ $(sleep 2 ; kubectl get pod vault-secondary-2 -o json | jq .status.containerStatuses[].started) == "true" ] 2> /dev/null ; do 
  sleep 1
done
kubectl logs vault-secondary-2 -f > ./repro-79606-logs/vault-secondary-2.log & SECONDARY_2_LOG_PID=$!

msg info "Initialize secondary cluster and start replication"
until [ $ROOT ] ; do
  read -r UNSEAL ROOT < <(kubectl exec -it vault-secondary-0 -- vault operator init --format=json -recovery-shares=1 -recovery-threshold=1 | jq -r '.recovery_keys_b64[], .root_token' | xargs echo -n)
done

until [ $(sleep 2 ; kubectl exec -it vault-secondary-0 -- curl http://localhost:8200/v1/sys/health | jq -r .standby) == "false" ] 2> /dev/null ; do 
  sleep 1
done

# Setting Transit auto-unseal token as env var requires unsetting VAULT_TOKEN
# before being able to make authenticated requests from within the pod
kubectl exec -it vault-secondary-0 -- vault login $ROOT 
SECONDARY_TOKEN=$(kubectl exec -it vault-primary-0 -- vault write -f --format=json sys/replication/dr/primary/secondary-token id=dr | jq -r .wrap_info.token)
kubectl exec -it vault-secondary-0 -- sh -c "unset VAULT_TOKEN ; vault write -f sys/replication/dr/secondary/enable token=$SECONDARY_TOKEN"

msg info "Wait until cluster is ready"
until [ $(sleep 2 ; kubectl get pod vault-secondary-0 -o json | jq .status.containerStatuses[].ready) == "true" ] 2> /dev/null ; do 
  sleep 1
done
until [ $(sleep 2 ; kubectl get pod vault-secondary-1 -o json | jq .status.containerStatuses[].ready) == "true" ] 2> /dev/null ; do 
  sleep 1
done
until [ $(sleep 2 ; kubectl get pod vault-secondary-2 -o json | jq .status.containerStatuses[].ready) == "true" ] 2> /dev/null ; do 
  sleep 1
done

# Checkpoint, shows correctly labelled active node
msg info "Show leader"
date ; msg success "kubectl get pods -l vault-active=true"
kubectl get pods -l vault-active=true

msg info "Generate some WALs..."
for i in $(seq 1 $WAL_ITERATIONS) ; do 
  kubectl exec -it vault-primary-0 -- vault token create -policy=default > /dev/null
  kubectl exec -it vault-primary-0 -- vault kv put secret/$i foo=bar > /dev/null
  echo -n "." 
done
echo

# May require submitting more than one secondary token to reproduce issue
reproduce_issue() {
  msg info "Generate new secondary token, hit update-primary and reproduce issue"
  kubectl exec -it vault-primary-0 -- vault write -f --format=json sys/replication/dr/primary/revoke-secondary id=dr
  SECONDARY_TOKEN=$(kubectl exec -it vault-primary-0 -- vault write -f --format=json sys/replication/dr/primary/secondary-token id=dr | jq -r .wrap_info.token)
  kubectl exec -it vault-secondary-0 -- vault write -f sys/replication/dr/secondary/update-primary token=$SECONDARY_TOKEN dr_operation_token=$DR_TOKEN

  msg info "Show leader"
  sleep 20
  msg alert "kubectl get pods -l vault-active=true"
  date ; kubectl get pods -l vault-active=true 
}

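# Keep attempting reproduction until interrupted; Ctrl+C fires the cleanup trap above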
while : ; do 
  reproduce_issue 
  read -p "Press enter key to attempt reproduction again, Ctrl+C to cleanup and exit: "
done

cleanup


Overview of commits

@hc-github-team-secure-vault-core hc-github-team-secure-vault-core force-pushed the backport/vault-7375/multiple-pods-labelled-leader/adversely-knowing-seal branch 2 times, most recently from 817fca1 to d7aef32 on July 17, 2023 12:43
@github-actions github-actions bot added the hashicorp-contributed-pr label Jul 17, 2023
@tomhjp tomhjp enabled auto-merge (squash) July 17, 2023 12:43
@tomhjp tomhjp added this to the 1.13.5 milestone Jul 17, 2023
@tomhjp tomhjp merged commit 0c731be into release/1.13.x Jul 17, 2023
@tomhjp tomhjp deleted the backport/vault-7375/multiple-pods-labelled-leader/adversely-knowing-seal branch July 17, 2023 13:03