
Error on Token Tidy #10444

Open
ilyas28 opened this issue Nov 25, 2020 · 3 comments

Comments

ilyas28 commented Nov 25, 2020

Describe the bug
When I call the token tidy endpoint, I see an error in the Vault server logs. What does it mean?

Server logs:

2020-11-25T08:26:23.222Z [INFO]  token: beginning tidy operation on tokens
2020-11-25T08:26:26.847Z [INFO]  token: checking if accessors contain valid tokens: progress=500
2020-11-25T08:26:30.587Z [INFO]  token: checking if accessors contain valid tokens: progress=1000
2020-11-25T08:26:33.381Z [DEBUG] auth.aws.auth_aws-ec2_1ecff0ea.initialize: starting initialization
2020-11-25T08:26:36.394Z [INFO]  token: checking if accessors contain valid tokens: progress=1500
2020-11-25T08:26:38.163Z [INFO]  token: number of entries scanned in parent prefix: count=1
2020-11-25T08:26:38.163Z [INFO]  token: number of entries deleted in parent prefix: count=0
2020-11-25T08:26:38.163Z [INFO]  token: number of tokens scanned in parent index list: count=15
2020-11-25T08:26:38.163Z [INFO]  token: number of tokens revoked in parent index list: count=0
2020-11-25T08:26:38.163Z [INFO]  token: number of accessors scanned: count=1768
2020-11-25T08:26:38.163Z [INFO]  token: number of deleted accessors which had empty tokens: count=0
2020-11-25T08:26:38.163Z [INFO]  token: number of revoked tokens which were invalid but present in accessors: count=0
2020-11-25T08:26:38.163Z [INFO]  token: number of deleted accessors which had invalid tokens: count=0
2020-11-25T08:26:38.163Z [INFO]  token: number of deleted cubbyhole keys that were invalid: count=0
2020-11-25T08:26:38.163Z [INFO]  token: finished tidy operation on tokens
2020-11-25T08:26:38.163Z [ERROR] token.tidy: error running tidy: error="1 error occurred:
    * failed to read the accessor index: failed to read index using accessor: decryption failed: cipher: message authentication failed
"

To Reproduce
Steps to reproduce the behavior:

curl --header "X-Vault-Token: $VAULT_TOKEN" --request POST "${VAULT_ADDR}/v1/auth/token/tidy"
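
For what it's worth, the same endpoint can also be hit through the Vault CLI's generic write (a minimal sketch, assuming the CLI is authenticated against the same server):

vault write -f auth/token/tidy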

Expected behavior
I expect the tidy operation to finish successfully.

Environment:

  • Vault Server Version (retrieve with vault status): 1.4.3
  • Vault CLI Version (retrieve with vault version): 1.6.0
  • Server Operating System/Architecture: official docker image "vault:1.4.3"

Vault server configuration file(s):

disable_mlock = true
ui = true
listener "tcp" {
  address = "[::]:8200"
  cluster_address = "[::]:8201"
  tls_cert_file = "/vault/userconfig/vault-server-tls/tls.crt"
  tls_key_file = "/vault/userconfig/vault-server-tls/tls.key"
  telemetry {
    unauthenticated_metrics_access = "true"
  }
}
storage "dynamodb" {
  ha_enabled = "true"
  region     = "us-east-1"
  table      = ------
}
max_lease_ttl = "87600h"
log_level = "debug"
telemetry {
  prometheus_retention_time = "1h"
  disable_hostname = true
}

Additional context
I have multiple clusters running. Only this one printed error during tidy operation, running almost the same configurations.

@Erfankam

How can I assign this issue to myself?

mikalai-hryb commented Nov 5, 2021

My team has faced the same issue:

2021-11-05T11:31:39.667Z [ERROR] token.tidy: error running tidy:
  error=
  | 1 error occurred:
  | 	* failed to read the accessor index: failed to read index using accessor: decryption failed: cipher: message authentication failed
  | 

Some context:
We migrated from v1.2.3 with Consul to v1.8.3 with Raft in K8S
The data migration:

  • Dev ~1.4GB (yes, 1.4 gigabytes, for some reason)
  • QA ~7MB
  • Prod ~7MB (we have not migrated it yet)

The Dev Vault took about 5 minutes to start in K8S
The QA Vault took about 25 seconds to start in K8S

Present time
We decided to fix this storage issue

  1. I found out that we had ~600,000 accessors and tokens in Dev vs ~500 in QA
  2. Noticed one thing: when I requested info about an accessor I got a permission denied error (I was using the root token), but when I requested info about the same accessor again I got an invalid accessor error. After a few such calls, both the number of accessors and the number of tokens were getting smaller.
  3. Made ~600,000 lookup calls to the Dev Vault (see the sketch after this list)
  4. The number of accessors and tokens was reduced to ~500, but 29 accessors still returned the permission denied error (nothing helped)
  5. The storage space remained the same: 1.4GB
  6. Called /v1/auth/token/tidy and got the error
     (screenshot attached: vault-server-1-server-operational log, 2021-11-05)
  7. We still have 1.4GB of Raft storage space, but it now takes 20-30 secs to restart Vault (previously it was about 5 mins)
  8. Restarted Vault (all 3 servers at the same time) in order to check whether unseal would help. It has not helped; Vault has been unsealed successfully and is running.
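
A minimal sketch of the bulk lookup from step 3, assuming VAULT_ADDR and a (root) VAULT_TOKEN are exported and jq is available; per step 2, looking up a damaged accessor is what makes Vault drop it:

curl -s --header "X-Vault-Token: $VAULT_TOKEN" \
     --request LIST "${VAULT_ADDR}/v1/auth/token/accessors" \
  | jq -r '.data.keys[]' \
  | while read -r accessor; do
      # damaged entries answer "permission denied" first, then "invalid accessor"
      curl -s --header "X-Vault-Token: $VAULT_TOKEN" \
           --request POST \
           --data "{\"accessor\": \"${accessor}\"}" \
           "${VAULT_ADDR}/v1/auth/token/lookup-accessor" > /dev/null
    done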

@merickso

We see the same message in the warnings field when we list tokens using the API: "/v1/auth/token/accessors?list=true".
The response has a warnings field with many messages similar to this:
Found an accessor entry that could not be successfully decoded; associated error is "failed to read index using accessor: decryption failed: cipher: message authentication failed"
In some cases it also says "context canceled". We are using a root token and have never run "token tidy", since the warnings are a bit scary.
https://developer.hashicorp.com/vault/api-docs/auth/token#tidy-tokens
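
A minimal sketch of pulling those warnings out of the list response, assuming the same setup (exported VAULT_ADDR and VAULT_TOKEN, plus jq):

curl -s --header "X-Vault-Token: $VAULT_TOKEN" \
     "${VAULT_ADDR}/v1/auth/token/accessors?list=true" \
  | jq -r '.warnings[]?'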
