-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cache: Fix bug where connection errors can cause early cache expiry #9979
Conversation
This mostly affects streaming since streaming will immediately return from Fetch calls when the state is Closed on eviction which causes the race condition every time. However this also affects all other cache types if the fetch call happens to return between the eviction and then next time around the Get loop by any client. There is a separate bug that allows cache items to be evicted even when there are active clients which is the trigger here.
…r cache entry is returning fetch errors.
@@ -1,3 +1,3 @@ | |||
```release-note:bug | |||
cache: fix a bug in the client agent cache where streaming could potentially leak resources. [[GH-9978](https://github.com/hashicorp/consul/pulls/9978)]. | |||
cache: fix a bug in the client agent cache where streaming could potentially leak resources. [[GH-9978](https://github.com/hashicorp/consul/pull/9978)]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fixed this in the upstream PR branch too and reset base but it's still showing here 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, this seems like a really important fix.
🍒 If backport labels were added before merging, cherry-picking will start automatically. To retroactively trigger a backport after merging, add backport labels and re-run https://circleci.com/gh/hashicorp/consul/347197. |
🍒✅ Cherry pick of commit 1406671 onto |
…9979) Fixes a cache bug where TTL is not updated while a value isn't changing or cache entry is returning fetch errors.
Based on #9978.
This fixes a bug in the agent cache where TTL expiry times are only updated when successful updates are received.
There are two different ways this can cause items to be prematurely evicted from the cache:
Get
requests will go round their loop seeing only errors and not updating the expiry time even though they are still interested (i.e. the value is still "alive").Currently we don't run into this because until streaming, all cache-types had long TTLs of 72 hours.
It's also self-recovering in that the active clients will just cause the cache value to be refetched anyway even though it was evicted. For typical RPC-based cache types the cost is minimal but for leaf certificates, streaming and other possible future types the eviction causes connections to drop and data to be lost and recomputed which negatively effects performance or requires unnecessary work.
For streaming, this causes dropped connections which then take some time to reconnect based on the timing of the client which can mean delayed event delivery.