# Which Problems Are Solved
When Zitadel starts for the first time with a configured Redis cache, the
circuit breaker would open on the first requests, with no explanatory
error and only log lines describing the state of the circuit breaker.
Using a debugger, `NOSCRIPT No matching script. Please use EVAL.` was
found to be passed to `Limiter.ReportResult`. This error is actually
retried by go-redis after a
[`Script.Run`](https://pkg.go.dev/github.com/redis/go-redis/v9@v9.7.0#Script.Run):
> Run optimistically uses EVALSHA to run the script. If script does not
exist it is retried using EVAL.
# How the Problems Are Solved
Add the `NOSCRIPT` error prefix to the whitelist.
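
The effect of such a prefix check can be sketched with a toy limiter. This is only an illustration under stated assumptions: `toyLimiter`, `ignoredPrefixes` and `countsAsFailure` are hypothetical names, the breaker logic is a simple consecutive-failure counter rather than sony's implementation, and the real change lives in the Zitadel circuit breaker code.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ignoredPrefixes lists error-message prefixes that should not count as
// failures. NOSCRIPT is the prefix named in this change; keeping it in a
// slice is an assumption made for illustration.
var ignoredPrefixes = []string{"NOSCRIPT"}

// countsAsFailure reports whether err should be counted against the breaker.
func countsAsFailure(err error) bool {
	if err == nil {
		return false
	}
	for _, p := range ignoredPrefixes {
		if strings.HasPrefix(err.Error(), p) {
			return false
		}
	}
	return true
}

var errOpen = errors.New("circuit breaker is open")

// toyLimiter has the two methods of go-redis's Limiter interface
// (Allow and ReportResult). It opens after maxFailures consecutive
// failures. It is a stand-in, not a real circuit breaker.
type toyLimiter struct {
	failures    int
	maxFailures int
}

func (l *toyLimiter) Allow() error {
	if l.failures >= l.maxFailures {
		return errOpen
	}
	return nil
}

func (l *toyLimiter) ReportResult(err error) {
	if countsAsFailure(err) {
		l.failures++
		return
	}
	l.failures = 0
}

// demo reports a NOSCRIPT error, then a different error, and returns
// whether the limiter still allows requests after each.
func demo() (allowedAfterNoScript, allowedAfterOtherError bool) {
	l := &toyLimiter{maxFailures: 1}
	l.ReportResult(errors.New("NOSCRIPT No matching script. Please use EVAL."))
	allowedAfterNoScript = l.Allow() == nil
	l.ReportResult(errors.New("connection refused"))
	allowedAfterOtherError = l.Allow() == nil
	return
}

func main() {
	a, b := demo()
	fmt.Println(a, b) // prints: true false
}
```

With the prefix ignored, the `NOSCRIPT` report leaves the breaker closed, while an unrelated error still opens it.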
# Additional Changes
- none
# Additional Context
- Introduced in: #8890
- Workaround for: redis/go-redis#3203
Expected Behavior
When `Script.Run` is called and doesn't return an error, don't pass an error to `Limiter.ReportResult`.
Current Behavior
When we use `Script.Run()`, it does what it should according to the documentation: "Run optimistically uses EVALSHA to run the script. If script does not exist it is retried using EVAL."
That means that if `EVALSHA` gets the `NOSCRIPT` error from Redis, it retries. If the retry works, the caller does not receive an error. However, `Limiter.ReportResult` does seem to get the initial error.
Possible Solution
When an error is non-persistent and automatically retried, do not send it to the limiter. Instead, send only the final result.
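
A minimal sketch of the requested behavior, using only the standard library. Everything here is hypothetical: `runWithFallback`, `recorder` and the two function parameters stand in for the EVALSHA and EVAL round trips and do not reflect how go-redis is structured internally. The point is only that the limiter sees one result per call, namely the final one.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// recorder captures every result passed to ReportResult, so the sketch
// can show what a limiter would observe.
type recorder struct{ results []error }

func (r *recorder) Allow() error { return nil }

func (r *recorder) ReportResult(err error) { r.results = append(r.results, err) }

// runWithFallback (hypothetical) tries evalSHA first and falls back to
// eval when the server answers with NOSCRIPT. Only the final outcome is
// reported to the limiter and returned to the caller.
func runWithFallback(lim *recorder, evalSHA, eval func() error) error {
	if err := lim.Allow(); err != nil {
		return err
	}
	err := evalSHA()
	if err != nil && strings.HasPrefix(err.Error(), "NOSCRIPT") {
		err = eval() // internal retry, invisible to the caller
	}
	lim.ReportResult(err)
	return err
}

// demo simulates a first run where the script is not cached yet.
func demo() (callerErr error, reported []error) {
	lim := &recorder{}
	callerErr = runWithFallback(lim,
		func() error { return errors.New("NOSCRIPT No matching script. Please use EVAL.") },
		func() error { return nil },
	)
	return callerErr, lim.results
}

func main() {
	err, reported := demo()
	fmt.Println(err, len(reported), reported[0]) // prints: <nil> 1 <nil>
}
```

In this sketch the caller and the limiter agree: neither of them sees the transient `NOSCRIPT` error.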
Steps to Reproduce
We implement a `Limiter` using sony's Circuit Breaker pattern here: https://github.com/zitadel/zitadel/blob/77cd430b3a67cd95ad9deec1e5b44a17638def06/internal/cache/connector/redis/circuit_breaker.go#L1-L91
Context (Environment)
Simple Redis, versions 7.2, 7.4 and 8.0 with go-redis/v9 and Go 1.22.
Detailed Description
As a cache, we don't want errors to interrupt business flow. Instead, we only want to log errors, and we prefer to log them where they happen, for example right after we call `Script.Run()`. We also aim to move to Go's standard `slog` package with `context.Context`-aware logging. It is therefore not feasible to log errors in the `Limiter.ReportResult` function.
`NOSCRIPT` and perhaps other errors are handled internally by go-redis. When the error is not returned to the caller, it is impossible to correlate it with a request if it does get sent to `Limiter.ReportResult`. It is also a pain to debug: the application never received an error, yet the limiter / circuit breaker somehow tripped.
The first errors we would get are literally:
Possible Implementation
Workaround
We will include a check for the `NOSCRIPT` prefix. However, it makes me worry that there might be other, similar, undocumented errors that trigger an internal retry without the caller being aware.
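
The logging preference from the Detailed Description can be sketched as follows. The names `ctxHandler`, `requestIDKey`, `getFromCache` and the `run` parameter are assumptions for illustration; `run` stands in for a `Script.Run` call. The sketch shows why the call site is the natural place to log: it has a `context.Context` to pass to `slog`, whereas `ReportResult(result error)` receives only the error.

```go
package main

import (
	"bytes"
	"context"
	"errors"
	"fmt"
	"log/slog"
	"strings"
)

// requestIDKey is a hypothetical context key carrying a request ID.
type requestIDKey struct{}

// ctxHandler wraps a slog.Handler and copies the request ID from the
// context into each record. WithAttrs and WithGroup are inherited from
// the embedded handler and drop the wrapper, which is fine for a sketch.
type ctxHandler struct{ slog.Handler }

func (h ctxHandler) Handle(ctx context.Context, r slog.Record) error {
	if id, ok := ctx.Value(requestIDKey{}).(string); ok {
		r.AddAttrs(slog.String("request_id", id))
	}
	return h.Handler.Handle(ctx, r)
}

// getFromCache logs a failed script run at the call site, where the
// request context is available, and does not interrupt the caller.
func getFromCache(ctx context.Context, logger *slog.Logger, run func(context.Context) error) {
	if err := run(ctx); err != nil {
		logger.ErrorContext(ctx, "cache script failed", "err", err)
	}
}

// hasRequestID checks that a log line carries the request ID attribute.
func hasRequestID(line string) bool {
	return strings.Contains(line, "request_id=req-42")
}

// demo runs a failing script under a context with a request ID and
// returns the resulting log output.
func demo() string {
	var buf bytes.Buffer
	logger := slog.New(ctxHandler{slog.NewTextHandler(&buf, nil)})
	ctx := context.WithValue(context.Background(), requestIDKey{}, "req-42")
	getFromCache(ctx, logger, func(context.Context) error {
		return errors.New("connection refused")
	})
	return buf.String()
}

func main() {
	out := demo()
	fmt.Print(out)
	fmt.Println(hasRequestID(out))
}
```

An error that only reaches `ReportResult` cannot be logged this way, because no context travels with it.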