NOSCRIPT error is send to Limiter but Eval is automatically run anyway #3203

muhlemmer · 2024-12-06T16:58:22Z

Expected Behavior

When Script.Run is called and doesn't return an error, don't pass an error to Limiter.ReportResult.

Current Behavior

When we use Script.Run() it does what it should following the documentation:

Run optimistically uses EVALSHA to run the script. If script does not exist it is retried using EVAL.

That means is EVALSHA gets the NOSCRIPT error from redis, it retries. If the retry works, the caller is not receiving an error. However, Limiter.ReportResult does seem to get the initial error.

Possible Solution

When an error is non-persistent and automatically retried, do not send it to the limiter. Instead, send only the final result.

Steps to Reproduce

We implement a Limiter using sony's Circuit Breaker pattern here:

https://github.com/zitadel/zitadel/blob/77cd430b3a67cd95ad9deec1e5b44a17638def06/internal/cache/connector/redis/circuit_breaker.go#L1-L91

Context (Environment)

Simple Redis, versions 7.2, 7.4 and 8.0 with go-redis/v9 and Go 1.22.

Detailed Description

As a cache we don't want errors to interrupt business flow. Instead, we want only to log errors. We prefer to log errors where happen. For example, right after we call Script.Run(). We also have the goal to move to Go's standard slog package using context.Context aware logging. It is therefore not feasible to log errors in the Limiter.ReportResult function.

NOSCRIPT and perhaps other errors are handled internally by go-redis. When the error is not returned to the caller it is impossible to correlate it to some request if it does get send to Limiter.ReportResult. It is also a pain to debug, the application never received an error but the limiter / Circuit breaker did somehow trip.

The first errors we would get are literally:

time="2024-12-06T18:29:09+02:00" level=warning msg="circuit breaker state change" caller="/home/tim/Repositories/zitadel/zitadel/internal/cache/connector/redis/circuit_breaker.go:64" from=closed name="redis cache" to=open
time=2024-12-06T18:29:09.524+02:00 level=ERROR source=/home/tim/Repositories/zitadel/zitadel/internal/cache/connector/redis/redis.go:110 msg="redis cache get" index=orgIndexByID key=296967579065463350 err="circuit breaker is open"

Possible Implementation workaround

We will include a check for NOSCRIPT prefix. However, it does make me worry there might be other, similar and undocumented errors which trigger an internal retry without the caller being aware.

func (l *limiter) ReportResult(err error) {
	done := <-l.inflight
	done(err == nil ||
		errors.Is(err, redis.Nil) ||
		errors.Is(err, context.Canceled) ||
		redis.HasErrorPrefix(err, "NOSCRIPT"))
}

The text was updated successfully, but these errors were encountered:

# Which Problems Are Solved When Zitadel starts the first time with a configured Redis cache, the circuit break would open on the first requests, with no explanatory error and only log-lines explaining the state of the Circuit breaker. Using a debugger, `NOSCRIPT No matching script. Please use EVAL.` was found the be passed to `Limiter.ReportResult`. This error is actually retried by go-redis after a [`Script.Run`](https://pkg.go.dev/github.com/redis/go-redis/v9@v9.7.0#Script.Run): > Run optimistically uses EVALSHA to run the script. If script does not exist it is retried using EVAL. # How the Problems Are Solved Add the `NOSCRIPT` error prefix to the whitelist. # Additional Changes - none # Additional Context - Introduced in: #8890 - Workaround for: redis/go-redis#3203

muhlemmer mentioned this issue Dec 6, 2024

fix(cache): ignore NOSCRIPT errors in redis circuit breaker zitadel/zitadel#9022

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

NOSCRIPT error is send to Limiter but Eval is automatically run anyway #3203

NOSCRIPT error is send to Limiter but Eval is automatically run anyway #3203

muhlemmer commented Dec 6, 2024

NOSCRIPT error is send to Limiter but Eval is automatically run anyway #3203

NOSCRIPT error is send to Limiter but Eval is automatically run anyway #3203

Comments

muhlemmer commented Dec 6, 2024

Expected Behavior

Current Behavior

Possible Solution

Steps to Reproduce

Context (Environment)

Detailed Description

Possible Implementation workaround