[BUG] Error message "concurrent query failed" #224
Comments
Hi, this error means that you tried to run the same query twice at the same time (or ran the second query while the first one was still running). Instead of sending the same query to ClickHouse twice, chproxy runs the first one and makes the second query wait for the first one's result, so it can be reused from the cache. But if the first query fails (for example because of a timeout), chproxy returns "concurrent query failed" for the second query, and it can keep returning the same answer for a while even if you run query 2 after query 1 has failed (this mechanism is useful to stop users from re-running the same query too often while ClickHouse is overloaded).
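As an aside, the deduplication described here follows a common pattern; below is a minimal runnable sketch in Go (chproxy's language) using the well-known singleflight package. All names are illustrative, and this is not chproxy's actual code:

```go
package main

import (
	"errors"
	"fmt"
	"sync"

	"golang.org/x/sync/singleflight"
)

// runQuery simulates forwarding a query to ClickHouse.
// Here the first caller fails, mimicking a timeout.
func runQuery(q string) (string, error) {
	return "", errors.New("timeout while executing: " + q)
}

func main() {
	var g singleflight.Group
	var wg sync.WaitGroup

	// Two identical queries arrive at (almost) the same time.
	// singleflight runs the function once; the second caller just
	// waits and receives the same result -- or, as in this sketch,
	// the same error (chproxy's "concurrent query failed" case).
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			res, err, shared := g.Do("SELECT 1", func() (interface{}, error) {
				return runQuery("SELECT 1")
			})
			fmt.Printf("caller %d: res=%v err=%v shared=%v\n", id, res, err, shared)
		}(i)
	}
	wg.Wait()
}
```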
To follow up on the answer from @mga-chka, we can consider an enhancement to this mechanism: storing the error message of the originally failed query. It would be more intuitive to the user and would hide the technical detail of how we optimise execution of concurrent queries. I'm marking this issue as an enhancement and will try to work on it in the near future.
I also stepped on it. For example, I'm setting 60s for the cache TTL and 30s for the

After 60s (cache TTL) I have 2 keys in redis:

Running the query again results in the error "concurrent query failed". "2" means "unsuccessful query", I guess, but somehow this result is stored not for the cache TTL (60s) but for 86400*2 seconds, which is weird. I have 86400s

Also, caching the error is not always a good idea, because errors may happen for different reasons. In this case, when a client wants to retry the query after a while, it will constantly receive the error without a chance to retry, which is really bad. IMO, error caching should be disabled, or at least there should be a setting to explicitly control this behavior (a config sketch follows below).
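For context, the cache settings being discussed live in chproxy's config file. A minimal sketch, assuming the redis cache field names from the chproxy docs; the commented-out `cache_failed_queries` flag is purely hypothetical, illustrating the explicit setting proposed above:

```yaml
caches:
  - name: "shortterm"
    mode: "redis"
    redis:
      addresses: ["redis:6379"]
    # How long a successful response stays cached.
    expire: 60s
    # Hypothetical flag (not in chproxy today): control
    # whether error responses may be cached at all.
    # cache_failed_queries: false
```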
Hello @PhantomPhreak, thanks for sharing.

Keys with the suffix

When ClickHouse responds to the first query, the key with the

This information in redis is kept for
Very relevant point. We've come to similar reflections, except we had thought that it is still worth protecting ClickHouse from fatal (unrecoverable) queries. We introduced this idea in #193. The concept is to not mark the transaction as failed if the response received from ClickHouse is recoverable.
Following your feedback, it may indeed be relevant to add HTTP status codes representing timeouts (408). Regarding the memory limit, it's trickier to configure because it's not carried in a distinct HTTP status code, but within a generic 500. A sketch of what such a classification could look like is below.
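Purely illustrative (the exact status list is the open question here, and this is not chproxy's code):

```go
package main

import (
	"fmt"
	"net/http"
)

// isRecoverable reports whether a failed ClickHouse response
// should leave the cache transaction retriable instead of
// marking it failed. 408 covers timeouts; memory-limit errors
// are harder, since ClickHouse folds them into a generic 500.
func isRecoverable(status int) bool {
	switch status {
	case http.StatusRequestTimeout, // 408
		http.StatusTooManyRequests,    // 429
		http.StatusServiceUnavailable: // 503
		return true
	}
	return false
}

func main() {
	for _, s := range []int{408, 500, 503} {
		fmt.Printf("%d recoverable: %v\n", s, isRecoverable(s))
	}
}
```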
This is a good point, but there is a big question of how to separate recoverable and unrecoverable queries. Also, if unrecoverable queries don't do any harm to ClickHouse itself, it may be safe to allow them and control the query execution rate by max requests per minute / max concurrent queries on the chproxy side (a config sketch follows after this comment), or even by quotas and query complexity settings in ClickHouse itself. We're doing that quite successfully with no problems.
I have to say that the scope of potential causes may be wider; right now, for example, a query will be marked as failed if I just press Ctrl+C in

IMO, if some query failed and another query came after that, those are not concurrent queries, so this "new" query should not fail because of the "concurrent query protection" logic, simply because it's not applicable in this case. These are sequential requests, not concurrent ones.

I may be missing some details of the implementation (sorry for any stupid proposals), but this behavior looks logical to me: all concurrent queries (waiting for the "grace_period") fail if the currently running query fails (with the error/timeout received from ClickHouse, not because the user cancelled the query), but subsequent queries (retries) are allowed to execute.
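For reference, the rate-limiting knobs mentioned above live in chproxy's per-user config. A minimal sketch, assuming the field names from the chproxy docs:

```yaml
users:
  - name: "grafana"
    to_cluster: "default"
    to_user: "default"
    # Throttle at the proxy instead of caching errors:
    max_concurrent_queries: 4
    max_execution_time: 30s
    requests_per_minute: 60
```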
Thanks for the update, guys. I have tested release 1.17.2. At first I was getting unreadable logs, which I think was caused by HTTP compression being activated on ClickHouse. I deactivated HTTP compression on CH to get a "readable" error, but now I just get an "empty" error :/

Do you have an idea what could cause this "empty" reporting on the concurrency error?

NB: nothing in the ClickHouse logs :/
Yep, this is the issue @PhantomPhreak was mentioning (with the Ctrl+C).
👍
FYI, v1.18.0 should improve this issue, since a failing call is now cached for only 500 ms. I'm closing this issue since the new behavior of chproxy (the root-cause error of the "cached" error is shown, and the error is cached for only 500 ms to avoid the dogpile effect) should solve most cases. The behavior is sketched below.
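A failure cached for a short window absorbs the dogpile while a later retry runs again; illustrative only, not chproxy's actual implementation:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// failureCache remembers recent query failures for a short
// window so that concurrent duplicates reuse the error instead
// of hammering ClickHouse, while a retry after the window runs.
type failureCache struct {
	mu  sync.Mutex
	ttl time.Duration
	m   map[string]entry
}

type entry struct {
	err error
	at  time.Time
}

func (c *failureCache) get(key string) (error, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.m[key]
	if !ok || time.Since(e.at) > c.ttl {
		delete(c.m, key)
		return nil, false
	}
	return e.err, true
}

func (c *failureCache) put(key string, err error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.m[key] = entry{err: err, at: time.Now()}
}

func main() {
	c := &failureCache{ttl: 500 * time.Millisecond, m: map[string]entry{}}
	c.put("SELECT 1", errors.New("timeout"))

	if err, ok := c.get("SELECT 1"); ok {
		fmt.Println("within 500ms, duplicate gets cached error:", err)
	}
	time.Sleep(600 * time.Millisecond)
	if _, ok := c.get("SELECT 1"); !ok {
		fmt.Println("after 500ms, retry is allowed to run again")
	}
}
```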
Describe the bug
I use chproxy with a redis cache (keydb in fact, but it's the same); chproxy sits between ClickHouse and a dataviz tool (Grafana). But I randomly get the error "concurrent query failed" (and this error is cached :/).
This message is not really explicit; when I looked at the code, the error was fired by the cache code. What can be the origin of this error, and how can I solve it?
Expected behavior
No error; the request doesn't fail.
Environment information
chproxy: 1.17.0
ClickHouse: 22.8.3.13
KeyDB: v6.3.1
Kubernetes cluster