Proposal: Pool/Conn, handling connection timeouts. #2586
Comments
Upvote for proposal 3. That is how rueidis handles concurrent requests, and it enables higher throughput by pipelining. It will also make implementing client-side caching easier.
@rueian Thanks. Proposal 3 requires extensive code changes and further testing. I tested Proposal 1, and it helped somewhat with minor network fluctuations, but once network latency went from 1ms to 10ms, Proposal 1 no longer made a meaningful difference. I think we should give Proposal 3 a try.

Additionally, I have been hesitant about the client-side cache you mentioned. Client-side caching consumes resources on the Redis server, so we need to handle it with caution and treat different keys differently. For frequently modified keys, or for complex data structures such as hashes and lists, we should not use caching, because a change to any field of a hash would invalidate the cached entry. However, in some cases users store static data, such as order information, in a hash; that hash is never modified by the business logic, so it behaves no differently from a regular key and can safely be cached.

Last year I ran some tests, and tracking all keys in a project without considering their nature actually increased the load on the Redis server. With broadcasting mode, most clients receive notifications that are irrelevant to them. My suggestion is that we provide a cache API, or a lower-level API, to help users implement exactly the caching they need.
Totally agree with you. I have the same concern about the burden that client-side caching puts on servers and clients. That is why, in rueidis, I made a separate API that lets users opt in only the keys they want to cache locally, rather than all keys.
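For reference, the opt-in in rueidis looks roughly like this, if I remember the API correctly; treat the exact calls, address, key, and TTL here as illustrative rather than authoritative:

```go
package main

import (
	"context"
	"time"

	"github.com/redis/rueidis"
)

func main() {
	client, err := rueidis.NewClient(rueidis.ClientOption{
		InitAddress: []string{"127.0.0.1:6379"}, // example address
	})
	if err != nil {
		panic(err)
	}
	defer client.Close()

	ctx := context.Background()

	// Only this specific command is cached client-side, with its own TTL;
	// keys never requested through DoCache are not cached at all.
	val, err := client.DoCache(ctx, client.B().Get().Key("order:1001").Cache(), time.Minute).ToString()
	if err != nil {
		panic(err)
	}
	_ = val
}
```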
Some ideas for proceeding with proposal 3: we could have a command queue of some sort, and instead of sending commands directly over the connection, we would put them into the queue. Each command could be a struct containing the command itself, a context (to handle timeouts at the command level), and a channel to send the result back. A command dispatcher, which may be a separate goroutine (or multiple goroutines for concurrency), would be responsible for dispatching commands from the queue to available connections. The dispatcher would select a connection from the pool, send the command, and then put the connection back into the pool. After sending a command, the dispatcher could start a goroutine to listen for the response on that connection; once a response is received, it would be sent back to the caller via the channel in the command struct. Regarding timeouts, there may still be edge cases to think about. (Edit: could we perhaps implement client-side caching separately from this?)
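A minimal sketch of that queue/dispatcher idea; every type and function name here is illustrative, not a go-redis internal:

```go
package dispatch

import "context"

type result struct {
	val interface{}
	err error
}

type command struct {
	ctx   context.Context
	args  []interface{} // e.g. []interface{}{"GET", "key"}
	reply chan result   // buffered (cap 1); the caller receives the response here
}

// conn and connPool stand in for whatever pool abstraction is used.
type conn interface {
	roundTrip(ctx context.Context, args []interface{}) (interface{}, error)
}

type connPool interface {
	Get(ctx context.Context) (conn, error)
	Put(cn conn)
}

type dispatcher struct {
	queue chan *command
	pool  connPool
}

// run drains the queue and dispatches each command on a pooled connection.
func (d *dispatcher) run(ctx context.Context) {
	for {
		select {
		case <-ctx.Done():
			return
		case cmd := <-d.queue:
			cn, err := d.pool.Get(cmd.ctx)
			if err != nil {
				cmd.reply <- result{err: err}
				continue
			}
			// Wait for the reply in a separate goroutine so the
			// dispatcher can keep draining the queue.
			go func(cn conn, cmd *command) {
				defer d.pool.Put(cn)
				val, err := cn.roundTrip(cmd.ctx, cmd.args)
				// reply is buffered, so this never blocks even if the
				// caller has already timed out and gone away.
				cmd.reply <- result{val: val, err: err}
			}(cn, cmd)
		}
	}
}

// Do enqueues a command and waits for its reply or a context timeout.
func (d *dispatcher) Do(ctx context.Context, args ...interface{}) (interface{}, error) {
	cmd := &command{ctx: ctx, args: args, reply: make(chan result, 1)}
	select {
	case d.queue <- cmd:
	case <-ctx.Done():
		return nil, ctx.Err()
	}
	select {
	case res := <-cmd.reply:
		return res.val, res.err
	case <-ctx.Done():
		return nil, ctx.Err()
	}
}
```

One design choice in this sketch: the connection is returned to the pool only after its reply has been read, so a connection with unread data is never handed to the next caller, and a per-command timeout simply abandons the reply channel instead of invalidating the connection.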
Yes, my only concern is that Proposal 3 requires a significant amount of code modification and testing, which could introduce instability.
I also ran into this problem recently while load-testing a project: a large number of connection pool timeouts occurred, and the traces showed some commands taking more than 3 seconds to execute.
I doubt this can be done, because the data can arrive a minute after the timeout; see https://redis.uptrace.dev/guide/go-redis-debugging.html#large-number-of-open-connections
This sounds a bit like
I believe you are trading latency for throughput here. Some users might like it, some might not. Also, it is a big change, and timeouts will still happen, so it is not a 100% bulletproof solution. 🤷
Here, what I mean is that instead of closing the connection when a timeout occurs, we place it in a recycle bin where goroutines continuously perform health checks on it. In other words, we might keep more connections than the configured PoolSize.
Here, we use two goroutines per net.Conn, one for reading and one for writing. Each command is sent immediately, without any delay, and we wait for the result on a channel. This approach costs an additional two channel operations per command, one for the write and one for the read, but it makes handling contexts and timeout waiting much more convenient.
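Roughly what that per-connection reader/writer split could look like, as a sketch; RESP encoding/decoding is elided and all names are hypothetical:

```go
package connworker

import (
	"bufio"
	"context"
	"net"
)

type reply struct {
	line []byte
	err  error
}

type request struct {
	payload []byte     // an already-encoded RESP command
	reply   chan reply // buffered, capacity 1
}

type connWorker struct {
	conn    net.Conn
	writeCh chan request
	pending chan chan reply // FIFO of waiters; capacity bounds pipelining depth
}

func newConnWorker(conn net.Conn) *connWorker {
	w := &connWorker{
		conn:    conn,
		writeCh: make(chan request, 128),
		pending: make(chan chan reply, 128),
	}
	go w.writeLoop()
	go w.readLoop()
	return w
}

func (w *connWorker) writeLoop() {
	for req := range w.writeCh {
		if _, err := w.conn.Write(req.payload); err != nil {
			req.reply <- reply{err: err}
			continue
		}
		// Register the waiter so readLoop can deliver the matching reply.
		// Redis answers in order on a single connection, so FIFO matching works.
		w.pending <- req.reply
	}
}

func (w *connWorker) readLoop() {
	rd := bufio.NewReader(w.conn)
	for waiter := range w.pending {
		// Real code would parse a complete RESP reply here.
		line, err := rd.ReadBytes('\n')
		waiter <- reply{line: line, err: err}
	}
}

// Do sends one command and waits for its reply or the caller's context,
// so a per-command timeout no longer has to invalidate the connection.
func (w *connWorker) Do(ctx context.Context, payload []byte) ([]byte, error) {
	req := request{payload: payload, reply: make(chan reply, 1)}
	select {
	case w.writeCh <- req:
	case <-ctx.Done():
		return nil, ctx.Err()
	}
	select {
	case rep := <-req.reply:
		return rep.line, rep.err
	case <-ctx.Done():
		return nil, ctx.Err()
	}
}
```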
Upvote for proposal 3 too. This proposal is essentially what HTTP/2 does; the main difference from HTTP/2 is the lack of a mechanism to balance all requests over one connection. But we would still face the same problem, because the WATCH command needs a dedicated connection to execute.
What I mean is that you would need to keep the connection in the recycle bin for at least one minute to make sure no unsolicited data will arrive. I'd assume that is too much time for it to be really useful...
Okay, without buffering/pipelining you don't get extra throughput :) The latency will be better, but each command will cost additional channel operations. Async execution with buffering/pipelining looks more interesting, but it should be optional/configurable, since there are trade-offs and decisions to be made. Perhaps we might even make buffering optional...
Agreed with @vmihailenco. The first option is probably no longer valuable by the time the timeout has occurred, and number 3 creates (even) higher memory usage, especially in highly scaled applications. I like retries + health checks, which are features available in other clients. Async execution is great, but there are several use cases where sync is vital, specifically data types and applications that rely on synchronous reads. I myself have this need today.
I'm curious if option (3) is a version of what StackExchange.Redis has implemented with its approach to multiplexing. It seems to be a good idea for many applications, except for things like blocking calls like
Recently, we have received many questions about go-redis creating a large number of connections. When a connection is deemed no longer usable due to a context or read/write timeout, we remove it from the connection pool. We remove it unconditionally because the connection may contain dirty read/write data, which could contaminate subsequent command invocations.
However, when ReadTimeout and WriteTimeout are set to relatively small values and the network path to the redis-server is poor, a large number of command timeouts occur, causing go-redis to create numerous new connections, which worsens the network environment further.
Proposal 1: Add a connection recycling mechanism and mark timed-out connections, e.g. "timeout - unread residual data" (the most common case). These connections would be placed into a recycle bin, where periodic checks read and discard any residual dirty data; once a connection is confirmed to be usable, it is returned to the connection pool. However, this approach would keep more network connections open than PoolSize specifies, which might surprise users, and its effectiveness is limited when network conditions are severely degraded.
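A minimal sketch of such a recycle-bin check, assuming an inline PING is acceptable for verification; the function name, buffer size, and deadlines are hypothetical:

```go
package recyclebin

import (
	"bufio"
	"bytes"
	"net"
	"time"
)

// drainAndCheck reports whether a previously timed-out connection looks
// safe to reuse: it drains data that arrived late, then verifies the
// connection with an inline PING.
func drainAndCheck(cn net.Conn) bool {
	// 1. Drain residual replies that arrived after the caller gave up.
	_ = cn.SetReadDeadline(time.Now().Add(50 * time.Millisecond))
	buf := make([]byte, 4096)
	for {
		_, err := cn.Read(buf)
		if err != nil {
			if ne, ok := err.(net.Error); ok && ne.Timeout() {
				break // nothing more is buffered; the connection is quiet
			}
			return false // a real error: close the connection instead
		}
	}

	// 2. Verify the connection with an inline PING.
	_ = cn.SetDeadline(time.Now().Add(time.Second))
	if _, err := cn.Write([]byte("PING\r\n")); err != nil {
		return false
	}
	line, err := bufio.NewReader(cn).ReadBytes('\n')
	if err != nil {
		return false
	}
	_ = cn.SetDeadline(time.Time{}) // clear deadlines before reuse

	return bytes.HasPrefix(line, []byte("+PONG"))
}
```

As noted in the comments above, a reply can arrive long after the timeout, so how long a connection sits in the recycle bin before this check runs is the key trade-off.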
Proposal 2: Add a connection "firewall" that prevents the creation of new connections when widespread connection rebuilding is detected, and initiates periodic health checks instead. It would also be possible to limit the creation of new connections by periodically injecting a certain number of healthy connections into the connection pool (similar to a token-bucket algorithm).
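A minimal sketch of the rate-limiting half of that idea, a token bucket gating new dials; none of these names are existing go-redis options:

```go
package connlimiter

import (
	"context"
	"time"
)

type dialLimiter struct {
	tokens chan struct{}
}

// newDialLimiter allows up to `burst` immediate dials and then refills
// one permit every refillEvery.
func newDialLimiter(burst int, refillEvery time.Duration) *dialLimiter {
	l := &dialLimiter{tokens: make(chan struct{}, burst)}
	for i := 0; i < burst; i++ {
		l.tokens <- struct{}{}
	}
	go func() {
		ticker := time.NewTicker(refillEvery)
		defer ticker.Stop()
		for range ticker.C {
			select {
			case l.tokens <- struct{}{}: // add one permit
			default: // bucket is full; drop the permit
			}
		}
	}()
	return l
}

// allowDial blocks until a permit is available or the context expires, so
// a burst of timeouts cannot turn into a burst of new dials.
func (l *dialLimiter) allowDial(ctx context.Context) error {
	select {
	case <-l.tokens:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}
```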
Proposal 3: Change go-redis's command read/write mechanism to a read/write-separation approach. Network connections would no longer carry timeout semantics; commands and their expected results would be passed through channels. In other words, the connection becomes a stateless transport device.