redis: MULTI EXEC causes crash during mirroring mode #26342
Comments
MULTI works by establishing a dedicated connection to the upstream Redis node. I wonder whether the crash happens because there is no dedicated connection for the mirror cluster. @asheryerm Could you please take a look? Thanks!
I will take a look.
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.
@asheryerm have you been able to take a look? want to make sure this doesn't go stale
I have looked at it, but I was on holiday for the last few months.
I will fix this in the coming days.
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.
@garrett528 Can you please add the envoy configuration that you used? |
@asheryerm it was something like the following:
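(The config snippet itself was lost from the thread. A minimal sketch of the setup described in this issue, with hypothetical names, and addresses taken from the logs below, might look like:)
static_resources:
  listeners:
  - name: redis_listener
    address:
      socket_address: { address: 0.0.0.0, port_value: 6379 }
    filter_chains:
    - filters:
      - name: envoy.filters.network.redis_proxy
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.redis_proxy.v3.RedisProxy
          stat_prefix: redis
          settings:
            op_timeout: 5s
          prefix_routes:
            catch_all_route:
              cluster: redis_standalone        # primary: the stand-alone node
              request_mirror_policy:
              - cluster: redis_cluster         # mirror: the redis cluster
                exclude_read_commands: false   # also tested with true
  clusters:
  - name: redis_standalone
    type: STRICT_DNS
    load_assignment:
      cluster_name: redis_standalone
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address: { address: 172.20.0.20, port_value: 6379 }
  - name: redis_cluster
    cluster_type:
      name: envoy.clusters.redis
      typed_config:
        "@type": type.googleapis.com/google.protobuf.Struct
        value:
          cluster_refresh_rate: 5s
          cluster_refresh_timeout: 3s
    lb_policy: CLUSTER_PROVIDED
    load_assignment:
      cluster_name: redis_cluster
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address: { address: 172.20.0.31, port_value: 6373 }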
@garrett528 Hi Andrew, I've merged a fix. Can you please check if it fixes your problem? Thanks.
@asheryerm thanks for taking a look. i was able to test it this morning. the MULTI EXEC mirrors properly for a single command but fails silently when trying to send 2 or more commands in the transaction. for example:
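(The command snippet was lost from the thread; judging by the reply below, it was presumably a multi-key transaction along these lines, with illustrative values:)
MULTI
SET foo 100
SET bar 200
EXEC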
@garrett528 This silent failure is actually the expected behavior. When you run a transaction on a stand-alone redis node (cluster mode disabled) you can refer to any key within the transaction. When you run a transaction on a cluster (cluster mode enabled) all the keys referred to within the transaction need to map to the same hashslot. In this case, "foo" and "bar" do not map to the same slot, so the transaction fails on the cluster. You can try it with two or more commands that map to the same slot and you will see that it works. This is a limitation of Redis Cluster, and not of the transaction implementation in Envoy. Redis Cluster requires multi-key operations to map to the same hashslot. In Envoy we discard the responses of mirrored clusters, which is why you are unable to see the error and it is "silent". The transaction is succeeding on the stand-alone node and failing on the cluster. If you run the same commands with 'foo' and 'bar{foo}' you will see it works because both keys map to the same hashslot.
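(You can check this against any node of the cluster with CLUSTER KEYSLOT: "foo" maps to slot 12182, "bar" maps to a different slot, and "bar{foo}" hashes only the text inside the braces, so it lands on the same slot as "foo":)
CLUSTER KEYSLOT foo          -> 12182
CLUSTER KEYSLOT bar          -> a different slot
CLUSTER KEYSLOT "bar{foo}"   -> 12182, same as foo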
@asheryerm yes, you're right! wasn't thinking about that during testing. our use case also is working on the same key per transaction. retested and looks good!!!! thank you!
sorry for being MIA. for your issue, maybe you can use a hashtag to make sure both keys map to the same hashslot, as in the sketch below.
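(The inline snippets were lost from the thread; based on the earlier comments, the suggestion is presumably: instead of
MULTI
SET foo 100
SET bar 200
EXEC
you can do
MULTI
SET foo 100
SET bar{foo} 200
EXEC
so that both keys hash to the same slot.)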
@HenryYYang yes that also makes sense. our main use case is increment and expire for the same key within a transaction but this is helpful if we need to do multi-key transactions!
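(For reference, that single-key pattern is exactly the kind of transaction that mirrors cleanly, since every command touches one slot; a sketch with a hypothetical key and TTL:)
MULTI
INCR request_count
EXPIRE request_count 60
EXEC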
@garrett528 Hey Andrew, if this issue is resolved in your opinion, can you please close the issue? Thanks.
If you are reporting any crash or any potential security issue, do not
open an issue in this repo. Please report the issue via emailing
envoy-security@googlegroups.com where the issue will be triaged appropriately.
Title: MULTI EXEC causes crash during mirroring mode
Description:
I've found 2 issues related to using MULTI EXEC in the redis proxy with a mirror policy, one of which causes a segfault and crashes Envoy. When using MULTI EXEC in mirrored mode with reads mirrored, Envoy will crash. When using MULTI EXEC with reads not mirrored, the transaction will not be executed on the mirrored node(s).
Repro steps:
The way I reproduced this issue was to set up Envoy in a docker-compose along with a redis standalone node and a redis cluster; a sketch of the topology follows. The redis proxy was configured with the standalone node as the primary and a request_mirror_policy pointing to the redis cluster. I tested this with both exclude_read_commands: true and exclude_read_commands: false. The behavior of the Envoy proxy differed between the two cases: it crashed when reads were mirrored, and it simply did not execute the transaction on the cluster when reads were not mirrored.
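(A sketch of the kind of docker-compose topology described here; the service names and images are illustrative, the addresses match the logs below, and two of the three cluster nodes are omitted for brevity:)
version: "3"
services:
  proxy:
    image: envoyproxy/envoy:v1.25.0
    volumes:
      - ./envoy.yaml:/etc/envoy/envoy.yaml   # config sketched earlier in this thread
    networks:
      testnet:
        ipv4_address: 172.20.0.10
  redis_standalone:
    image: redis:7
    networks:
      testnet:
        ipv4_address: 172.20.0.20
  redis_cluster_node_1:
    image: redis:7
    command: redis-server --port 6373 --cluster-enabled yes
    networks:
      testnet:
        ipv4_address: 172.20.0.31
  # redis_cluster_node_2 (172.20.0.32:6374) and redis_cluster_node_3 (172.20.0.33:6375) omitted
networks:
  testnet:
    ipam:
      config:
        - subnet: 172.20.0.0/16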
For testing this, I sent a couple of redis commands to the proxy:
SET baz 100
MULTI
SET baz 200
EXEC
Config:
request_mirror_policy set
Logs:
For exclude_read_commands:true, the debug logs looked like:
proxy_1 | [2023-03-24 15:13:30.577][17][debug][redis] [source/extensions/filters/network/redis_proxy/command_splitter_impl.cc:663] splitting '["set", "baz", "100"]'
proxy_1 | [2023-03-24 15:13:30.578][17][debug][connection] [source/common/network/connection_impl.cc:939] [C8] connecting to 172.20.0.20:6379
proxy_1 | [2023-03-24 15:13:30.578][17][debug][connection] [source/common/network/connection_impl.cc:958] [C8] connection in progress
proxy_1 | [2023-03-24 15:13:30.578][17][debug][connection] [source/common/network/connection_impl.cc:939] [C9] connecting to 172.20.0.31:6373
proxy_1 | [2023-03-24 15:13:30.579][17][debug][connection] [source/common/network/connection_impl.cc:958] [C9] connection in progress
proxy_1 | [2023-03-24 15:13:30.580][17][debug][connection] [source/common/network/connection_impl.cc:688] [C8] connected
proxy_1 | [2023-03-24 15:13:30.580][17][debug][connection] [source/common/network/connection_impl.cc:688] [C9] connected
proxy_1 | [2023-03-24 15:13:33.025][17][debug][redis] [source/extensions/filters/network/redis_proxy/command_splitter_impl.cc:663] splitting '["multi"]'
proxy_1 | [2023-03-24 15:13:36.953][17][debug][redis] [source/extensions/filters/network/redis_proxy/command_splitter_impl.cc:663] splitting '["set", "baz", "200"]'
proxy_1 | [2023-03-24 15:13:36.953][17][debug][connection] [source/common/network/connection_impl.cc:939] [C10] connecting to 172.20.0.20:6379
proxy_1 | [2023-03-24 15:13:36.953][17][debug][connection] [source/common/network/connection_impl.cc:958] [C10] connection in progress
proxy_1 | [2023-03-24 15:13:36.954][17][debug][connection] [source/common/network/connection_impl.cc:688] [C10] connected
proxy_1 | [2023-03-24 15:13:38.626][17][debug][redis] [source/extensions/filters/network/redis_proxy/command_splitter_impl.cc:663] splitting '["exec"]'
proxy_1 | [2023-03-24 15:13:38.629][17][debug][connection] [source/common/network/connection_impl.cc:139] [C10] closing data_to_write=0 type=1
proxy_1 | [2023-03-24 15:13:38.630][17][debug][connection] [source/common/network/connection_impl.cc:250] [C10] closing socket: 1
proxy_1 | [2023-03-24 15:13:41.876][17][debug][connection] [source/common/network/connection_impl.cc:656] [C6] remote close
proxy_1 | [2023-03-24 15:13:41.876][17][debug][connection] [source/common/network/connection_impl.cc:250] [C6] closing socket: 0
For this case, it appears that the proxy did not try to send the MULTI EXEC to the cluster (172.20.0.31:6373) as it did for the simple SET command. From a quick look at the source code in https://github.com/envoyproxy/envoy/blob/main/source/extensions/filters/network/common/redis/supported_commands.h#L86-L95, I don't think MULTI is being treated as a write command, so it is excluded from the mirroring.
For exclude_read_commands:false, the debug logs looked like:
proxy_1 | [2023-03-24 01:26:22.269][16][debug][redis] [source/extensions/filters/network/redis_proxy/command_splitter_impl.cc:663] splitting '["set", "baz", "100"]'
proxy_1 | [2023-03-24 01:26:22.270][16][debug][connection] [source/common/network/connection_impl.cc:939] [C6] connecting to 172.20.0.20:6379
proxy_1 | [2023-03-24 01:26:22.272][16][debug][connection] [source/common/network/connection_impl.cc:958] [C6] connection in progress
proxy_1 | [2023-03-24 01:26:22.274][16][debug][connection] [source/common/network/connection_impl.cc:939] [C7] connecting to 172.20.0.31:6373
proxy_1 | [2023-03-24 01:26:22.274][16][debug][connection] [source/common/network/connection_impl.cc:958] [C7] connection in progress
proxy_1 | [2023-03-24 01:26:22.274][16][debug][connection] [source/common/network/connection_impl.cc:688] [C6] connected
proxy_1 | [2023-03-24 01:26:22.275][16][debug][connection] [source/common/network/connection_impl.cc:688] [C7] connected
proxy_1 | [2023-03-24 01:26:32.267][1][debug][connection] [source/common/network/connection_impl.cc:939] [C8] connecting to 172.20.0.32:6374
proxy_1 | [2023-03-24 01:26:32.268][1][debug][connection] [source/common/network/connection_impl.cc:958] [C8] connection in progress
proxy_1 | [2023-03-24 01:26:32.268][1][debug][connection] [source/common/network/connection_impl.cc:688] [C8] connected
proxy_1 | [2023-03-24 01:26:51.155][16][debug][redis] [source/extensions/filters/network/redis_proxy/command_splitter_impl.cc:663] splitting '["multi"]'
proxy_1 | [2023-03-24 01:26:56.086][16][debug][redis] [source/extensions/filters/network/redis_proxy/command_splitter_impl.cc:663] splitting '["set", "baz", "200"]'
proxy_1 | [2023-03-24 01:26:56.087][16][debug][connection] [source/common/network/connection_impl.cc:939] [C9] connecting to 172.20.0.20:6379
proxy_1 | [2023-03-24 01:26:56.088][16][debug][connection] [source/common/network/connection_impl.cc:958] [C9] connection in progress
proxy_1 | [2023-03-24 01:26:56.089][16][debug][connection] [source/common/network/connection_impl.cc:688] [C9] connected
proxy_1 | [2023-03-24 01:26:57.283][1][debug][connection] [source/common/network/connection_impl.cc:939] [C10] connecting to 172.20.0.33:6375
proxy_1 | [2023-03-24 01:26:57.284][1][debug][connection] [source/common/network/connection_impl.cc:958] [C10] connection in progress
proxy_1 | [2023-03-24 01:26:57.289][1][debug][connection] [source/common/network/connection_impl.cc:688] [C10] connected
proxy_1 | [2023-03-24 01:26:57.890][16][debug][redis] [source/extensions/filters/network/redis_proxy/command_splitter_impl.cc:663] splitting '["exec"]'
proxy_1 | [2023-03-24 01:26:57.894][16][debug][connection] [source/common/network/connection_impl.cc:139] [C9] closing data_to_write=0 type=1
proxy_1 | [2023-03-24 01:26:57.894][16][debug][connection] [source/common/network/connection_impl.cc:250] [C9] closing socket: 1
proxy_1 | [2023-03-24 01:26:57.894][16][critical][assert] [source/common/stats/symbol_table.cc:487] assert failure: search != decode_map_.end(). Details: no such symbol
proxy_1 | [2023-03-24 01:26:57.895][16][critical][backtrace] [./source/server/backtrace.h:104] Caught Aborted, suspect faulting address 0x6500000001
proxy_1 | [2023-03-24 01:26:57.895][16][critical][backtrace] [./source/server/backtrace.h:91] Backtrace (use tools/stack_decode.py to get line numbers):
proxy_1 | [2023-03-24 01:26:57.895][16][critical][backtrace] [./source/server/backtrace.h:92] Envoy version: 9184b84/1.25.0/Clean/RELEASE/BoringSSL
proxy_1 | [2023-03-24 01:26:57.895][16][critical][backtrace] [./source/server/backtrace.h:96] #0: kernel_rt_sigreturn [0xffff99940870]
proxy_1 | [2023-03-24 01:26:57.896][16][critical][backtrace] [./source/server/backtrace.h:96] #1: abort [0xffff996b5aac]
proxy_1 | [2023-03-24 01:26:57.896][16][critical][backtrace] [./source/server/backtrace.h:98] #2: [0xaaaac109eb5c]
proxy_1 | [2023-03-24 01:26:57.896][16][critical][backtrace] [./source/server/backtrace.h:98] #3: [0xaaaac10a0734]
proxy_1 | [2023-03-24 01:26:57.896][16][critical][backtrace] [./source/server/backtrace.h:98] #4: [0xaaaac109cc10]
proxy_1 | [2023-03-24 01:26:57.896][16][critical][backtrace] [./source/server/backtrace.h:98] #5: [0xaaaac109cf80]
proxy_1 | [2023-03-24 01:26:57.897][16][critical][backtrace] [./source/server/backtrace.h:98] #6: [0xaaaac109d9e8]
proxy_1 | [2023-03-24 01:26:57.897][16][critical][backtrace] [./source/server/backtrace.h:98] #7: [0xaaaac079566c]
proxy_1 | [2023-03-24 01:26:57.897][16][critical][backtrace] [./source/server/backtrace.h:98] #8: [0xaaaac0793c18]
proxy_1 | [2023-03-24 01:26:57.898][16][critical][backtrace] [./source/server/backtrace.h:98] #9: [0xaaaac0793790]
proxy_1 | [2023-03-24 01:26:57.898][16][critical][backtrace] [./source/server/backtrace.h:98] #10: [0xaaaac109bf14]
proxy_1 | [2023-03-24 01:26:57.904][16][critical][backtrace] [./source/server/backtrace.h:98] #11: [0xaaaabf7f20e4]
proxy_1 | [2023-03-24 01:26:57.905][16][critical][backtrace] [./source/server/backtrace.h:98] #12: [0xaaaabf7e970c]
proxy_1 | [2023-03-24 01:26:57.905][16][critical][backtrace] [./source/server/backtrace.h:98] #13: [0xaaaabf7f012c]
proxy_1 | [2023-03-24 01:26:57.906][16][critical][backtrace] [./source/server/backtrace.h:98] #14: [0xaaaabf7ef414]
proxy_1 | [2023-03-24 01:26:57.906][16][critical][backtrace] [./source/server/backtrace.h:98] #15: [0xaaaabf7e9288]
proxy_1 | [2023-03-24 01:26:57.906][16][critical][backtrace] [./source/server/backtrace.h:98] #16: [0xaaaabf7ea244]
proxy_1 | [2023-03-24 01:26:57.907][16][critical][backtrace] [./source/server/backtrace.h:98] #17: [0xaaaac0e64bc4]
proxy_1 | [2023-03-24 01:26:57.907][16][critical][backtrace] [./source/server/backtrace.h:98] #18: [0xaaaac0dfe928]
proxy_1 | [2023-03-24 01:26:57.907][16][critical][backtrace] [./source/server/backtrace.h:98] #19: [0xaaaac0dfca24]
proxy_1 | [2023-03-24 01:26:57.907][16][critical][backtrace] [./source/server/backtrace.h:98] #20: [0xaaaac0df31f0]
proxy_1 | [2023-03-24 01:26:57.907][16][critical][backtrace] [./source/server/backtrace.h:98] #21: [0xaaaac0df4320]
proxy_1 | [2023-03-24 01:26:57.908][16][critical][backtrace] [./source/server/backtrace.h:98] #22: [0xaaaac1123d54]
proxy_1 | [2023-03-24 01:26:57.908][16][critical][backtrace] [./source/server/backtrace.h:98] #23: [0xaaaac11225f4]
proxy_1 | [2023-03-24 01:26:57.908][16][critical][backtrace] [./source/server/backtrace.h:98] #24: [0xaaaac077e1d4]
proxy_1 | [2023-03-24 01:26:57.908][16][critical][backtrace] [./source/server/backtrace.h:98] #25: [0xaaaac1129e78]
proxy_1 | [2023-03-24 01:26:57.909][16][critical][backtrace] [./source/server/backtrace.h:96] #26: start_thread [0xffff9980f624]
proxy_1 | ConnectionImpl 0x14843f3dcd80, connecting: 0, bind_error: 0, state(): Closed, read_buffer_limit_: 1048576
proxy_1 | socket_:
proxy_1 | ListenSocketImpl 0x14843f388400, transport_protocol_:
proxy_1 | connection_info_provider_:
proxy_1 | ConnectionInfoSetterImpl 0x14843f36fd58, remote_address_: 172.20.0.20:6379, direct_remote_address_: 172.20.0.20:6379, local_address_: 172.20.0.10:49342, server_name_:
envoy_proxy_1 exited with code 133
I was not able to debug this very well as I could not get Envoy to build on my machine. I also suspect that if MULTI EXEC were treated as a write command and sent in the first scenario, we would see this error there as well.