TiKV panics when the size of response exceeds 4GB #9012
Comments
It's by design that gRPC can't send messages larger than 4 GiB. However, it should report an error instead of core dumping. To solve this, maybe we should change the RPC to a server streaming call or add paging logic.
We can avoid this panic on the TiKV side, but it would be better if gRPC reported an error.
I'm afraid this is not possible at the moment with the latest gRPC. The binary will either core dump or panic inside rust-protobuf. See also stepancheg/rust-protobuf#530.
Lowering its severity since it rarely occurs.
Encountered it again: https://asktug.com/t/topic/69426
Now that we are using a forked version of protobuf, we can also add such protection inside the fork.
gRPC can't handle messages larger than 4 GiB. This PR solves the issue by checking the response's binary length during serialization. Before the change, TiKV would either core dump in gRPC or panic in rust-protobuf; after the change, it prints a log on the TiKV side and the call is cancelled on the client side. Close tikv#9012 Signed-off-by: Jay Lee <BusyJayLee@gmail.com>
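A minimal sketch of the guard described above: before handing a serialized response to gRPC, check its encoded length against the 4 GiB frame limit and return an error instead of aborting. The names (`MAX_GRPC_MSG_LEN`, `check_resp_len`) are illustrative, not the actual TiKV identifiers.

```rust
/// Hypothetical 4 GiB limit, matching the gRPC frame-size ceiling.
const MAX_GRPC_MSG_LEN: u64 = 4 * 1024 * 1024 * 1024;

/// Reject responses whose encoded length exceeds the limit.
/// In TiKV the failure would be logged server-side and the RPC
/// cancelled on the client, rather than panicking.
fn check_resp_len(encoded_len: u64) -> Result<(), String> {
    if encoded_len >= MAX_GRPC_MSG_LEN {
        Err(format!(
            "response too large to serialize: {} bytes >= 4 GiB limit",
            encoded_len
        ))
    } else {
        Ok(())
    }
}

fn main() {
    assert!(check_resp_len(1024).is_ok());
    assert!(check_resp_len(6 * 1024 * 1024 * 1024).is_err());
    println!("size guard behaves as described");
}
```

The key design point is that the check runs during serialization, before the message reaches gRPC's transport, so the process stays alive and only the offending call fails.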
After the fix, TiKV will log the failure instead of panicking.
Encountered it again. Do you have any idea when the response size would be so large without a large region? @youjiali1995 @BusyJay
You mean it panicked? What version did you use?
@BusyJay It doesn't panic, but the query can't succeed. After investigation, it's confirmed that the chunk codec (arrow codec) reserves space for fixed-length fields even when the value is null. So when there are many fixed-length fields filled with null, the size of the coprocessor response can be amplified multiple times and exceed 4 GiB.
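A back-of-the-envelope illustration of the amplification described above (the row and column counts are hypothetical): a fixed-length chunk codec pays the full fixed width for every value, nulls included, so a mostly-null result set can still blow past the 4 GiB gRPC limit.

```rust
/// Size of the fixed-length portion of a chunk-encoded result.
/// Fixed-length columns reserve `width` bytes per row regardless of
/// whether the value is null.
fn chunk_encoded_size(rows: u64, fixed_cols: u64, width: u64) -> u64 {
    rows * fixed_cols * width
}

fn main() {
    // Hypothetical example: 64M rows, 10 fixed-length columns of
    // 8 bytes each comes to 5 GiB even if almost every value is null.
    let size = chunk_encoded_size(64 * 1024 * 1024, 10, 8);
    println!("encoded size: {} bytes", size);
    assert!(size > 4 * 1024 * 1024 * 1024); // exceeds the gRPC limit
}
```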
/cc @coocood
Bug Report
What version of TiKV are you using?
3.x, 4.x, master
What operating system and CPU are you using?
not related
Steps to reproduce
Deploy the TiKV build from #9010, which constructs a 6 GiB result for mvcc_key, and send a mvcc_key request to it.
What did you expect?
The request is canceled or finished but TiKV is alive.
What happened?
TiKV panics.