-
Notifications
You must be signed in to change notification settings - Fork 6.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: protobuf: serde: exception bypass for canonical repeated nested #69556
fix: protobuf: serde: exception bypass for canonical repeated nested #69556
Conversation
if (suffix.empty()) | ||
continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This specific change is not required per se, but it arguably brings all relevant code paths into the diff.
If it may help the review, we can get rid of that 🙏
This is an automated comment for commit 167196b with description of existing statuses. It's updated for the latest CI running ❌ Click here to open a full report in a separate page
Successful checks
|
Hi! Thank you for the PR, I will take a look at the code. Can you please fix style check first? |
Hello @Avogar , I'll get on that as soon as possible ( a bit later today ) 👍 Thank you for the help 🙂 |
c56e08a
to
7d97727
Compare
7d97727
to
167196b
Compare
After fixing the style issues, I took the liberty of slightly adapting the added tests in order to properly work with the testing system ( format_schemas dir not automatically present , no formats in fasttest , use correct test DB ). If these changes are incorrect, please let me know. At this point, the tests added in this PR do appear to work, however I'm not sure why the remaining CI checks fail. |
I attempted the test In any case, the other test failures appear related to timeouts, so I'd be very interested in possibly relaunching the CI jobs that failed to check if they fail again 🙏 |
Alright ! Thanks for letting me know and for the exploration URL, I didn't know this was publicly available 🙂 Looking forward to hearing from you again once you've had an opportunity to review the code and/or |
Looks good, thank you! |
e9006c1
Disclaimer
system.errors
when processing a "canonical"/"valid" proto/schema to be incorrect behaviour. Please correct it if necessary.On Semantics
The current potobuf serde implementation will always internally generate
PROTOBUF_FIELD_NOT_REPEATED
when de/serializing such a canonical example :
WHEN the proto field we are mapping into is
repeated
,AND when the matching columns are nested and of type Array,
THEN we attempt to serialize via
ProtobufSerializerFlattenedNestedAsArrayOfNestedMessages
first,RATHER THAN first going through the try/catch, throwing PROTOBUF_FIELD_NOT_REPEATED and catching it.
In the case where I am missing some use case and/or possible situtation where this control flow would be faulty,
please do comment / let me know, ideally with example proto/sql such that it can be turned into a new test and the implementation adapted 🙏
On Performance
I discovered this behaviour when investigating the performance of a cluster a colleage was running benchmarks on, where he would be consuming complex protobuf messages with the Kafka engine. Since clickhouse currently does not appear to re-use the generated de/serializer impl ( according to #59482 ), I realized that this try/catch flow behaviour was tanking performance, with a huge amount of CPU time spent in
libunwind
. Incidentally, we were generating millions ofPROTOBUF_FIELD_NOT_REPEATED
in thesystem.errors
table.CC
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Bypass try/catch flow when de/serializing nested repeated protobuf to nested columns ( fixes #41971 )
Documentation entry for user-facing changes
CI Settings (Only check the boxes if you know what you are doing):