Extremely slow protobuf deserialization and serialization for schema registry #1146
Comments
Did you try this fix? #1128
Closing this for now as the main bottleneck has been addressed.
Thanks @fzmoment, sounds good; a PR is welcome.
Hi @rayokota, I added an implementation in #1151 if you could take a look. Unfortunately, there was one hiccup with the approach I mentioned: if we are caching by file descriptor and a dependency changes, the cache won't reflect that change, so I put this behavior behind a feature flag. Included more details in the PR, PTAL!
Description
When using protobuf schemas we've noticed that the drop in performance is much worse than expected. For example, compared to the baseline of just a regular protobuf Unmarshal call, which takes on the order of <10 us:
Using the confluent kafka go library, we see an over 2500x increase in processing time:
Furthermore, we know that this decrease is specifically not due to communication with the schema registry. In this example, we are deserializing the same schema over and over again and thus only make a request to the registry once before the result is cached - in general, the schema ID lookup takes ~2us:
The bottleneck is the ToFileDesc function, which parses the schema string back into a file descriptor; this can take on the order of seconds:
Is this a known issue, and are there any workarounds? We see similar behavior on the write side (where the bottleneck is the conversion from the file descriptor to a string, not the schema ID lookup, given the caching), though not quite as drastic as the above examples. Any pointers would be appreciated.
How to reproduce
Run a consumer that uses the confluent kafka go library to deserialize all the messages on a given topic. In our setup, the topic schema only contains one message. We can provide a minimal repro example if this gains some traction.
Checklist
Please provide the following information:
- confluent-kafka-go and librdkafka version (`LibraryVersion()`):
- Client configuration: `ConfigMap{...}`
- Client logs (with `"debug": ".."` as necessary)