Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

draft: feat: Resilient parsing #60

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

414owen
Copy link

@414owen 414owen commented Mar 19, 2024

This is an attempt to make pinch resilient against bad packets.

To do this, I made parsing messages incremental, meaning that the parse
can fail before all the packets of the message have been read into
memory. This depends on a PR to cereal, which hasn't been merged yet.

I've made decoding the method name incremental. It will now short-circuit the parsing of a message, if the name isn't valid utf-8.

I've added the option to only accept method names that can be produced by the thrift IDL compiler. This is a much more restrictive requirement than we're given by the binary protocol spec, but what you have to remember is that the spec is largely reverse-engineered from implementations anyway:

From the spec:

The information here is based on research in the Java implementation in the Apache thrift library (version 0.9.1 and 0.9.3) and THRIFT-110 A more compact format. Other implementation however, should behave the same.

I've added more comprehensive guards to the sizes of binary, map, set, and list sizes.

@414owen 414owen force-pushed the os/integrate-isolateLazy branch 4 times, most recently from 929a426 to a5d1ba8 Compare March 20, 2024 09:18
src/Pinch/Protocol/Internal.hs Outdated Show resolved Hide resolved

data Utf8Result
= Utf8Success !TE.StrictBuilder
| Utf8Failure !TE.UnicodeException
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Could it be useful for Utf8Failure to hold information about what has been successfully parsed so far?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, Utf8Result isn't exported. It, and parseUtf8Incremental', are just used to implement parseUtf8Incremental.

I don't currently see a use for the parsed prefix in parseUtf8Incremental, but maybe you've seen one?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know the details well enough. Was just a thought that maybe including what was successfully parsed so far could be useful to help identify the method name.

@abhinav
Copy link
Owner

abhinav commented Jun 28, 2024

Thanks for the PR, @414owen. I'm supportive of this change (although I haven't taken a review pass over it yet).
If the cereal change gets merged and released, I'll be happy to review and merge this.

Owen Shepherd and others added 4 commits August 28, 2024 10:25
This is an attempt to make pinch resilient against bad packets.

To do this, I made parsing messages incremental, meaning that the parse
can fail before all the packets of the message have been read into
memory.

I've made decoding the method name incremental, and added the option
to decode via the spec verbatim, or to only accept method names that
can be produced by the thrift IDL compiler.
This adds protocol limits as optional parameters to the binary and
compact protocol smart constructors.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants