
Consume invalid input #43

Closed
cocreature opened this issue Nov 8, 2015 · 24 comments

@cocreature
Collaborator

pipes-parse doesn't consume the input if it doesn't parse, so we repeatedly try to parse the same input, which results in a perpetual flood of error messages.

My pipes foo is too weak to figure out how to fix this right now.

@alanz
Collaborator

alanz commented Nov 8, 2015

I saw it going into that mode when testing by hand. Also not something I know much about.

BTW, pipes-parse is not a requirement, conduit or some such could be used too, if anyone knows how

@cocreature
Collaborator Author

So I have a fix, but it requires patching pipes-attoparsec and pipes-aeson, or basically reimplementing what they're doing inside haskell-ide-engine. pipes-attoparsec explicitly puts everything it has read back on the pipe when it encounters an error, so we need an additional function that does not do that, and then we need to add a function to pipes-aeson that uses it.

I have a proof of concept of that locally, so it does actually work, but I'm not sure if it's worth the trouble. Also I'm not sure if we'll manage to get those changes upstream; pinging @k0001 for that.

I guess it's best to leave it for now and see if we can come up with an easier solution, or if @k0001 is willing to accept the required changes. Otherwise I would just go for hand-rolling it without any streaming library, since I don't know anything about conduit either :)

@alanz
Collaborator

alanz commented Nov 8, 2015

Whatever works.

One thing I considered is to use line-delimited JSON instead of concatenated.

That way it becomes a case of reading a line at a time from the stream, then parsing/processing it. If the parse fails, the line can be discarded anyway.
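
For illustration, a minimal sketch of that loop, assuming we just run aeson's eitherDecodeStrict on each line (this is not haskell-ide-engine code, only the shape of the idea):

```haskell
module Main where

import Data.Aeson (Value, eitherDecodeStrict)
import qualified Data.ByteString.Char8 as BS
import System.IO (isEOF)

-- Read one line at a time, try to decode it as JSON, and drop lines that
-- fail to parse instead of getting stuck on them.
main :: IO ()
main = loop
  where
    loop = do
      done <- isEOF
      if done
        then return ()
        else do
          line <- BS.getLine
          case eitherDecodeStrict line :: Either String Value of
            Left err  -> putStrLn ("discarding malformed input: " ++ err)
            Right val -> print val  -- hand the parsed message to the dispatcher here
          loop
```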

@cocreature
Collaborator Author

I thought about that too. I was worried it might cause problems when we encode/decode some JSON string that contains newlines, but it looks like those are escaped, so that should be fine.

@cocreature
Collaborator Author

I checked: JSON requires them to be escaped, so this should be safe. Seems like the best solution for now; I'll have a go at it somewhat soon™.

@alanz
Collaborator

alanz commented Nov 8, 2015

Great

@gracjan
Contributor

gracjan commented Nov 8, 2015

Long-term I'd love to use concatenated; it is great to have a user-friendly protocol that accepts pretty-indented JSON as input.

@alanz
Collaborator

alanz commented Nov 9, 2015

I agree, but I think some sort of message framing separate from the actual message is a good idea.

I was actually wondering whether or not to use one of STX / ETX / EOT or similar as a frame delimiter; they will be escaped in JSON anyway.

@cocreature
Collaborator Author

Since you both think concatenated is nicer and I agree, I made a proposal in #49 showing how it looks if we just pull the necessary code from pipes-aeson and pipes-attoparsec into haskell-ide-engine. We can then still try to get it upstream and remove it once it's there, but at least it works for now. Also it's not that much code, so it might not be such a bad solution.

@alanz
Collaborator

alanz commented Nov 9, 2015

It seems to bring in a lot of complexity for something that should be simple.

I still side with a separation between framing and parsing.

@cocreature
Collaborator Author

The problem I have with using one of STX, ETX or EOT is that I don't even know how to type that. Of course I could look it up, but it makes it harder for people to use. On the other hand, adding a bit of complexity makes it pretty obvious for users, and it's encapsulated, so I don't think the complexity is that bad here.

@alanz
Collaborator

alanz commented Nov 9, 2015

What happens if we get malformed JSON, with unbalanced braces?

@cocreature
Collaborator Author

Attoparsec supports partial parses, so if you input a single { it just waits for more input until either it can no longer become valid JSON, or it becomes valid JSON, in which case we process that and then wait for the next one.
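
To illustrate the incremental behaviour (a standalone toy, not project code; the input strings are made up):

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import qualified Data.Aeson.Parser as Aeson (json)
import qualified Data.Attoparsec.ByteString as A

-- Feeding only a prefix of a JSON document yields a Partial continuation:
-- attoparsec keeps waiting for more input instead of failing outright.
main :: IO ()
main =
  case A.parse Aeson.json "{\"key\":" of
    A.Partial k -> print (k "\"value\"}")  -- supplying the rest completes the parse
    other       -> print other
```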

@alanz
Collaborator

alanz commented Nov 9, 2015

So if I send

{"blah":"foo

and then do the rest of my processing with valid JSON, will it wait forever (or until RAM runs out) sucking on the stream?

@cocreature
Collaborator Author

Yep, I hadn't thought about that. Ok, I'll drop the idea of concatenated input, we really want some terminator.

@alanz
Collaborator

alanz commented Nov 9, 2015

That's the conclusion I had come to.

I suggest STX.

@k0001

k0001 commented Nov 9, 2015

It is not a good idea to have pipes-attoparsec or pipes-aeson discard that “bad input” because both attoparsec and pipes-parse expect these Parsers to backtrack by default, and it would be surprising for users if these functions did something else.

Nevertheless, that shouldn't be a problem. Once you have delimiters in your stream (say, newline markers) you can try different things: For example, by using pipes-bytestring you can break your input stream into individual lines to be parsed and attempt to parse each of them individually discarding those that are wrong.

Another alternative that works without pipes-bytestring is to wrap your Attoparsec parser foo with something like:

bar = (fmap Just foo <|> (Nothing <$ takeWhile (/= '\n'))) <* char '\n'

That is, bar either succeeds in parsing the entire line with foo, in which case it returns Just, or it succeeds in just throwing away the rest of the line, in which case it returns Nothing. Notice that even if we understand the second case as a “failure” conceptually, we encoded it as a successful scenario so that the parser never fails as long as the input finishes with a newline.
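
Spelled out as a self-contained example (here foo is just aeson's json parser, picked purely for illustration):

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Control.Applicative ((<|>))
import Data.Aeson (Value)
import qualified Data.Aeson.Parser as Aeson (json)
import Data.Attoparsec.ByteString.Char8 (Parser, char, parseOnly, takeWhile)
import Prelude hiding (takeWhile)

-- Stand-in for whatever per-line parser we actually care about.
foo :: Parser Value
foo = Aeson.json

-- Either parse the whole line with foo (Just), or throw the rest of the
-- line away (Nothing); both branches then consume the trailing newline.
bar :: Parser (Maybe Value)
bar = (fmap Just foo <|> (Nothing <$ takeWhile (/= '\n'))) <* char '\n'

main :: IO ()
main = do
  print (parseOnly bar "{\"k\":1}\n")  -- Right (Just (Object ...))
  print (parseOnly bar "not json\n")   -- Right Nothing
```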

@cocreature
Collaborator Author

I wasn't trying to suggest that you replace the existing functions but rather add additional functions with the discard behavior. Anyway, since we decided to go for a separator approach this is not necessary.

I tried figuring out how to do it using pipes-bytestring and pipes-group earlier today, but was too stupid to do so, so I'll leave this to someone else.

@alanz
Collaborator

alanz commented Nov 9, 2015

@k0001 are you prepared to give it a go?

@k0001

k0001 commented Nov 9, 2015

@alanz I don't follow the development of haskell-ide-engine myself so I'm not sure what issue is being discussed here. But sure, I can help.

@cocreature what do you think about working on this together? That way I can learn a bit more about haskell-ide-engine and you can learn a bit more about pipes.

@cocreature
Collaborator Author

@k0001 So I was trying to write up nicely where I failed, but while doing that I think I found a solution :)

I used break from pipes-bytestring. It would be great if you could take a quick look and tell me if it makes some sense, or even better, tell me what I should improve :) It's here. In particular I am not sure if this is the right way to get the rest of the producer after splitting it.
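
For reference, roughly the shape of that approach as a standalone sketch (not the code from the link; it assumes pipes-bytestring's lens-style break, Pipes.Prelude.fold', and view from lens-family-core, and the name nextFrame is made up):

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.ByteString (ByteString)
import qualified Data.ByteString as BS
import Data.Word (Word8)
import Lens.Family (view)            -- from lens-family-core
import Pipes
import qualified Pipes.ByteString as PB
import qualified Pipes.Prelude as P

stx :: Word8
stx = 0x02

-- Collect everything before the first STX into one strict ByteString and
-- hand back the leftover producer. The leftover still starts at the STX
-- byte, so the caller has to skip it before reading the next message.
nextFrame :: Monad m
          => Producer ByteString m r
          -> m (ByteString, Producer ByteString m r)
nextFrame p = P.fold' BS.append BS.empty id (view (PB.break (== stx)) p)

main :: IO ()
main = do
  let input = yield "{\"tag\":\"msg1\"}\STX{\"tag\":\"msg2\"}\STX" :: Producer ByteString IO ()
  (frame, _rest) <- nextFrame input
  BS.putStr frame >> putStrLn ""     -- prints {"tag":"msg1"}
```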

@alanz alanz added this to the P1: Must milestone Nov 10, 2015
@gracjan
Contributor

gracjan commented Nov 11, 2015

I'm against trying to fix up a broken stream. Desync problems should be fatal.

Rationale: I see two cases when invalid input on json-concatenated channel may happen:

  1. Interleaving of threads giving {{"k1""k2"::"v1""v2"}} or some such.
  2. Invalid output, like hie: error xxx{"k1":"v1"}.

Note that using STX or similar does not help at a higher level, as it causes at least one innocent message to be garbled and lost. Emacs treats this channel as reliable and I do not foresee any mechanism to compensate for lost messages. I do not think any other integration will implement such a thing either. That is why I would prefer to just die when such an error happens.

@alanz
Collaborator

alanz commented Nov 11, 2015

Yes, but we need to be able to detect it.

If we just use concatenated JSON and get a malformed message (missing closing '}'), we will potentially hang until memory is exhausted, waiting for the closing '}'.

If we have a delimiter, the transport can report a message back indicating malformed input, which can help the IDE integrator sort out the problem.

Just dying will lose the debug opportunity.

@gracjan
Contributor

gracjan commented Nov 11, 2015 via email

cocreature added a commit to cocreature/haskell-ide-engine that referenced this issue Nov 11, 2015
cocreature added a commit to cocreature/haskell-ide-engine that referenced this issue Nov 12, 2015
cocreature added a commit to cocreature/haskell-ide-engine that referenced this issue Nov 13, 2015
cocreature added a commit to cocreature/haskell-ide-engine that referenced this issue Nov 13, 2015
cocreature added a commit to cocreature/haskell-ide-engine that referenced this issue Nov 13, 2015
cocreature added a commit that referenced this issue Nov 13, 2015
Expect json input separated by STX, fix #43