---
True streaming parsers are an open problem in parsing theory because reads against streams/pipes/sockets/etc. are stateful.
---
While useful, general "stream" support in PEGs is not feasible, as mentioned: PEGs as a general construct allow unbounded backtracking. That said, there are several easy-to-implement workarounds for implementing a protocol with PEGs that involve defining a message-delimiter pattern, as sketched below.
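For concreteness, here is a minimal sketch of that delimiter workaround, assuming a "\n"-delimited protocol; `stream` and `handle` are placeholder names, not part of any existing API:

```janet
(def message-peg (peg/compile ~(* (<- (to "\n")) "\n")))

(defn run-messages [stream handle]
  (def buf @"")
  (forever
    # block until more bytes arrive; nil means EOF
    (unless (:read stream 1024 buf) (break))
    # drain every complete delimited message currently in the buffer
    (var m (peg/match message-peg buf))
    (while m
      (handle (first m))
      # drop the consumed payload plus its delimiter from the buffer
      (def tail (buffer/slice buf (inc (length (first m)))))
      (buffer/clear buf)
      (buffer/push-string buf tail)
      (set m (peg/match message-peg buf)))))
```

The PEG itself only ever sees a complete message, so backtracking stays bounded; all the stream statefulness lives in the buffering loop around it.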
As @CosmicToast has pointed out, there are fundamental issues with this method of abstraction. Also, using the event loop inside the PEG interpreter is not code that I want to write or deal with. Hardly "simple". I think a function like
---
Hi! I've been writing a Janet library that simulates mouse and keyboard events, to learn the language. During development I've found some things that could (in my opinion) be improved in Janet, and I thought it would be a good idea to write them down so I can submit some issues/PRs to discuss with the community. This is one of them.
One of the things Janet excels at is shell scripting and everything related.
Yet there's a common problem I haven't found an existing solution for - processing data from long-running processes, aka processing Janet streams.
The problem
How to read and process a stream line-by-line?
This is needed when processing output from os/spawn, parsing HTTP requests with Keep-Alive, implementing a simple getline for streams (e.g. netrepl), etc.
This is not necessarily about lines - any protocol / data format which doesn't specify packet length in advance will have similar problems when being read from a stream - JSON, HTTP, netstring, HTML/XML, etc.
What doesn't work:
(:read s :all) and then (string/split) - this is usually impossible because the stream never reaches EOF: e.g. an HTTP connection with Keep-Alive, REPL input, or long-lived processes (tail -f logs, tcpdump).
How it is done right now:
spork/http/http-header (link)
Very verbose and imperative - it needs a buffer that outlives the function, a last-index variable, a forever loop, and a ret variable.
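Condensed, the pattern looks roughly like this (a sketch, not the actual spork source; `header-peg` stands in for the real header grammar):

```janet
(defn read-header [conn buf]
  (var ret nil)
  (forever
    # try to parse everything received so far
    (when-let [res (peg/match header-peg buf)]
      (set ret res)
      (break))
    # not enough data yet: block for more bytes, bail out on EOF
    (unless (:read conn 1024 buf) (break)))
  ret)
```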
How others do it:
Janet parser - parser/consume and parser/produce
This works, but it's too verbose for such a simple (and common) thing (see the sketch after this list).
spork/netrepl
Uses messages from spork/msg with a length prefix to avoid the problem altogether. Very nice, but not always possible.
spork/getline
Reads data byte-by-byte. Works for a REPL, but is too slow for parsing lots of data.
Clojure
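For reference, the Janet parser approach mentioned above looks roughly like this (a sketch; `chunk` and `handle-form` are placeholders):

```janet
(def p (parser/new))

(defn feed
  "Push one chunk of input and handle every form it completes."
  [chunk handle-form]
  (parser/consume p chunk)
  (while (parser/has-more p)
    (handle-form (parser/produce p))))
```

This works when the stream carries Janet forms, but every call site has to manage the parser state and the drain loop by hand, which is the verbosity complained about above.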
Proposed solution - stream support in PEGs
Using PEGs would be awesome:
For example, the spork/http/read-header function would shrink from 24 lines to (peg/match peg conn). Right now the function uses peg/match anyway, but it's surrounded with lots of code to convert conn into a large enough buf.
Reading a line would be (to "\n").
PEGs already have a concept of consuming characters, so a PEG could work on a buffer like usual, and just call (:read stream 64 buf) whenever the buffer is exhausted but the PEG wants more characters.
Buffering
The biggest problem with all this is that a PEG could :read more bytes from a stream than it had to, and those bytes would be lost when the stream is used afterwards.
This could use some form of "unread" functionality, but I'm not sure what the best way to implement it is. Maybe a stream could hold an internal buffer to allow unreading up to the last read's size? That seems to be enough for almost all use cases.
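The unread idea could be prototyped in userspace before touching the stream type itself; a minimal sketch, with all names hypothetical:

```janet
(defn pushback-stream
  "Pair a stream with a pending buffer of unread bytes."
  [stream]
  @{:stream stream :pending @""})

(defn pb-read
  "Like (:read stream n buf), but serve previously unread bytes first."
  [pb n buf]
  (def pending (pb :pending))
  (if (empty? pending)
    (:read (pb :stream) n buf)
    (do
      (buffer/push-string buf pending)
      (buffer/clear pending)
      buf)))

(defn pb-unread
  "Give bytes back; the next pb-read returns them before new data."
  [pb bytes]
  (buffer/push-string (pb :pending) bytes))
```

A PEG engine with stream support could do the same internally: on finishing a match it would unread whatever it had over-consumed.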
Another way is to require the user to pass an external buffer that will be used for this. This is simpler to implement, but less elegant, and it requires a different API - (peg/match peg stream) would not be possible anymore.
Internal buffer (unread)
Pros:
- (peg/match peg stream) just works; no new API surface
Cons:
- every stream has to carry an internal buffer, complicating the stream type itself
External buffer
Pros:
- simpler to implement; streams stay unchanged
Cons:
- less elegant; requires a different API
I suggest using an external buffer and a function signature similar to this:
(peg/match peg buf &opt stream-to-get-extra-characters)
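Under that signature, usage might look like this (hypothetical; peg/match does not accept a trailing stream argument today):

```janet
(def buf @"")   # outlives the calls; holds bytes read but not yet consumed
(def line-peg (peg/compile ~(* (<- (to "\n")) "\n")))

(defn next-line [conn]
  # the engine would :read from conn into buf as needed, and leave any
  # over-read bytes in buf for the next call
  (if-let [m (peg/match line-peg buf conn)]
    (first m)))
```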