Add ability to parse json by parts #527

Closed
AlexeySagin opened this issue Feb 5, 2016 · 6 comments

@AlexeySagin

The parser needs an enhancement so that it can parse JSON partially.
The following functionality of the library would be very helpful: the ability to pass a JSON request to the parser in separate parts, as they are received from a server, client, etc.
At the moment, one has to wait until all the JSON parts have been received, and only then can the complete JSON be passed to the parser. It would be great if the parts could be passed to the parser separately.

@AlexeySagin AlexeySagin changed the title Add ability to parse part of json Add ability to parse json by parts Feb 5, 2016
@miloyip
Collaborator

miloyip commented Feb 6, 2016

I think this is similar to the FAQ entry about Reader/Writer (SAX):

Can I pause the parsing process and resume it later?

This is not directly supported in the current version due to performance considerations. However, if the execution environment supports multi-threading, the user can parse a JSON document in a separate thread and pause it by blocking in the input stream.

@spl
Contributor

spl commented Feb 6, 2016

At one time, I wanted to be able to start parsing a string containing partial input (because the full input may not yet be available) and later continue parsing with more strings containing the remaining parts of the input. It sounds like the same thing that @AlexeySagin is requesting.

@miloyip I don't understand the question/answer in the FAQ and its connection to this issue. When I read “pause the parsing,” it sounds like an active choice I make to stop parsing. I don't want to stop parsing; rather, parsing should continue until it runs out of input, and I should then be able to continue parsing with new input.

I believe this could be hacked together with RapidJSON as it is, but I have never tried.

@miloyip
Collaborator

miloyip commented Feb 6, 2016

Parsing a JSON document in multiple parts would look like this:

  1. parse(partA);
  2. do something else
  3. parse(partB);
  4. ...

Since the parser currently cannot save and resume its internal state (even the iterative parser does not maintain internal state within a single value), it would be practically impossible to change the code to support this.

The workaround in the FAQ:

Thread A

  1. Lock the buffer
  2. Fill the buffer
  3. Unlock the buffer
  4. Send a signal that a buffer is available for parsing.

Thread B

  1. Start parsing
  2. The stream waits for a signal.
  3. The stream fetches the data from the buffer.
  4. When there is no data, wait for the next signal.
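
The Thread A / Thread B steps above can be sketched with C++11 threads roughly as follows. This is a minimal sketch, not RapidJSON code: `BlockingStream` and `DemoTwoThreads` are hypothetical names, and the class only mimics the shape of RapidJSON's stream concept (`Ch`, `Peek()`, `Take()`, `Tell()` plus the write-side `PutBegin()`/`Put()`/`Flush()`/`PutEnd()`). It assumes that returning `'\0'` once the producer has closed the stream is an acceptable end-of-input signal, which is how RapidJSON's in-memory string streams end.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical input-only stream shaped after RapidJSON's Stream concept.
// Peek()/Take() block until Thread A has appended data or closed the stream;
// once closed and drained they return '\0' as the end-of-input marker.
class BlockingStream {
public:
    typedef char Ch;

    // Thread A: lock the buffer, fill it, unlock it, then signal Thread B.
    void Append(const std::string& chunk) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            buffer_.insert(buffer_.end(), chunk.begin(), chunk.end());
        }
        cv_.notify_one();
    }

    // Thread A: no more data will arrive.
    void Close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        cv_.notify_one();
    }

    // Thread B (the parser): wait for a signal, then fetch from the buffer.
    Ch Peek() const {
        std::unique_lock<std::mutex> lock(mutex_);
        WaitForData(lock);
        return buffer_.empty() ? '\0' : buffer_.front();
    }

    Ch Take() {
        std::unique_lock<std::mutex> lock(mutex_);
        WaitForData(lock);
        if (buffer_.empty()) return '\0';
        Ch c = buffer_.front();
        buffer_.pop_front();
        ++consumed_;
        return c;
    }

    // Number of characters taken so far.
    std::size_t Tell() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return consumed_;
    }

    // Write-side members required by the concept; this stream is read-only.
    Ch* PutBegin() { assert(false); return nullptr; }
    void Put(Ch) { assert(false); }
    void Flush() { assert(false); }
    std::size_t PutEnd(Ch*) { assert(false); return 0; }

private:
    // Block until there is data or the producer has closed the stream.
    void WaitForData(std::unique_lock<std::mutex>& lock) const {
        cv_.wait(lock, [this] { return !buffer_.empty() || closed_; });
    }

    mutable std::mutex mutex_;
    mutable std::condition_variable cv_;
    std::deque<Ch> buffer_;
    std::size_t consumed_ = 0;
    bool closed_ = false;
};

// Thread B drains the stream (standing in for the parser) while
// Thread A delivers the document in two parts.
std::string DemoTwoThreads() {
    BlockingStream stream;
    std::string received;
    std::thread parser([&] {
        while (stream.Peek() != '\0') received += stream.Take();
    });
    stream.Append("{\"hello\":");
    stream.Append("\"world\"}");
    stream.Close();
    parser.join();
    return received;
}
```

In real code the loop inside the `parser` thread would presumably be replaced by something like `reader.Parse(stream, handler)` on a `rapidjson::Reader`, or `document.ParseStream(stream)`. Locking once per character keeps the sketch short but is slow; handing over whole chunks would reduce the locking overhead.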

Of course, this can be done with double buffering so that both threads can run in parallel.
I am not sure whether a coroutine-based solution could do this concurrently, but such "hacks" are platform-specific anyway.
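
A rough sketch of the double-buffering variant, again with C++11 threads and hypothetical names (`DoubleBuffer`, `DemoDoubleBuffer`), not taken from #556: Thread A fills one buffer while Thread B consumes the other, and the mutex only guards the ownership flags, so copying data in and reading it out can overlap. Concatenating the chunks stands in for feeding them to the parser's input stream; that part is an assumption for illustration.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Two buffers used alternately: Thread A fills one while Thread B consumes
// the other. Each side touches a buffer only while it owns it; the mutex
// protects the "full" flags and the hand-over, not the copying itself.
class DoubleBuffer {
public:
    // Thread A: wait until the next buffer has been released, then fill it.
    std::string& BeginWrite() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !full_[write_]; });
        return buffers_[write_];
    }

    // Thread A: hand the filled buffer to Thread B and switch to the other one.
    void EndWrite() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            full_[write_] = true;
            write_ ^= 1;
        }
        cv_.notify_all();
    }

    // Thread A: no more buffers will be written.
    void Close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        cv_.notify_all();
    }

    // Thread B: wait for the next filled buffer; nullptr once closed and drained.
    const std::string* BeginRead() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return full_[read_] || closed_; });
        return full_[read_] ? &buffers_[read_] : nullptr;
    }

    // Thread B: release the buffer so Thread A can refill it.
    void EndRead() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(full_[read_]);
            buffers_[read_].clear();
            full_[read_] = false;
            read_ ^= 1;
        }
        cv_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::string buffers_[2];
    bool full_[2] = {false, false};
    int write_ = 0;
    int read_ = 0;
    bool closed_ = false;
};

// Thread A copies each incoming part into a free buffer; Thread B
// concatenates what it reads (standing in for the parser's input).
std::string DemoDoubleBuffer(const std::vector<std::string>& parts) {
    DoubleBuffer db;
    std::string received;
    std::thread consumer([&] {
        while (const std::string* chunk = db.BeginRead()) {
            received += *chunk;
            db.EndRead();
        }
    });
    for (const std::string& part : parts) {
        std::string& buf = db.BeginWrite();
        buf.assign(part);  // e.g. where a socket read would go
        db.EndWrite();
    }
    db.Close();
    consumer.join();
    return received;
}
```

Because buffers are always filled and consumed in the same alternating order, the consumer can stop as soon as its next buffer is empty and the producer has closed.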

Note that some parsers, like YAJL, can parse part by part. The tradeoff is an unavoidable performance penalty.
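
To illustrate what part-by-part parsing requires, here is a minimal push-style sketch. The `CompletenessScanner` class is hypothetical and is not YAJL's or RapidJSON's API. It does not build or validate values; it only keeps the nesting depth, whether it is inside a string, and whether the previous character was a backslash, in member variables, so it can resume on the next chunk and report when a complete top-level object or array has arrived. A real incremental parser would have to persist all of its state this way instead of on the call stack, which is plausibly one source of the performance penalty mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical push-style scanner: the caller feeds chunks as they arrive.
// Its only state (depth, string, escape) lives in members, so scanning can
// stop at the end of one chunk and resume with the next.
class CompletenessScanner {
public:
    // Returns the offset just past the bracket that closes the top-level
    // object/array within `chunk`, or std::string::npos if more input is needed.
    std::size_t Feed(const std::string& chunk) {
        for (std::size_t i = 0; i < chunk.size(); ++i) {
            const char c = chunk[i];
            if (inString_) {
                if (escape_) {
                    escape_ = false;   // character after '\' is literal
                } else if (c == '\\') {
                    escape_ = true;    // may be completed in the next chunk
                } else if (c == '"') {
                    inString_ = false;
                }
            } else if (c == '"') {
                inString_ = true;
            } else if (c == '{' || c == '[') {
                ++depth_;
            } else if (c == '}' || c == ']') {
                assert(depth_ > 0 && "unbalanced closing bracket");
                if (--depth_ == 0) return i + 1;
            }
        }
        return std::string::npos;
    }

private:
    int depth_ = 0;          // current nesting level of objects/arrays
    bool inString_ = false;  // inside a string literal
    bool escape_ = false;    // previous character was a backslash
};
```

A caller could keep appending chunks to a buffer until `Feed()` returns an offset, and then hand the complete document to RapidJSON's `Parse()` in one go.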

@spl
Contributor

spl commented Feb 6, 2016

@miloyip Thanks. That makes more sense.

@miloyip
Collaborator

miloyip commented Feb 14, 2016

Maybe an example using pthreads could demonstrate this.

miloyip added a commit that referenced this issue Feb 21, 2016
@miloyip
Collaborator

miloyip commented Feb 21, 2016

I created an example using C++11 threads, but I am not very familiar with them.
Please help review #556.

3 participants