
Add field for loaders to indicate what form of JSON parsing is used #13

Open
weegeekps opened this issue Nov 11, 2019 · 5 comments
Labels: enhancement (New feature or request)

Comments

@weegeekps
Contributor

This discussion started in KhronosGroup/glTF#1699.

@lexaknyazev had a great idea to add a field to Project Explorer, specifically for loaders, indicating what form of JSON parsing they use for the manifest. Most JSON parsers use DOM parsing, but DOM parsing begins to falter once a manifest exceeds a certain threshold (generally bounded by the machine's available memory); streaming, SAX-style JSON parsers often fill this gap. Jackson is a good example of a streaming JSON parser that I'm immediately aware of.
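To illustrate the distinction (a minimal sketch, not production code): a DOM parser like `JSON.parse` materializes the entire document at once, while a streaming parser emits an event per element as input arrives. The `streamNumbers` helper below is hypothetical and only handles a flat JSON array of numbers; real streaming parsers such as Jackson or JSONStream handle arbitrary JSON and real chunked I/O.

```javascript
// Hypothetical "streaming" consumer for a flat JSON array of numbers:
// it emits one callback per element as chunks arrive, instead of holding
// the whole parsed structure in memory the way JSON.parse (DOM-style) does.
function streamNumbers(chunks, onNumber) {
  let buf = "";
  for (const chunk of chunks) {            // chunks arrive incrementally
    buf += chunk;
    let idx;
    // flush every complete element terminated by ',' or ']'
    while ((idx = buf.search(/[,\]]/)) !== -1) {
      const token = buf.slice(0, idx).replace(/^\[/, "").trim();
      if (token.length > 0) onNumber(Number(token));
      buf = buf.slice(idx + 1);
    }
  }
}

// DOM-style: the whole document is materialized at once
const dom = JSON.parse("[1, 2, 3]");

// Streaming-style: elements are observed one at a time
const seen = [];
streamNumbers(["[1, 2", ", 3]"], n => seen.push(n));
```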

I've proposed two options so far:

  • A boolean flag indicating that the loader supports parsing large manifests (with a threshold around ≥100MiB? higher?)
  • A field describing how the JSON parsing works: is it using a DOM parser or a streaming parser?

And @lexaknyazev voiced interest in the more descriptive field that explicitly indicates whether a DOM parser or a SAX/streaming parser is used.

I'm currently more a fan of the descriptive field as well, but I would urge us to go with "streaming" over "SAX": while many streaming JSON parsers are SAX-style, some very popular parsers are not. JSONStream comes to mind.

@weegeekps weegeekps added the enhancement New feature or request label Nov 11, 2019
@javagl
Contributor

javagl commented Nov 12, 2019

I don't have a very strong opinion here, but some doubts (I always have doubts): it is very specific, might be hard to find out for one loader or another, and in the end might not tell you anything about the loader's actual capabilities.

That DOM imposes a hard limit is true, but I have no idea where this limit will be (i.e. what a "large" file is: 100MB? 1GB? And what should be contained in a 1GB file that is then supposed to be rendered?). Conversely, SAX does not mean that you can in fact load large files: what information should be presented if a loader uses SAX but bails out on 10MB files?

(I'm not opposed to the idea, just wondering what the exact information is that is to be represented there, and how it should be gathered...)

@weegeekps
Contributor Author

I've mulled this over for the past day or so and I agree: this is extremely specific and would be difficult to discover for every loader. In an ideal world, the person adding the loader to the file would be the individual behind the project, and would have the context to provide the needed information. Reality tends to act against the ideal in these sorts of cases.

I've spent a fair bit of time researching parser limits this evening, and I've found that they tend to be reasonably documented in some languages and environments (C#, Java) and nearly impossible to find for browsers and some other languages. I think this is because the IETF spec for JSON (RFC 8259) explicitly calls out that parsers may set limits on the size, depth, and nesting of the texts they parse, but doing so is entirely optional. It makes some sense to me that, from a browser perspective, you wouldn't set a hard limit and would instead just try it and see; if you run out of memory you fail anyway.

I'm really on the fence about this now. On one hand, I feel there isn't much harm in adding a field to be filled in when the submitter is comfortable providing the information. On the other hand, this opens the door to many little-used cases that may never be filled in, or may be misleading. I do think there is a problem to solve here, but I'm less confident in the initial idea of a field indicating DOM vs. streaming/SAX. Perhaps memory-bound vs. a hard limit would be more useful?

@lexaknyazev
Member

> In an ideal world, the person adding the loader to the file would be the individual behind the project, and would have the context to provide the needed information. Reality tends to act against the ideal in these sorts of cases.

Why so? Usually, the authors of glTF libraries open PRs with information about their projects. We can ask for specifics before merging.

> nearly impossible to find for the browsers

Browsers' JSON implementations are open-source (see V8 for Chrome, SpiderMonkey for Firefox, JavaScriptCore for Safari, and Chakra for EdgeHTML). Following the ECMAScript API, they provide only a blocking interface (JSON.parse()) that returns a single DOM-style object.

@weegeekps
Contributor Author

> Why so? Usually, the authors of glTF libraries open PRs with information about their projects. We can ask for specifics before merging.

I could be working on a wrong assumption, but a lot of the entries in the late-breaking projects list seem to have been added by Patrick, which gives the impression that we end up adding many of these entries ourselves. I agree it'd be better if the developers of these projects opened a PR to add or update their project's entry themselves.

> Browsers' JSON implementations are open-source (see V8 for Chrome, SpiderMonkey for Firefox, JavaScriptCore for Safari, and Chakra for EdgeHTML). Following the ECMAScript API, they provide only a blocking interface (JSON.parse()) that returns a single DOM-style object.

It's been a while since I've dived into the V8 codebase, but IIRC the DOM object doesn't have any set maximum size; basically, if it fits in memory, it parses. That said, the blocking aspect is a great point that I overlooked: if you've got 2 gigs of JSON data you're parsing using JSON.parse() it's still likely going to block the event loop for quite some time, even if the host machine has more than sufficient memory.
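A quick sketch of that blocking behavior (illustration only; timings will vary by machine): JSON.parse is synchronous, so nothing else on the event loop can run until it returns.

```javascript
// Build a reasonably large JSON document, then time the synchronous parse.
// While JSON.parse runs, the event loop can do nothing else.
const big = JSON.stringify({
  nums: Array.from({ length: 1000000 }, (_, i) => i),
});

const t0 = Date.now();
const parsed = JSON.parse(big); // blocks until the whole document is materialized
const elapsedMs = Date.now() - t0;

console.log(`parsed ${big.length} bytes in ${elapsedMs} ms`);
```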

I'm now favoring a field named "JSON Parsing Strategy" or some similar text, with guidance to indicate whether parsing is memory-bound or blocking, and with DOM and streaming/SAX given as two examples. Ultimately this is a rather complex piece of data to capture, but text such as "Non-blocking, streaming I/O" is useful. The downside is that this would be impossible to filter on outside of a substring search, but I think having this as a non-filterable field is acceptable.
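As a purely hypothetical illustration (the field name and value below are not settled; they are assumptions for discussion), a loader's entry might carry something like:

```json
{
  "name": "example-loader",
  "jsonParsingStrategy": "Non-blocking, streaming I/O"
}
```

Free-form text keeps the field flexible at the cost of filterability, as noted above.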

@lexaknyazev
Member

> if you've got 2 gigs of JSON data you're parsing using JSON.parse() it's still likely going to block the event loop for quite some time

FWIW, it's possible to parse JSON asynchronously with the Fetch API:

```js
fetch("url.gltf")
  .then(response => response.json())
  .then(data => {
    // consume parsed JSON object
  });
```
