
Add field for loaders to indicate what form of JSON parsing is used #13

Open
weegeekps opened this issue Nov 11, 2019 · 5 comments
Labels: enhancement (New feature or request)

Comments

@weegeekps
Contributor

This discussion started in KhronosGroup/glTF#1699.

@lexaknyazev had a great idea to add a field to Project Explorer, specifically for loaders, indicating what form of JSON parsing they use for the manifest. Most JSON parsers use DOM parsing, but DOM parsing begins to falter once a manifest exceeds a certain threshold (generally bounded by the machine's available memory); streaming, SAX-style JSON parsers often fill this gap. Jackson is a good example of a streaming JSON parser that I'm immediately aware of.
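To illustrate the distinction (a minimal sketch, not production code): a DOM parser like `JSON.parse` materializes the entire document at once, while a streaming parser emits an event per element as input arrives. The `streamNumbers` helper below is hypothetical and only handles a flat JSON array of numbers; real streaming parsers such as Jackson or JSONStream handle arbitrary JSON and real chunked I/O.

```javascript
// Hypothetical "streaming" consumer for a flat JSON array of numbers:
// it emits one callback per element as chunks arrive, instead of holding
// the whole parsed structure in memory the way JSON.parse (DOM-style) does.
function streamNumbers(chunks, onNumber) {
  let buf = "";
  for (const chunk of chunks) {            // chunks arrive incrementally
    buf += chunk;
    let idx;
    // flush every complete element terminated by ',' or ']'
    while ((idx = buf.search(/[,\]]/)) !== -1) {
      const token = buf.slice(0, idx).replace(/^\[/, "").trim();
      if (token.length > 0) onNumber(Number(token));
      buf = buf.slice(idx + 1);
    }
  }
}

// DOM-style: the whole document is materialized at once
const dom = JSON.parse("[1, 2, 3]");

// Streaming-style: elements are observed one at a time
const seen = [];
streamNumbers(["[1, 2", ", 3]"], n => seen.push(n));
```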

I've proposed two options so far:

  • A boolean flag indicating that the loader supports parsing large manifests (with a threshold around ≥100MiB? higher?)
  • A field describing how the JSON parsing works: is it using a DOM parser or a streaming parser?

And @lexaknyazev voiced interest in the more descriptive field that explicitly indicates whether a DOM parser or a SAX/streaming parser is used.

I'm currently more a fan of the descriptive field as well, but I would urge us to go with "streaming" over "SAX": while many streaming JSON parsers are SAX-style, some very popular parsers are not. JSONStream comes to mind.

@weegeekps weegeekps added the enhancement New feature or request label Nov 11, 2019
@javagl
Contributor

javagl commented Nov 12, 2019

I don't have a very strong opinion here, but some doubts (I always have doubts): it is very specific, might be hard to find out for one loader or another, and in the end might not tell you anything about the loader's actual capabilities.

That DOM imposes a hard limit is true, but I have no idea where this limit will be (i.e. what a "large" file is: 100MB? 1GB? And what should be contained in a 1GB file that is then supposed to be rendered?). Conversely, SAX does not mean that you can in fact load large files: what information should be presented if a loader uses SAX but bails out on 10MB files?

(I'm not opposed to the idea, just wondering what the exact information is that is to be represented there, and how it should be gathered...)

@weegeekps
Contributor Author

I've mulled this over for the past day or so and I agree: this is extremely specific and would be difficult to discover for every loader. In an ideal world, the person adding the loader to the file would be the individual behind the project, and would have the context to provide the needed information. Reality tends to act against the ideal in these sorts of cases.

I've spent a fair bit of time researching parser limits this evening, and I've found that they tend to be reasonably documented in some languages and environments (C#, Java) and nearly impossible to find for browsers and some other languages. I think this is because the IETF spec for JSON (RFC 8259) explicitly calls out that parsers may set limits on the size, depth, and nesting of the texts they parse, but doing so is entirely optional. It makes some sense to me that, from a browser perspective, you wouldn't set a hard limit and would instead just try it and see; if you run out of memory you fail anyway.

I'm really on the fence about this now. On one hand, I feel there isn't much harm in adding a field to be filled in when the submitter is comfortable providing the information. On the other hand, this opens the door to many little-used cases that may never be filled in, or may be misleading. I do think there is a problem to solve here, but I'm less confident in the initial idea of a field indicating DOM vs. streaming/SAX. Perhaps memory-bound vs. a hard limit would be more useful?

@lexaknyazev
Member

> In an ideal world, the person adding the loader to the file would be the individual behind the project, and would have the context to provide the needed information. Reality tends to act against the ideal in these sorts of cases.

Why so? Usually, the authors of glTF libraries open PRs with information about their projects. We can ask for specifics before merging.

> nearly impossible to find for the browsers

Browsers' JSON implementations are open-source (see V8 for Chrome, SpiderMonkey for Firefox, JavaScriptCore for Safari, and Chakra for EdgeHTML). Following the ECMAScript API, they provide only a blocking interface (JSON.parse()) that returns a single DOM-style object.

@weegeekps
Contributor Author

> Why so? Usually, the authors of glTF libraries open PRs with information about their projects. We can ask for specifics before merging.

I could be working on a wrong assumption, but a lot of the entries in the late-breaking projects list seem to have been added by Patrick, which gives the impression that we end up adding many of these entries ourselves. I agree it'd be better if the developers of these projects opened a PR to add or update their project's entry themselves.

> Browsers' JSON implementations are open-source (see V8 for Chrome, SpiderMonkey for Firefox, JavaScriptCore for Safari, and Chakra for EdgeHTML). Following the ECMAScript API, they provide only a blocking interface (JSON.parse()) that returns a single DOM-style object.

It's been a while since I've dived into the V8 codebase, but IIRC the DOM object doesn't have any set maximum size; basically, if it fits in memory, it parses. That said, the blocking aspect is a great point that I overlooked: if you've got 2 gigs of JSON data you're parsing using JSON.parse() it's still likely going to block the event loop for quite some time, even if the host machine has more than sufficient memory.
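A quick sketch of that blocking behavior (illustration only; timings will vary by machine): JSON.parse is synchronous, so nothing else on the event loop can run until it returns.

```javascript
// Build a reasonably large JSON document, then time the synchronous parse.
// While JSON.parse runs, the event loop can do nothing else.
const big = JSON.stringify({
  nums: Array.from({ length: 1000000 }, (_, i) => i),
});

const t0 = Date.now();
const parsed = JSON.parse(big); // blocks until the whole document is materialized
const elapsedMs = Date.now() - t0;

console.log(`parsed ${big.length} bytes in ${elapsedMs} ms`);
```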

I'm now favoring a field named "JSON Parsing Strategy" or some similar text, with guidance to indicate whether parsing is memory-bound or blocking, and with DOM and streaming/SAX given as two examples. Ultimately this is a rather complex piece of data to capture, but text such as "Non-blocking, streaming I/O" is useful. The downside is that this would be impossible to filter on outside of a substring search, but I think having this as a non-filterable field is acceptable.
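As a purely hypothetical illustration (the field name and value below are not settled; they are assumptions for discussion), a loader's entry might carry something like:

```json
{
  "name": "example-loader",
  "jsonParsingStrategy": "Non-blocking, streaming I/O"
}
```

Free-form text keeps the field flexible at the cost of filterability, as noted above.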

@lexaknyazev
Member

> if you've got 2 gigs of JSON data you're parsing using JSON.parse() it's still likely going to block the event loop for quite some time

FWIW, it's possible to parse JSON asynchronously with the Fetch API:

```js
fetch("url.gltf")
  .then(response => response.json())
  .then(data => {
    // consume parsed JSON object
  });
```
