JsonInclude Processing Extension #124

Open
greenbergjosh opened this issue Oct 2, 2021 · 9 comments

@greenbergjosh

I have added what I call JsonInclude support to the json-everything library. At Greg's suggestion, I am opening this issue to see whether there is community interest in creating a formal spec for this functionality (explained below). I am also happy to share the code with anyone who is interested.

The fundamental goal of JsonInclude is to allow one Json document to point to others so that a JsonPath expression can seamlessly jump between documents.

For example, consider a server hosting a number of Json documents. For illustration, imagine the server running at http://example.com, and exposing documents by integer key, such as http://example.com/10, which will return the Json document with key 10.

Now, consider a document, http://example.com/10 with the contents:
{ "a": { "b": "@@http://example.com/11" } }
And, a second document, http://example.com/11 with the contents:
{ "c": { "d": "Hello" } }

Given the following code:

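using System.Text.Json;
using Json.Path;

// httpClient is assumed to be an existing System.Net.Http.HttpClient instance
using var instance = await JsonDocument.ParseAsync(await httpClient.GetStreamAsync("http://example.com/10"));
var path = JsonPath.Parse("$.a.b.c.d");
// With JsonInclude enabled, evaluation follows the "@@" reference from /10 into /11
var results = path.Evaluate(instance.RootElement);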

The results variable will contain the PathMatch for "Hello".
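The core mechanism can be sketched without the author's code: while walking the path, any string of the form "@@<uri>" is swapped for the document it names before the next step is taken. The C# below is a minimal illustration, not the JsonInclude implementation; it assumes .NET 6+ `System.Text.Json.Nodes`, an in-memory dictionary standing in for the server at http://example.com, and hypothetical `Deref`/`Select` helpers that understand only dot-separated property names rather than full JsonPath.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json.Nodes;

// Hypothetical in-memory stand-in for the documents served at http://example.com
var store = new Dictionary<string, string>
{
    ["http://example.com/10"] = @"{ ""a"": { ""b"": ""@@http://example.com/11"" } }",
    ["http://example.com/11"] = @"{ ""c"": { ""d"": ""Hello"" } }",
};

// If the node is a string of the form "@@<uri>", replace it with the referenced document
JsonNode? Deref(JsonNode? node)
{
    if (node is JsonValue v && v.TryGetValue<string>(out var s) && s.StartsWith("@@"))
        return JsonNode.Parse(store[s.Substring(2)]);
    return node;
}

// Walk a dot-separated list of property names, dereferencing before each step
JsonNode? Select(JsonNode? root, string path)
{
    var current = root;
    foreach (var segment in path.Split('.'))
    {
        current = Deref(current);
        if (current is not JsonObject obj || !obj.TryGetPropertyValue(segment, out current))
            return null;
    }
    return Deref(current);
}

var doc10 = JsonNode.Parse(store["http://example.com/10"]);
var hello = Select(doc10, "a.b.c.d")?.GetValue<string>();
Console.WriteLine(hello); // Hello
```

The path itself never mentions /11; the jump happens entirely inside node selection.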

This is, of course, the simplest example. At present, my implementation supports the following functionality.

  1. Caching. Once a remote document is loaded, it will not be loaded again.
  2. Arbitrary user-defined resolvers. At present, there are implementations for the following resolvers:
    a. @@http pointing to root
    b. @@http pointing to fragment using JsonPointer
    c. @@http pointing to array of fragments using JsonPath
    d. $ref pointing to root
    e. $ref pointing to fragment using JsonPointer
    f. $ref pointing to array of fragments using JsonPath
    g. Additional resolvers can be added at will
  3. Arbitrary user-defined retrievers to actually go and get the remote document. Presently, I have only implemented an HTTP/HTTPS retriever, but one could easily envision a retriever that gets the document from a database.
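The caching and retriever features above fit together naturally if a retriever is just a function from a URI to raw JSON text and caching is a wrapper around whichever retriever is registered. A minimal C# sketch, assuming .NET 6+ `System.Text.Json.Nodes`; `CachingRetriever` and `FakeRetriever` are hypothetical names, and the fake retriever counts calls instead of making HTTP or database requests:

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json.Nodes;

// Wrap any retriever (URI -> raw JSON text) so each URI is fetched at most once
Func<string, JsonNode?> CachingRetriever(Func<string, string> retrieve)
{
    var cache = new Dictionary<string, string>();
    return uri =>
    {
        if (!cache.TryGetValue(uri, out var text))
        {
            text = retrieve(uri);
            cache[uri] = text;
        }
        // Parse a fresh tree on every call: a JsonNode can only have one parent,
        // so sharing one instance between several including documents would fail
        return JsonNode.Parse(text);
    };
}

// Hypothetical retriever standing in for an HTTP or database lookup
var fetchCount = 0;
string FakeRetriever(string uri)
{
    fetchCount++;
    return @"{ ""c"": { ""d"": ""Hello"" } }";
}

var load = CachingRetriever(FakeRetriever);
var first = load("http://example.com/11");
var second = load("http://example.com/11");
Console.WriteLine(fetchCount); // 1
```

Caching the text and re-parsing is a simplification chosen for the sketch; caching an immutable `JsonDocument` would avoid the repeated parse.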

As an example of a different resolver, consider the JSON Schema-style { "$ref": "http://example.com/11" }. If we wanted to use this style of resolver, our two example documents would appear as:

/10: { "a": { "b": { "$ref": "http://example.com/11" } } }

/11: { "c": { "d": "Hello" } }

In general, the code allows the user to easily define and register new resolvers with five or ten lines of code. Registration of a new resolver looks like this:

JsonPathInclude.Resolvers.Add(match => match.Value.ValueKind == JsonValueKind.String && match.Value.GetString().StartsWith("@@http://"),
(match, idx) => JsonPathInclude.StringRootResolver(match, idx));

The above code will resolve pointers to remote documents that are specified as string properties prefixed with @@. Alternatively, or simultaneously, since one can register as many resolvers as desired, we can register a resolver for the $ref case.

JsonPathInclude.Resolvers.Add(match => match.Value.ValueKind == JsonValueKind.Object && match.Value.TryGetProperty("$ref", out _),
(match, idx) => JsonPathInclude.RefRootResolver(match, idx));
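Conceptually, each `Resolvers.Add` call registers a (predicate, resolver) pair, and the first predicate that matches a node decides how that node is dereferenced. The C# sketch below illustrates that dispatch independently of the author's `JsonPathInclude` class; the `resolvers` list, the `Resolve` helper and the in-memory `store` are hypothetical, and it assumes .NET 6+ `System.Text.Json.Nodes`.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json.Nodes;

// Hypothetical in-memory document store standing in for the retriever
var store = new Dictionary<string, string>
{
    ["http://example.com/11"] = @"{ ""c"": { ""d"": ""Hello"" } }",
};

// Ordered registry: the first predicate that matches a node picks its resolver
var resolvers = new List<(Func<JsonNode?, bool> Matches, Func<JsonNode, JsonNode?> Resolve)>();

// "@@<uri>" string references
resolvers.Add((
    node => node is JsonValue v && v.TryGetValue<string>(out var s) && s.StartsWith("@@"),
    node => JsonNode.Parse(store[node.GetValue<string>().Substring(2)])));

// JSON Schema-style { "$ref": "<uri>" } references
resolvers.Add((
    node => node is JsonObject o && o.ContainsKey("$ref"),
    node => JsonNode.Parse(store[node["$ref"]!.GetValue<string>()])));

JsonNode? Resolve(JsonNode? node)
{
    foreach (var (matches, resolve) in resolvers)
        if (matches(node))
            return resolve(node!);
    return node; // not a reference: leave it unchanged
}

var viaString = Resolve(JsonValue.Create("@@http://example.com/11"));
var viaRef = Resolve(JsonNode.Parse(@"{ ""$ref"": ""http://example.com/11"" }"));
```

Both reference styles resolve to the same document, so registering them side by side lets documents mix the two.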

In addition to pointing to a remote document at its root, the code also supports using JsonPointer and JsonPath.

For example, @@http://localhost:8002/11#/c points to the document /11, but chooses to include only the contents pointed to by the JsonPointer("/c").
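Handling such a reference takes two steps: split off the fragment after `#`, then walk the JSON Pointer through the retrieved document as described in RFC 6901. A simplified C# sketch, assuming the `@@` prefix has already been stripped and the pointer uses the plain string form (not URI percent-encoding); `SplitReference` and `EvaluatePointer` are hypothetical helpers, and array indices are parsed leniently with `int.TryParse`.

```csharp
using System;
using System.Text.Json.Nodes;

// Split "http://host/11#/c" into the document URI and the fragment after '#'
(string Document, string Fragment) SplitReference(string reference)
{
    var hash = reference.IndexOf('#');
    return hash < 0 ? (reference, "") : (reference.Substring(0, hash), reference.Substring(hash + 1));
}

// Evaluate a JSON Pointer: "" is the whole document, "/c" a property, "/a/0" an array item
JsonNode? EvaluatePointer(JsonNode? root, string pointer)
{
    if (pointer == "") return root;
    if (!pointer.StartsWith("/")) throw new FormatException($"Not a JSON Pointer: {pointer}");
    var current = root;
    foreach (var raw in pointer.Substring(1).Split('/'))
    {
        // Unescape ~1 before ~0 so that "~01" becomes "~1", not "/"
        var token = raw.Replace("~1", "/").Replace("~0", "~");
        current = current switch
        {
            JsonObject obj when obj.TryGetPropertyValue(token, out var child) => child,
            JsonArray arr when int.TryParse(token, out var i) && i >= 0 && i < arr.Count => arr[i],
            _ => throw new InvalidOperationException($"Pointer {pointer} does not resolve"),
        };
    }
    return current;
}

var doc11 = JsonNode.Parse(@"{ ""c"": { ""d"": ""Hello"" } }");
var (uri, fragment) = SplitReference("http://localhost:8002/11#/c");
var included = EvaluatePointer(doc11, fragment); // the object { "d": "Hello" }
```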

Likewise, @@http://localhost:8002/11#$.c.d3[*] points to /11 but uses JsonPath("$.c.d3[*]") to find every matching node in /11 (not shown in the shortened example document above) and includes those nodes as an array under the node in /10 that contains the reference. In this way, a document can reference any JsonPath-addressable subset of nodes in any other document. And, as always, a JsonPath expression will seamlessly jump across to that subset of nodes in the referenced document.
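Unlike a pointer, a JsonPath fragment can match many nodes, which is why the matches are wrapped in an array. The C# sketch below shows that step for a deliberately tiny path subset (`$`, `.name` and `[*]`), an assumption made to keep it short; a real implementation would hand the fragment to a full JsonPath engine. The contents of /11 are hypothetical, and each match is copied because a `JsonNode` can belong to only one parent.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json.Nodes;

// Evaluate a tiny JSONPath subset: "$", ".name" and "[*]" (wildcard over children)
List<JsonNode?> EvaluatePath(JsonNode? root, string path)
{
    if (!path.StartsWith("$")) throw new FormatException($"Not a JSONPath: {path}");
    var current = new List<JsonNode?> { root };
    var segments = path.Substring(1).Replace("[*]", ".*").Split('.', StringSplitOptions.RemoveEmptyEntries);
    foreach (var segment in segments)
    {
        var next = new List<JsonNode?>();
        foreach (var node in current)
        {
            if (segment == "*" && node is JsonArray arr) next.AddRange(arr);
            else if (segment == "*" && node is JsonObject all) next.AddRange(all.Select(p => p.Value));
            else if (node is JsonObject obj && obj.TryGetPropertyValue(segment, out var child)) next.Add(child);
        }
        current = next;
    }
    return current;
}

// Include every match as an element of a new array, copying each node
JsonArray IncludeAsArray(JsonNode? root, string path)
{
    var result = new JsonArray();
    foreach (var match in EvaluatePath(root, path))
        result.Add(match is null ? null : JsonNode.Parse(match.ToJsonString()));
    return result;
}

// Hypothetical contents for document /11 with a "d3" array
var doc11 = JsonNode.Parse(@"{ ""c"": { ""d"": ""Hello"", ""d3"": [ { ""x"": 1 }, { ""x"": 2 } ] } }");
var included = IncludeAsArray(doc11, "$.c.d3[*]");
Console.WriteLine(included.ToJsonString()); // [{"x":1},{"x":2}]
```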

In summary, JsonInclude supports:

  1. Seamless references between Json documents beneath technologies like JsonPath, allowing JsonPath expressions to extend across a graph/network of related documents. All features of JsonPath are supported, because the JsonInclude library works below the level of JsonPath.
  2. Arbitrary user-defined resolvers, so the format of the references can be anything the user chooses. For example, I have shown all references as URLs; however, they could easily be nothing more than a UUID, which the resolver could convert to a URL or to any other string that the retriever can use to access the remote document.
  3. Arbitrary user-defined retrievers.
  4. The ability to point not only to a remote document root, but also to a specific node using JsonPointer, or to a specific subset of nodes using JsonPath.
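The second point in the summary means the reference syntax is decoupled from retrieval: a resolver only has to turn a token it recognizes into something the retriever understands. As an illustration, here is a hypothetical C# resolver for bare-UUID references that maps them onto a URL template; the template, the `TryResolveUuid` name and the sample UUID are assumptions, not part of the original code.

```csharp
using System;

// Hypothetical URL template the resolver maps identifiers onto
const string Template = "http://example.com/docs/{0}";

// Recognize a bare UUID string and turn it into a retrievable URL; other strings are not references
bool TryResolveUuid(string value, out string url)
{
    if (Guid.TryParseExact(value, "D", out var id))
    {
        url = string.Format(Template, id.ToString("D"));
        return true;
    }
    url = "";
    return false;
}

var ok = TryResolveUuid("3f2504e0-4f89-11d3-9a0c-0305e82c3301", out var target);
Console.WriteLine(target); // http://example.com/docs/3f2504e0-4f89-11d3-9a0c-0305e82c3301
```

The retriever never sees the UUID; it only receives the URL the resolver produced, so swapping the template for a database key would not affect path evaluation.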

If my explanation is unclear in any way, please let me know. I hope others will find this feature useful and, if so, it would certainly be great to see it ultimately become a standard and find its way into the json-everything library. Regardless, if anyone would like the code, I would be happy to oblige.

Kind Regards.

@gregsdennis
Collaborator

Thanks for posting this @greenbergjosh! I'd like to approach this from the perspective of the feature as it pertains to JSON Path in general, excluding anything specific to my implementation.

I really like this idea, and @greenbergjosh did some great work modifying my library to implement it. I requested that he post this here as an example of a potential extension that has more to do with processing JSON data than it does the JSON Path syntax.

For this to work, no changes need to be made to the JSON Path syntax itself. The magic is in node selection.

A minor correction

The @@http syntax was the initial proposal. It was merely a string that could be replaced with the contents of the document to which the URI pointed. It would be embedded in a JSON value like so:

{
  "foo": "@@http://example.com/data"
}

The selector would see the @@ followed by a URI and recognize it as a reference to additional data. It would locate said data, replace the @@ string with that data, and continue selection.

I had proposed the "object with a $ref" syntax as an alternative, based on how JSON Schema manages references. I had intended for that to replace the @@ string, but I don't think I was clear. My opinion is that this feature should follow JSON Schema's lead and just have { "$ref": "http://..." } be the indicator to fetch the specified data and continue selection. The above example would then be modified to:

{
  "foo": { "$ref": "http://example.com/data" }
}

It's a bit more verbose, but because of that, it's also more explicit.

One interesting aspect of this solution is that the selector can be a bit smarter as well. If the path contains $ref, then the selector just takes the value without dereferencing, so $.foo['$ref'] would actually yield the URI http://example.com/data.

(The really cool thing about doing this is interoperability with JSON Schema. You could select nodes from a JSON Schema while dereferencing its built in references!)
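The rule described above (dereference a `{ "$ref": ... }` object before stepping into it, unless the requested property is `$ref` itself) fits in a single step function. A minimal C# sketch, assuming .NET 6+ `System.Text.Json.Nodes`, an in-memory `store` standing in for http://example.com/data with hypothetical contents, and a hypothetical `Step` helper that handles property names only:

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json.Nodes;

// Hypothetical preloaded document standing in for http://example.com/data
var store = new Dictionary<string, string>
{
    ["http://example.com/data"] = @"{ ""bar"": 42 }",
};

// Take one property step; a { "$ref": uri } object is replaced by the referenced
// document first, except when the requested property is "$ref" itself
JsonNode? Step(JsonNode? node, string property)
{
    if (property != "$ref" && node is JsonObject o && o["$ref"] is JsonValue r
        && r.TryGetValue<string>(out var uri))
        node = JsonNode.Parse(store[uri]);
    return node is JsonObject obj && obj.TryGetPropertyValue(property, out var child) ? child : null;
}

var root = JsonNode.Parse(@"{ ""foo"": { ""$ref"": ""http://example.com/data"" } }");
var foo = Step(root, "foo");
var refValue = Step(foo, "$ref")?.GetValue<string>(); // "http://example.com/data"
var bar = Step(foo, "bar")?.GetValue<int>();           // 42
```

So `$.foo['$ref']` yields the URI, while `$.foo.bar` transparently continues into the referenced document.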

Further notes on resolution

In supporting this reference mechanic, the JSON Schema spec contains language that specifically states that implementations are not required to fetch the data. The values of $ref are actually URIs, not URLs. The idea is that the implementation is not required to make network calls or look on the file system; it can instead have additional documents preloaded with URIs. I think this is an important aspect that should be copied/borrowed.
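Following JSON Schema here means resolving `$ref` values against a registry of documents keyed by URI, with network access only as an optional fallback. One way to set that up in C# is sketched below; `Register`, `Lookup`, the `urn:` identifier and the fallback delegate are hypothetical, and by default an unknown URI raises an error instead of triggering a fetch.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json.Nodes;

// Documents preloaded under their identifying URIs; the URIs need not be fetchable
var registry = new Dictionary<Uri, string>();
Func<Uri, string>? fallbackRetriever = null; // no network access unless one is supplied

void Register(string uri, string json) => registry[new Uri(uri)] = json;

JsonNode? Lookup(string reference)
{
    var uri = new Uri(reference);
    if (registry.TryGetValue(uri, out var json)) return JsonNode.Parse(json);
    if (fallbackRetriever is not null) return JsonNode.Parse(fallbackRetriever(uri));
    throw new KeyNotFoundException($"No document registered for {uri}");
}

Register("urn:example:greeting", @"{ ""c"": { ""d"": ""Hello"" } }");
Register("https://example.com/schemas/person", @"{ ""type"": ""object"" }");
var greeting = Lookup("urn:example:greeting");
```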

@greenbergjosh
Author

greenbergjosh commented Oct 3, 2021

It is likely worth mentioning the idea of cycle detection. I have not yet added this to my implementation, but it might be worth considering. I don't think all cycles are necessarily bad. Consider parent.child[0].parents[*].name, which we might read as returning the names of both parents of a child, having started from one parent. (To avoid confusion, I mean human parents and children in this example.) Clearly there is a cycle, but the result is still useful. For this reason, it might make sense to specify a maximum revisit count. Since Greg is focused here more on specification than on implementation, I will note that this may be an implementation detail.
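A maximum revisit count could be enforced at dereference time: count how often each referenced URI is entered during one evaluation and fail once any URI exceeds the limit. A minimal C# sketch, assuming `@@`-style references, a dot-separated path of property names, mutually referencing parent/child documents under hypothetical `urn:` keys, and a hypothetical `maxRevisits` parameter:

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json.Nodes;

// Hypothetical parent/child documents that reference each other (a cycle)
var store = new Dictionary<string, string>
{
    ["urn:parent"] = @"{ ""name"": ""Ann"", ""child"": ""@@urn:child"" }",
    ["urn:child"] = @"{ ""name"": ""Bo"", ""parent"": ""@@urn:parent"" }",
};

// Walk a dot-separated path from a starting URI, following "@@" references,
// and fail once any URI has been revisited more than maxRevisits times
JsonNode? Select(string startUri, string path, int maxRevisits)
{
    var visits = new Dictionary<string, int>();
    JsonNode? Enter(string uri)
    {
        visits[uri] = visits.TryGetValue(uri, out var n) ? n + 1 : 1;
        if (visits[uri] - 1 > maxRevisits)
            throw new InvalidOperationException($"{uri} revisited more than {maxRevisits} time(s)");
        return JsonNode.Parse(store[uri]);
    }

    var current = Enter(startUri);
    foreach (var segment in path.Split('.'))
    {
        if (current is not JsonObject obj || !obj.TryGetPropertyValue(segment, out current)) return null;
        if (current is JsonValue v && v.TryGetValue<string>(out var s) && s.StartsWith("@@"))
            current = Enter(s.Substring(2));
    }
    return current;
}

var name = Select("urn:parent", "child.parent.name", maxRevisits: 1)?.GetValue<string>(); // "Ann"
```

The cycle through the parent document is allowed once here; a longer path that returns to it again is rejected.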

It might make sense to distinguish between implicit and explicit cycles. For example, if I evaluate $.a..f, I may hit an include cycle that I have not explicitly requested, so it would be implicit. On the other hand, if I evaluate $.a.b.a.c, I have explicitly requested the cycle, and the traversal is clearly finite. So perhaps one could allow explicit cycles but disallow implicit cycles that exceed some specified length.
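For implicit cycles, a recursive-descent search (like `..f`) can track the chain of documents it is currently inside and decline to follow a reference back into any of them, while explicit step-by-step paths are left alone. The C# sketch below shows the strictest variant, where no implicit revisit is allowed; permitting cycles up to some length would replace the `Contains` check with a count. It assumes `@@`-style references, two hypothetical documents under `urn:` keys that include each other, and a hypothetical `Descendants` helper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json.Nodes;

// Hypothetical documents where urn:a includes urn:b and urn:b includes urn:a again
var store = new Dictionary<string, string>
{
    ["urn:a"] = @"{ ""f"": 1, ""next"": ""@@urn:b"" }",
    ["urn:b"] = @"{ ""f"": 2, ""back"": ""@@urn:a"" }",
};

// Collect every value of `property` at any depth (like $..f), following "@@"
// references but skipping any document already on the current include chain
List<JsonNode?> Descendants(string startUri, string property)
{
    var results = new List<JsonNode?>();
    var chain = new Stack<string>();

    void Visit(JsonNode? node)
    {
        switch (node)
        {
            case JsonValue v when v.TryGetValue<string>(out var s) && s.StartsWith("@@"):
                var uri = s.Substring(2);
                if (chain.Contains(uri)) return; // implicit cycle: do not re-enter
                chain.Push(uri);
                Visit(JsonNode.Parse(store[uri]));
                chain.Pop();
                break;
            case JsonObject obj:
                foreach (var (key, child) in obj)
                {
                    if (key == property) results.Add(child);
                    Visit(child);
                }
                break;
            case JsonArray arr:
                foreach (var item in arr) Visit(item);
                break;
        }
    }

    chain.Push(startUri);
    Visit(JsonNode.Parse(store[startUri]));
    return results;
}

var values = Descendants("urn:a", "f").Select(n => n!.GetValue<int>()).ToList(); // [1, 2]
```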

@gregsdennis
Collaborator

gregsdennis commented Oct 5, 2021

I've added this as an experimental option on https://json-everything.net/json-path.

Using the path

$..description

on

{"$ref": "https://raw.githubusercontent.com/json-schema-org/JSON-Schema-Test-Suite/master/tests/draft6/const.json"}

yields all of the descriptions from that file in the JSON Schema Test Suite.

@remorhaz
Contributor

remorhaz commented Oct 6, 2021

Looks interesting, but what about authorization - where do we put user credentials or tokens? Into some external config?

@gregsdennis
Collaborator

gregsdennis commented Oct 6, 2021

Those are good questions, and they should be considered if we decide to add this to the spec.

For my experimental implementation, the referenced documents need to be publicly available on the network where the processor is running.

cabo added the enhancement label Nov 9, 2021
@ghost

ghost commented Jan 14, 2022

Apologies for not updating this sooner. This was discussed at IETF 112, and the consensus was that this shouldn't be included in the base specification, but it should be possible in the future if we support extensions as part of the specification. For now I'll close this, but if you still feel strongly about it, I would recommend attending future meetings (the next being on the 18th of January) to discuss further.

ghost closed this as completed Jan 14, 2022
gregsdennis reopened this Jan 14, 2022
@gregsdennis
Collaborator

It's still worth keeping this issue open to track the feature idea, even if it's not going to be part of the first release.

@cabo
Member

cabo commented Jan 17, 2022

Issues that are not relevant to -base but could be discussed again once -base is done now have label "revisit-after-base-done".

@gregsdennis
Collaborator

@cabo since you're working on the extensions stuff now, I thought I'd remind you of this. It doesn't fit into the function-syntax expression extensions, but it's still an extension.

glyn reopened this Dec 20, 2023

5 participants