YAML Fragment as alias nodes or JSON Pointers. #41 (#47)
* YAML Fragment as alias nodes or JSON Pointers. #41

* Discriminate using first character (/ or *)
* May be extended.

Co-authored-by: cabo <cabo@tzi.org>
Co-authored-by: Darrel <darrmi@microsoft.com>
3 people authored Jun 20, 2022
1 parent 1200e88 commit 3f2122c
Showing 3 changed files with 277 additions and 27 deletions.
162 changes: 140 additions & 22 deletions draft-ietf-httpapi-yaml-mediatypes.md
@@ -45,7 +45,7 @@ normative:
- ins: Eemeli Aro
- ins: Thomas Smith
target: https://yaml.org/spec/1.2.2/
oas:
OAS:
title: OpenAPI Specification 3.0.0
date: 2017-07-26
author:
@@ -55,8 +55,10 @@ normative:
- ins: Mike Ralphson
- ins: Ron Ratovsky
- ins: Uri Sarid
JSON-POINTER: RFC6901

informative:
I-D.ietf-jsonpath-base:

--- abstract

@@ -81,7 +83,7 @@ The source code and issues list for this draft can be found at
# Introduction

YAML [YAML] is a data serialization format that is widely used on the Internet,
including in the API sector (e.g. see [oas]),
including in the API sector (e.g. see [OAS]),
but the relevant media type and structured syntax suffix previously had not been registered by IANA.

To increase interoperability when exchanging YAML data
@@ -109,36 +111,41 @@ in this document are to be interpreted as in {{!SEMANTICS=I-D.ietf-httpbis-seman
The terms "fragment" and "fragment identifier"
in this document are to be interpreted as in {{!URI=RFC3986}}.

The terms "node", "anchor" and "named anchor"
The terms "node", "alias node", "anchor" and "named anchor"
in this document are to be interpreted as in [YAML].

## Fragment identification {#application-yaml-fragment}

This section describes how to use
named anchors (see Section 3.2.2.2 of [YAML])
alias nodes (see Section 3.2.2.2 and 7.1 of [YAML])
as fragment identifiers to designate nodes.

A YAML named anchor can be represented in a URI fragment identifier
A YAML alias node can be represented in a URI fragment identifier
by encoding it into octets using UTF-8 {{!UTF-8=RFC3629}},
while percent-encoding those characters not allowed by the fragment rule
in {{Section 3.5 of URI}}.
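
The encoding step described above can be sketched in Python. This is a minimal illustration, not part of the draft: the names `FRAGMENT_SAFE`, `alias_to_fragment` and `fragment_to_alias` are hypothetical, and the safe set is assumed to be the characters allowed by the RFC 3986 `fragment` rule (sub-delims, `:`, `@`, `/`, `?`) in addition to the unreserved characters that `urllib.parse.quote` never escapes.

```python
from urllib.parse import quote, unquote

# Assumption: characters the RFC 3986 fragment rule allows unescaped,
# beyond the unreserved set (letters, digits, "-", ".", "_", "~")
# that quote() always leaves alone.
FRAGMENT_SAFE = "!$&'()*+,;=:@/?"


def alias_to_fragment(alias_node: str) -> str:
    """Encode an alias node as UTF-8 and percent-encode disallowed characters."""
    return quote(alias_node, safe=FRAGMENT_SAFE, encoding="utf-8")


def fragment_to_alias(fragment: str) -> str:
    """Reverse the encoding: percent-decode and interpret the octets as UTF-8."""
    return unquote(fragment, encoding="utf-8")
```

With this sketch, an ASCII-only alias such as `*foo` is left unchanged, while a non-ASCII character such as `ò` is emitted as its percent-encoded UTF-8 octets.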

If multiple nodes would match a fragment identifier,
the first such match is selected.

A fragment identifier is not guaranteed to reference an existing node.
Therefore, applications SHOULD define how an unresolved alias node
ought to be handled.

Users concerned with interoperability of fragment identifiers:

- SHOULD limit named anchors to a set of characters
- SHOULD limit alias nodes to a set of characters
that do not require encoding
to be expressed as URI fragment identifiers:
this is always possible since named anchors are a serialization
this is generally possible since
named anchors are a serialization
detail;
- SHOULD NOT use a named anchor that matches multiple nodes.
- SHOULD NOT use alias nodes that match multiple nodes.

In the example resource below, the URL `file.yaml#foo`
references the anchor `foo` pointing to the node with value `scalar`;
In the example resource below, the URL `file.yaml#*foo`
references the alias node `*foo` pointing to the node with value `scalar`;
whereas
the URL `file.yaml#bar` references the anchor `bar` pointing to the node
the URL `file.yaml#*bar` references the alias node `*bar` pointing to the node
with value `[ some, sequence, items ]`.

~~~ example
@@ -190,7 +197,19 @@ Applications that use this media type:
: HTTP

Fragment identifier considerations:
: see {{application-yaml-fragment}}
: An empty fragment identifier references
the root node.

A fragment identifier starting with "*"
is to be interpreted as a YAML alias node {{application-yaml-fragment}}.

A fragment identifier starting with "/"
is to be interpreted as a JSON Pointer {{JSON-POINTER}}
and is evaluated on the YAML representation graph,
walking through alias nodes;
this syntax can only reference YAML nodes that are
on a path that is made up of nodes interoperable with
the JSON data model (see {{int-yaml-and-json}}).

Additional information:

@@ -328,15 +347,36 @@ issues with JSON:
`!mytag` (see Section 2.4 of [YAML]);

~~~ example
non-json-keys:
  2020-01-01: a timestamp
  [0, 1]: a sequence
  ? {k: v}
  : a map
non-json-value: 2020-01-01
%YAML 1.2
---
non-json-keys:
  0: a number
  2020-01-01: a timestamp
  [0, 1]: a sequence
  ? {k: v}
  : a map
non-json-value: 2020-01-01
~~~
{: title="Example of mapping keys not supported in JSON" #example-unsupported-keys}

## Fragment identifiers {#int-fragment}

To allow fragment identifiers to traverse alias nodes,
the YAML representation graph needs to be generated before the fragment identifier is evaluated.
It is important that this evaluation does not cause the issues mentioned in {{int-yaml-and-json}}
and in [Security considerations](#security-considerations), such as infinite loops and unexpected code execution.

Implementers need to consider that the YAML version and supported features (e.g. merge keys)
can affect the generation of the representation graph (see {{example-merge-keys}}).

In {{application-yaml}}, this document extends the use of specifications based on
the JSON data model with support for YAML fragment identifiers.
This is to improve the interoperability of already consolidated practices,
such as writing [OpenAPI documents](#OAS) in YAML.

{{ex-fragid}} provides a non-exhaustive list of examples that could help in
understanding interoperability issues related to fragment identifiers.

# Security Considerations

Security requirements for both media type and media type suffix
@@ -360,7 +400,11 @@ YAML documents are rooted, connected, directed graphs
and can contain reference cycles,
so they can't be treated as simple trees (see Section 3.2.1 of [YAML]).
An implementation that attempts to do that
can infinite-loop at some point (e.g. when trying to serialize a YAML document in JSON).
can infinite-loop traversing the YAML representation graph at some point,
for example:

- when trying to serialize it as JSON;
- or when searching/identifying nodes using specifications based on the JSON data model (e.g. {{JSON-POINTER}}).

~~~ yaml
x: &x
@@ -407,9 +451,75 @@ with the registration information provided below.
| +yaml | {{suffix-yaml}} of this document |
|--------------------------|------------------------------------------|


--- back

# Examples related to fragment identifier interoperability {#ex-fragid}

## Unreferenceable nodes

This example shows a couple of YAML nodes that cannot be referenced
based on the JSON data model
because their mapping keys are not strings.

~~~ example
%YAML 1.2
---
a-map-cannot:
  ? {be: expressed}
  : with a JSON Pointer

0: no numeric mapping keys in JSON
~~~
{: title="Example of YAML nodes that are not referenceable based on JSON data model." #example-unsupported-paths}

## Referencing a missing node

In this example, the fragment `#/0` does not reference an existing node.

~~~ example
0: "JSON Pointer `#/0` references a string mapping key."
~~~
{: title="Example of a JSON Pointer that does not reference an existing node." #example-missing-node}
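
Why `#/0` misses the node can be illustrated with a small JSON Pointer evaluator. This is a sketch under stated assumptions, not the draft's algorithm: `resolve_pointer` is a hypothetical helper, and the dictionary stands in for the data a YAML 1.2 loader would typically produce for the example above, where the mapping key is the integer `0` rather than the string `"0"`.

```python
def resolve_pointer(document, pointer):
    """Evaluate an RFC 6901 JSON Pointer against already-loaded data."""
    if pointer == "":
        return document  # the empty pointer references the whole document
    if not pointer.startswith("/"):
        raise ValueError("a non-empty JSON Pointer must start with '/'")
    node = document
    for raw in pointer[1:].split("/"):
        # RFC 6901 unescaping: "~1" first, then "~0".
        token = raw.replace("~1", "/").replace("~0", "~")
        if isinstance(node, dict):
            node = node[token]  # reference tokens are always strings
        elif isinstance(node, list):
            if not (token.isascii() and token.isdigit()):
                raise KeyError(token)
            index = int(token)
            if index >= len(node):
                raise KeyError(token)
            node = node[index]
        else:
            raise KeyError(token)  # scalars have no children
    return node


# Assumed loader output for the example: the key is an integer.
loaded = {0: "JSON Pointer `#/0` references a string mapping key."}

try:
    resolve_pointer(loaded, "/0")
    found = True
except KeyError:
    found = False  # the string token "0" does not match the integer key 0
```

Had the document used the quoted key `"0"`, the same pointer would resolve, because the reference token and the mapping key would then both be strings.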

## Representation graph with anchors and cyclic references

In this YAML document, the `#/foo/bar/baz` fragment identifier
traverses the representation graph and references the string `you`.
Moreover, the presence of a cyclic reference implies that
there are infinitely many fragment identifiers of the form `#/foo/bat/../bat/bar`
referencing the `&anchor` node.

~~~ example
anchor: &anchor
  baz: you
foo: &foo
  bar: *anchor
  bat: *foo
~~~
{: title="Example of cyclic references and alias nodes." #example-cyclic-graph}
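
The consequences of such a cycle can be sketched in Python. The structure below is a hand-built stand-in for the graph a YAML loader would typically produce for the example above (an assumption, since no loader is invoked here), and `has_cycle` is a hypothetical helper that performs a depth-first walk while remembering the containers on the current path.

```python
import json

# Hand-built equivalent of the example: the alias *foo makes the
# "foo" mapping contain itself under the key "bat".
anchor = {"baz": "you"}
foo = {"bar": anchor}
foo["bat"] = foo
document = {"anchor": anchor, "foo": foo}


def has_cycle(node, _path=None):
    """Return True if some container is reachable from itself."""
    path = set() if _path is None else _path
    if not isinstance(node, (dict, list)):
        return False
    if id(node) in path:
        return True  # we came back to a container already on this path
    path.add(id(node))
    children = node.values() if isinstance(node, dict) else node
    found = any(has_cycle(child, path) for child in children)
    path.discard(id(node))
    return found


# Any number of "bat" steps leads back to the same mapping.
same_node = document["foo"]["bat"]["bat"]["bar"] is anchor

# The standard JSON encoder refuses to serialize the cyclic graph.
try:
    json.dumps(document)
    serializable = True
except ValueError:
    serializable = False
```

Tracking only the current path, rather than every visited node, keeps a node that is merely shared (such as `anchor`, referenced twice) from being reported as a cycle.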

Many YAML implementations will resolve
[the merge key "<<:"](https://yaml.org/type/merge.html) defined in YAML 1.1
in the representation graph.
This means that the fragment `#/book/author/given_name` references the string `Federico`
and that the fragment `#/book/<<` will not reference any existing node.

~~~ example
%YAML 1.1
---
# Many implementations use merge keys.
the-viceroys: &the-viceroys
  title: The Viceroys
  author:
    given_name: Federico
    family_name: De Roberto
book:
  <<: *the-viceroys
  title: The Illusion
~~~
{: title="Example of YAML merge keys." #example-merge-keys}
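
The effect of merge-key resolution on fragment identifiers can be sketched as follows. The function `apply_merge_keys` is a hypothetical stand-in for what a merge-aware loader does internally, written here only to show the resulting graph. It assumes an acyclic input, that explicit keys override merged ones, and that earlier mappings in a merge sequence take precedence over later ones.

```python
def apply_merge_keys(node):
    """Return a copy of the data with "<<" merge keys resolved (acyclic input only)."""
    if isinstance(node, list):
        return [apply_merge_keys(item) for item in node]
    if not isinstance(node, dict):
        return node
    merged = {}
    sources = node.get("<<", [])
    if isinstance(sources, dict):
        sources = [sources]  # a single mapping or a sequence of mappings
    for source in sources:
        for key, value in apply_merge_keys(source).items():
            merged.setdefault(key, value)  # earlier sources win
    for key, value in node.items():
        if key != "<<":
            merged[key] = apply_merge_keys(value)  # explicit keys override
    return merged


# Hand-built equivalent of the example before merge resolution.
the_viceroys = {
    "title": "The Viceroys",
    "author": {"given_name": "Federico", "family_name": "De Roberto"},
}
raw = {
    "the-viceroys": the_viceroys,
    "book": {"<<": the_viceroys, "title": "The Illusion"},
}

resolved = apply_merge_keys(raw)
given_name = resolved["book"]["author"]["given_name"]
has_merge_key = "<<" in resolved["book"]
```

After resolution, the `book` mapping has an `author` entry and no `<<` entry, which matches the behaviour described for the two fragment identifiers above.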


# Acknowledgements

Thanks to Erik Wilde and David Biesack for being the initial contributors of this specification,
@@ -427,14 +537,22 @@ Manu Sporny
and Jason Desrosiers.

# FAQ
{: numbered="false"}
{: numbered="false" removeinrfc="true"}

Q: Why this document?
: After all these years, we still lack a proper media-type for YAML.
This has some security implications too
(e.g. with respect to identifying parsers or treating downloads).

Q: Why use alias nodes as fragment identifiers?
: Alias nodes start with `*`. This makes it possible to distinguish
a fragment identifier expressed as an alias node from
one expressed as a JSON Pointer {{JSON-POINTER}},
which is expected to start with `/`.
Moreover, since JSONPath {{I-D.ietf-jsonpath-base}} expressions
start with `$`, this mechanism can even be extended to that specification.
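
The discrimination by first character can be sketched in a few lines of Python. The function name `classify_fragment` and the returned labels are hypothetical, chosen for illustration only. Anything that starts with neither `*` nor `/` is reported as unknown here, since this change leaves other syntaxes to possible future extensions.

```python
def classify_fragment(fragment: str) -> str:
    """Classify a decoded fragment identifier by its first character."""
    if fragment == "":
        return "root"  # an empty fragment references the root node
    if fragment.startswith("*"):
        return "alias-node"
    if fragment.startswith("/"):
        return "json-pointer"
    return "unknown"  # e.g. "$..." is not assigned a meaning here
```

A dispatcher built on this would hand `*foo` to alias-node resolution and `/components/schemas/Person` to JSON Pointer evaluation.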

# Change Log
{: numbered="false"}
{: numbered="false" removeinrfc="true"}

RFC EDITOR PLEASE DELETE THIS SECTION.
91 changes: 86 additions & 5 deletions test_yaml_json.py
@@ -1,13 +1,12 @@
# Roundtrip yaml/json.
from graphql import ValidationRule
from path import Path
from pathlib import Path
import yaml, json
import pytest

import logging
import abnf

testcases = yaml.safe_load(Path("yaml-json-interoperability.yaml").read_text())

import logging
fragment_identifier_testcases = yaml.safe_load(Path("yaml-fragment-identifiers.yaml").read_text())

log = logging.getLogger(__name__)

@@ -50,3 +49,85 @@ def test_supported(testname, testcase):
    data = testcase["data"]
    ret = yaml.safe_load(data)
    assert testcase["expected"] == json_safe_dump(ret)


from urllib.parse import urlparse, urlsplit, urlunsplit
from urllib.parse import quote, unquote


def iri_to_uri(iri, encoding="utf-8"):
    "Takes a Unicode string that can contain an IRI and emits a URI."
    scheme, authority, path, query, frag = urlsplit(iri)
    scheme = scheme.encode(encoding)
    if ":" in authority:
        host, port = authority.split(":", 1)
        authority = host.encode("idna") + f":{port}".encode()
    else:
        authority = authority.encode(encoding)
    path = quote(path.encode(encoding), safe="/;%[]=:$&()+,!?*@'~")
    query = quote(query.encode(encoding), safe="/;%[]=:$&()+,!?*@'~")
    frag = quote(frag.encode(encoding), safe="/;%[]=:$&()+,!?*@'~")
    return urlunsplit(
        x.encode() if hasattr(x, "encode") else x
        for x in (scheme, authority, path, query, frag)
    )


@pytest.mark.parametrize(
    "alias_node",
    [
        "*foo",
        "*foo-bar-baz",
        "*però",
        "*però/fara",
        "*però/fara/perì",
        "/components/schemas/Person",
        "$.o.*",
        "$['store']['book'][0]['title']",
    ],
)
def test_uri_alias_nodes(alias_node):
    """
    fragment syntax:
    fragment = *( pchar / "/" / "?" )
    pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
    sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
                 / "*" / "+" / "," / ";" / "="
    pct-encoded = "%" HEXDIG HEXDIG
    unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
    """
    s = "https://host.example:443/path.yaml#" + alias_node
    url2 = iri_to_uri(s)
    url = urlparse(url2)
    print(
        f'\n{{ "{alias_node}": {{ "iri": "{s}","url": "{url2.decode("""ascii""")}" }} }},'
    )
    fragment = unquote(url.fragment)
    print(fragment)


@pytest.mark.parametrize("testcase", [
    testcase for testcase in fragment_identifier_testcases["yaml-fragment-identifiers"]["data"]
])
def test_iri_full(testcase):
    ((alias_node, testcase),) = testcase.items()
    url = urlparse(testcase["url"])
    iri = urlparse(testcase["iri"])
    parsed_fragment = unquote(url.fragment)
    validate_uri_fragment(url.fragment)
    iri_fragment = iri.fragment
    assert parsed_fragment == iri_fragment


def validate_uri_fragment(uri_fragment):
    rules = """
    sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
    pct-encoded = "%" HEXDIG HEXDIG
    unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
    pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
    fragment = *( pchar / "/" / "?" )
    """
    for rule in rules.strip().splitlines():
        abnf.Rule.create(rule.strip())
    return abnf.Rule('fragment').parse_all(uri_fragment)
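
The same grammar check can also be approximated with the standard library only. The sketch below is not part of this commit: `FRAGMENT_RE` and `is_valid_uri_fragment` are hypothetical names, and the regular expression is a hand translation of the ABNF rules quoted above, so it may differ from the `abnf`-based validator in edge cases.

```python
import re

# fragment = *( pchar / "/" / "?" )
# pchar    = unreserved / pct-encoded / sub-delims / ":" / "@"
FRAGMENT_RE = re.compile(
    r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})*"
)


def is_valid_uri_fragment(fragment: str) -> bool:
    """Return True if the string matches the RFC 3986 fragment rule."""
    return FRAGMENT_RE.fullmatch(fragment) is not None
```

Under this rule, square brackets and raw non-ASCII characters are rejected, so they have to be percent-encoded before a fragment passes the check.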