What is the rationale behind allowing YAML alias nodes as fragment ids? #72

Closed
MikeRalphson opened this issue Mar 6, 2023 · 9 comments

@MikeRalphson

## Fragment identification {#application-yaml-fragment}
A fragment identifies a node in a stream.
A fragment identifier starting with "*"
is to be interpreted as a YAML alias node {{fragment-alias-node}}.

Again, raising here as suggested by @dret, but please direct me to any existing issues/comments.

My concern is one of complexity and tooling support: when parsing YAML to an in-memory object form, the alias node identifiers are often lost (replaced with object identity).

YAML alias nodes effectively exist only in the textual representation of a YAML resource, not the parsed version most applications are likely to be dealing with.

This may mean that for some languages a YAML parser which can output an AST representation of the YAML object tree, or some other mechanism such as an adjacency list may be required.

See #71 for concerns over different subtypes of application/yaml having different fragment resolution rules.

@ioggstream
Collaborator

Hi @MikeRalphson!

We had a long discussion on this topic. See #47. In short:

  • since it's the YAML media type, we asked the YAML community, which wanted a way to reference alias nodes;
  • we accommodated both with different prefixes: * for anchor names and / for JSON Pointers.
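
The prefix rule above can be sketched roughly as follows. This is only an illustration, not text from the draft: the function name `classifyFragment` and the returned object shape are made up for this example, and percent-decoding and any other fragment forms are deliberately left out.

```javascript
// Minimal sketch: decide how a fragment identifier would be interpreted,
// based only on its first character ("*" or "/").
// classifyFragment and its return shape are hypothetical names.
function classifyFragment(fragment) {
  // Accept both "#*a" and "*a".
  const id = fragment.startsWith('#') ? fragment.slice(1) : fragment;

  if (id.startsWith('*')) {
    // "*" prefix: refers to a YAML anchor name.
    return { kind: 'alias', anchor: id.slice(1) };
  }
  if (id.startsWith('/')) {
    // "/" prefix: treated as a JSON Pointer.
    return { kind: 'json-pointer', pointer: id };
  }
  // Anything else is outside the scope of this sketch.
  return { kind: 'unknown', raw: id };
}

console.log(classifyFragment('#*a'));      // { kind: 'alias', anchor: 'a' }
console.log(classifyFragment('/what/a'));  // { kind: 'json-pointer', pointer: '/what/a' }
```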

the alias node identifiers are lost (replaced with object identity).

You mean anchor names are not preserved, right? That's right, and this is highlighted here.
Nonetheless, for the YAML community/developers, anchor names are the primary way to reference nodes.

This may mean that for some languages a YAML parser which can output an AST representation of the YAML object tree, or some other mechanism such as an adjacency list may be required.

Can you please clarify with a little example?

Let us know, R.

@MikeRalphson
Author

Hi @ioggstream, thanks for the welcome.

you mean anchor names are not preserved, right?

Correct.

Can you please clarify with a little example?

This is getting a bit implementation-specific but using @eemeli's yaml package, the best I can do is:

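const yaml = require('yaml');

// Note: AList is not part of the yaml package; it is assumed to be
// defined elsewhere as a walker that iterates over the AST.
function parseWithAliases(str) {
  const aliases = new Map(); // alias name -> resolved AST node
  const ast = yaml.parseDocument(str);
  const walker = new AList(ast); // fast recurse through every object in the AST
  for (const [value, metadata] of walker) {
    if (yaml.isAlias(value)) {
      aliases.set(value.source, value.resolve(ast));
    }
  }
  return { data: ast.toJS(), aliases };
}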

which, given the following input:

hello: &a
  world:
    message: Hello, world
outputs:
  *a

gives the following output:

{
  hello: { world: { message: 'Hello, world' } },
  outputs: { world: { message: 'Hello, world' } }
}

where hello and outputs are the same object (object identity), and the alias metadata looks like this:

Map(1) {
  'a' => YAMLMap {
    items: [
      Pair {
        key: Scalar {
          value: 'world',
          range: [ 13, 18, 18 ],
          source: 'world',
          type: 'PLAIN'
        },
        value: YAMLMap {
          items: [
            Pair {
              key: Scalar {
                value: 'message',
                range: [ 24, 31, 31 ],
                source: 'message',
                type: 'PLAIN'
              },
              value: Scalar {
                value: 'Hello, world',
                range: [ 33, 45, 46 ],
                source: 'Hello, world',
                type: 'PLAIN'
              }
            }
          ],
          range: [ 24, 46, 46 ]
        }
      }
    ],
    range: [ 13, 46, 46 ],
    anchor: 'a'
  }
}

As you can perhaps see, the problem is that the metadata available about the YAML aliases still refers to objects within the AST and doesn't share any objects with the output JS object.

If what I consider the best JavaScript YAML library doesn't provide this functionality (or I can't find it) and many other YAML parsers don't even have an abstraction above the output native object, then where is the universal utility in allowing YAML aliases as fragments?
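
To make the gap concrete, here is a minimal sketch of what a parser would need to do so that anchor names survive conversion and point at the output objects themselves. It deliberately does not use the yaml package: the node shapes (`kind`, `items`, `anchor`, `source`) and the function `toJSWithAnchors` are hypothetical and only mirror the example above.

```javascript
// Hypothetical, simplified node shapes (not the yaml package's classes):
//   { kind: 'scalar', value, anchor? }
//   { kind: 'map', items: [[key, node], ...], anchor? }
//   { kind: 'alias', source }
function toJSWithAnchors(root) {
  const anchors = new Map(); // anchor name -> converted JS value

  function convert(node) {
    if (node.kind === 'alias') {
      if (!anchors.has(node.source)) {
        throw new Error(`Unresolved alias: *${node.source}`);
      }
      return anchors.get(node.source); // reuse the very same JS value
    }
    if (node.kind === 'scalar') {
      if (node.anchor) anchors.set(node.anchor, node.value);
      return node.value;
    }
    const out = {};
    // Register before descending so aliases inside the node can refer back to it.
    // A repeated anchor name overwrites the earlier entry in this sketch.
    if (node.anchor) anchors.set(node.anchor, out);
    for (const [key, child] of node.items) out[key] = convert(child);
    return out;
  }

  return { data: convert(root), anchors };
}

// Same structure as the hello/outputs example above.
const tree = {
  kind: 'map',
  items: [
    ['hello', { kind: 'map', anchor: 'a', items: [
      ['world', { kind: 'map', items: [
        ['message', { kind: 'scalar', value: 'Hello, world' }],
      ] }],
    ] }],
    ['outputs', { kind: 'alias', source: 'a' }],
  ],
};

const { data, anchors } = toJSWithAnchors(tree);
console.log(anchors.get('a') === data.hello); // true
console.log(data.hello === data.outputs);     // true
```

With a mapping like this, a `*a` fragment could be answered from the output object directly; without it, the anchor name is only reachable through the AST.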

@ioggstream
Collaborator

ioggstream commented Mar 14, 2023

@MikeRalphson

wrt the code

Let me check if I understand the issue, and please (cc: @eemeli @perlpunk) correct me if I didn't get it:

  1. I parse the YAML Stream to the YAML Serialization Tree
  2. both the hello key and the outputs key reference the same node, labeled with the a anchor
  3. if I have to retrieve hello.yaml#*a, reading from anchor & aliases I'd get the first occurrence of the anchor in the tree
  4. in any case, there's no guarantee that this node is JSON-serializable

Note: even using JSON Pointers on YAML (e.g. what.yaml#/what/a) could end up at a non-JSON-serializable node

# what.yaml
what: &a
  a: &wonderful
    world: *a
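
As a rough illustration of that note, the sketch below builds by hand the cyclic object that what.yaml would load into, resolves /what/a against it, and checks whether the result can be serialized as JSON. The helper names `resolvePointer` and `isJsonSerializable` are made up for this example; the pointer handling follows the usual ~0 / ~1 escaping but is not a complete implementation.

```javascript
// Resolve a JSON Pointer (e.g. "/what/a") against a plain JS value.
function resolvePointer(doc, pointer) {
  if (pointer === '') return doc;
  if (!pointer.startsWith('/')) throw new Error(`Invalid JSON Pointer: ${pointer}`);
  return pointer.slice(1).split('/').reduce((node, token) => {
    // Unescape "~1" to "/" first, then "~0" to "~".
    const key = token.replace(/~1/g, '/').replace(/~0/g, '~');
    if (node === null || typeof node !== 'object' || !(key in node)) {
      throw new Error(`No such member: ${key}`);
    }
    return node[key];
  }, doc);
}

// JSON.stringify throws on circular structures.
function isJsonSerializable(value) {
  try {
    JSON.stringify(value);
    return true;
  } catch {
    return false;
  }
}

// Hand-built equivalent of what.yaml: the alias *a points back at "what".
const what = { a: { world: null } };
what.a.world = what;
const doc = { what };

const node = resolvePointer(doc, '/what/a');
console.log(node.world === what);       // true
console.log(isJsonSerializable(node));  // false
```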

I hope I didn't miss the point though (it's midnight now in Italy, after all)

I think @eemeli / @perlpunk can provide some insight. I am happy to extend the security/interoperability considerations sections!

wrt the spec

Here I will share my 2¢ that go a bit beyond current implementations: I think we tried to create a common discussion ground between different communities (YAML, JSON, OAS, ...).

on OAS:

  • the original YAML document only supported alias nodes as fragment identifiers; we managed to find a solution that suited both worlds
  • there was a discussion whether +yaml media types should inherit automatically from application/yaml, and we identified a way to provide an extensibility mechanism.

on YAML:

  • YAML is used in many contexts: given this, I think the most sensible choice is to listen to YAML folks. This is because YAML is an evolving language and they know best the direction that YAML will take in the future. I know they are working on YAML 1.3 ...
  • Since YAML accepts non-scalar keys, IIUC anchors are the mechanism provided by YAML to reference generic nodes, so it is reasonable to support them
  • Interoperability and security considerations provide guidance to implementers. Not every implementation supports fragids: for example, my browser's PDF viewer does not support document.pdf#page=4, so I expect that some implementations will not support them.

Thanks for reading this long answer!

@MikeRalphson
Author

@ioggstream thanks for providing the long answer!

there was a discussion whether +yaml media types should inherit automatically from application/yaml, and we identified a

Would love to know where this sentence was going to end up.

By guidance, do you mean fragment id resolution rules are optional or non-normative?

@eemeli
Collaborator

eemeli commented Mar 15, 2023

My understanding of where we ended up is that for application/yaml, a fragment identifier like *foo ends up referring to a YAML anchor &foo defined somewhere within the file, but that for the +yaml suffix, no fragment identifier syntax is specified.

To me, this strikes the right balance for compatibility. If you're working with application/yaml or something based on it, you are explicitly working with YAML, which does feature e.g. anchors and aliases. If you're working with something that uses +yaml, you're working with data/configuration/content that isn't really tied to the serialisation format, but is expressible in other formats as well.
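
As a rough sketch of that split (not anything taken from the spec text), a consumer could choose its fragment handling from the media type alone. The function name `fragmentSyntaxFor`, its return values, and the `application/example+yaml` type are all hypothetical.

```javascript
// Minimal sketch: pick fragment handling from the media type.
// Return values are labels invented for this example.
function fragmentSyntaxFor(mediaType) {
  // Drop parameters such as "; charset=utf-8" and normalise case.
  const type = mediaType.split(';')[0].trim().toLowerCase();

  if (type === 'application/yaml') {
    // Explicitly YAML: "*foo" fragments refer to the anchor &foo.
    return 'yaml-fragment';
  }
  if (type.endsWith('+yaml')) {
    // Suffix types: no fragment identifier syntax is specified here;
    // each specific media type would have to define its own.
    return 'unspecified';
  }
  return 'not-yaml';
}

console.log(fragmentSyntaxFor('application/yaml'));          // yaml-fragment
console.log(fragmentSyntaxFor('application/example+yaml'));  // unspecified
```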

Now, as for e.g. my JS yaml library that was used as an example above, the criticism of its available APIs for resolving a part of a document is somewhat valid. With the example given above, at least the following is possible:

import { Alias, parseDocument } from 'yaml'

const doc = parseDocument(`
hello: &a
  world:
    message: Hello, world
outputs:
  *a
`)

new Alias('a').resolve(doc).toJSON()
// { world: { message: 'Hello, world' } }

To be fair, this only sort of works, as any further aliases within the resolved tree are not properly resolved, and duplicated aliases do not behave as specified for application/yaml. To fix this, I think it would be appropriate and relatively straightforward to add a .toJS(doc) method on each node to provide the fully resolved value, and an anchor/alias unique-ifier to effectively implement an API for the first-alias matching.

@MikeRalphson
Author

Thanks @eemeli! Do you want me to raise an issue on yaml to track this?

@eemeli
Collaborator

eemeli commented Mar 15, 2023

If you could, as two separate issues? Already working on the toJS() method (the API is relatively straightforward, though the implementation is a little delicate), but providing a solution for this bit will need a separate fix:

If multiple nodes would match a fragment identifier,
the first such match is selected.
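
For what it's worth, the first-match rule could be sketched as a pre-order walk that stops at the first node carrying the requested anchor. The node shape (`anchor`, `children`, `value`) and the function `findFirstAnchor` are hypothetical and independent of the yaml package; the sketch assumes children are stored in source order.

```javascript
// Hypothetical node shape: { anchor?: string, children?: node[], value?: any }
// Pre-order traversal visits nodes in document order, so the first hit
// is the first node in the stream that defines the anchor.
function findFirstAnchor(node, name) {
  if (node.anchor === name) return node;
  for (const child of node.children ?? []) {
    const hit = findFirstAnchor(child, name);
    if (hit) return hit;
  }
  return undefined;
}

// The anchor "x" is defined twice; only the first definition is selected.
const stream = {
  children: [
    { anchor: 'x', value: 'first' },
    { children: [{ anchor: 'x', value: 'second' }] },
    { anchor: 'y', value: 'other' },
  ],
};

console.log(findFirstAnchor(stream, 'x').value); // first
console.log(findFirstAnchor(stream, 'z'));       // undefined
```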

@darrelmiller
Contributor

@MikeRalphson Can we consider this issue as closed, based on @eemeli's rationale?

@ioggstream
Collaborator

@MikeRalphson thanks for raising this issue, and 👏 to @eemeli for his support. I think this type of exchange is great for interoperability. 🚀

@darrelmiller darrelmiller moved this from In Discussion to Closed in HttpApi Active Issues Nov 6, 2023