YAML-LD datatypes (and tags for datatypes) #17

VladimirAlexiev · 2022-05-31T09:56:27Z

RDF uses explicitly tagged literals, in particular lang strings and XSD datatypes, including infinite precision integers and decimals.
JSON faithfully carries strings and small numbers; everything else must be represented as a string with a separate field to indicate the datatype (@type in JSON-LD). E.g. see "Elaborate on handling of JSON builtin types integer and double" (w3c/json-ld-syntax#387) for the pitfalls of using large integers or decimals.
YAML can use tags to carry literals faithfully (including infinite precision, "markers" like -.inf and .nan, datetimes), and even more complex structures. One could declare "YAML schemas" with additional tags, eg to represent all XSD datatypes

Why might we want more than "string plus @type"?

convenience (eg see dc:date below and many other examples)
normalization (reduce/eliminate lexical vs value space differences): it seems to me easier for a processor to normalize 02022-05-18 to 2022-05-18 if tagged as !xsd!date rather than looking at a parallel @type field.

Let's collect below examples of what we could want.

@gkellogg in ietf-wg-httpapi/mediatypes#8 (comment)

If I were to revisit anything in the JSON-LD data model, it would be the interpretation of JSON numbers to allow for decimal values. As it is now, JSON numbers are either interpreted as integers (long) or doubles based on the range of the number. But, in JSON-LD 1.1, we use The JSON Canonicalization Scheme (RFC8785) as a way to represent numbers in the rdf:JSON datatype serialization, which allows for a serialization form of either integer, decimal, or double. This really only comes into play in JSON-LD when creating RDF literals from native JSON numbers (something which is generally a bad design point, but is there to allow a reasonable interpretation of native JSON forms), but could also come into play when representing those numbers in the data model, and thus in serializations to forms such as YAML.

@VladimirAlexiev from #2:

Tags are comparable to datatypes.
the YAML json schema and core schema handle string, boolean, integer, float (the latter allows things like -.inf and .nan).
https://yaml.org/type/ handles a wider set, in particular dates and datetimes. But please note these are considered deprecated in 1.2 and are being removed in 1.3 (see "Remove timestamp examples from the 1.2 spec", yaml/yaml-spec#268 (comment)).
Maybe we should define a YAML schema to handle more xsd datatypes?
- It should aim to eliminate problems related to the limited and non-standardized set of JSON literals. Eg the JSON number 12345678901234567890.12345 is converted to RDF literal "12345678901234567168"^^xsd:integer (see jsonld playground)
- And could even work as a replacement of @type, eg

# short form using tags
dc:date: !xsd!date 2022-05-18

# instead of long form
dc:date: {"@type": xsd:date, "@value": 2022-05-18}
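
The number mangling mentioned above can be reproduced outside the playground. The following is only a minimal sketch using Python's standard `json` module (not a JSON-LD processor): the default parser turns the number into an IEEE 754 double, while passing `parse_float=Decimal` keeps the lexical form intact. The key name `value` is made up for illustration.

```python
import json
from decimal import Decimal

text = '{"value": 12345678901234567890.12345}'

# Default parsing: the number becomes a double, so trailing digits are lost.
as_double = json.loads(text)["value"]
print(int(as_double))  # 12345678901234567168

# Parsing floats as Decimal preserves every digit of the lexical form.
as_decimal = json.loads(text, parse_float=Decimal)["value"]
print(as_decimal)  # 12345678901234567890.12345
```

The first result matches the integer part of the mangled literal quoted above, which suggests the loss happens as soon as the value passes through a double.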

New ones:

is it at all feasible to write "foo"@en in YAML rather than a separate @language field?
JSON-LD cannot capture GeoJSON because that uses nested arrays. Can this be worked around somehow with a YAML tag for "2D array"?


pchampin · 2022-05-31T10:23:07Z

JSON-LD cannot capture GeoJSON because that uses nested arrays.

This is not the case anymore with JSON-LD 1.1 (example)

ioggstream · 2022-05-31T12:51:55Z

This is another interesting direction to explore that does not seem to create inconsistencies with YAML spec, thanks Vladimir!
We could then ask the YAML community if it is possible to "register" in some way the xsd namespace to support this kind
of mappings and associate them to the yaml.org 1.2 namespace.

I suggest using full-URI tags in the examples for clarity, eg:

# see https://yaml.org/spec/1.2.2/#tag-directives
%TAG !xsd! tag:http://www.w3.org/2001/XMLSchema:
---
# short form using tags
dc:date: !xsd!date 2022-05-18

# instead of long form
dc:date: {"@type": xsd:date, "@value": 2022-05-18}

anatoly-scherbakov · 2022-05-31T19:04:54Z

I feel that manually specifying data types for each value is very tedious, and the tag syntax is not very intuitive. My feeling is this: why don't we delegate that task to the context?

The machine is smart enough to understand that a value of a dc:date is actually a literal with xsd:date datatype — and JSON-LD contexts can express that.
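
As a rough illustration of delegating the datatype to the context: the sketch below uses a term definition with `@type` coercion, which JSON-LD contexts support, and a toy `coerce` function that stands in for real expansion. The function, the term name `date`, and the document are hypothetical; a real processor does much more (IRI expansion, nested nodes, and so on).

```python
import json

# Hypothetical document: the context says that "date" values are xsd:date.
doc = {
    "@context": {
        "dc": "http://purl.org/dc/elements/1.1/",
        "xsd": "http://www.w3.org/2001/XMLSchema#",
        "date": {"@id": "dc:date", "@type": "xsd:date"},
    },
    "date": "2022-05-18",
}


def coerce(doc):
    """Toy stand-in for expansion: apply '@type' coercion from the context."""
    context = doc["@context"]
    result = {}
    for key, value in doc.items():
        if key == "@context":
            continue
        definition = context.get(key)
        if isinstance(definition, dict) and "@type" in definition and isinstance(value, str):
            # The plain string becomes an explicit value object.
            value = {"@type": definition["@type"], "@value": value}
        result[key] = value
    return result


print(json.dumps(coerce(doc)))
```

With this approach the instance data stays a plain string, at the cost of fixing one datatype per term.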

ioggstream · 2022-06-01T08:30:58Z

Can you post an example? Probably we should start collecting examples of "equivalence classes" of yaml files in this repo.

VladimirAlexiev · 2022-06-01T09:01:22Z

@ioggstream
We should use the actual XSD namespace. The tag: URI scheme is recommended by the YAML people but is not mandatory, so I'd rather follow TimBL's principles of using resolvable URLs:
%TAG !xsd! http://www.w3.org/2001/XMLSchema#

https://yaml.org/spec/1.2.2/#104-other-schemas allows us to make an XSD YAML scheme,
and we should ask the YAML people to publish it at https://yaml.org/type/

@anatoly-scherbakov Of course if a field ALWAYS uses the same datatype, the context can provide it. But dates in instance data often come in various granularities (same with numbers). So wouldn't it be nice to write this instead of the respective long forms?

dct:created:   !xsd!gYear    2000
dct:issued:    !xsd!date     2022-05-18
dct:modified:  !xsd!dateTime 2022-05-18T01:12:23

pchampin · 2022-06-01T10:11:53Z

@anatoly-scherbakov

My feeling is this: why don't we delegate that task to the context?

Of course we can, and that's an important role of JSON-LD contexts: making explicit some implicit constraints/dependencies (e.g. "this field expects this datatype").

However, we also need a way to make this information explicit (e.g. in the expanded form of JSON-LD). In JSON-LD, this is done with a value object {"@value": "...", "@type": "..." }. In YAML-LD, tags provide a more concise and more idiomatic way to do it.

Also, +1 to @VladimirAlexiev use-case above.

anatoly-scherbakov · 2022-06-01T17:56:01Z

@VladimirAlexiev @ioggstream that is an interesting point. When using JSON-LD, I always tried to ensure that a particular property always maps to a specific type, but I agree that this application of tags is compelling. 👍

gkellogg · 2022-06-22T21:04:36Z

This was discussed during today's call: https://json-ld.org/minutes/2022-06-22/.

Addresses #7, #8, #11, #12, #13, #17, #19, #31, and #35.

gkellogg · 2022-07-20T17:37:30Z

This issue was discussed in today's meeting.

gkellogg · 2022-08-06T20:32:01Z

I think this is a great candidate for something an extended profile could do, and something like the %TAG ! http://www.w3.org/2001/XMLSchema# seems like a great way to go.

In my mind, this isn't a direct replacement for the @type of JSON-LD value objects, but an extension of the JSON-LD internal representation, much the same way that booleans and numbers are treated in JSON-LD (specifically in the to/from RDF algorithms). Implementations would need to maintain the internally typed values when expanding/compacting/framing, represent them using the appropriate tag when serializing to YAML in extended mode, or expand them to value objects when serializing in the basic mode.

The toRdf and fromRdf algorithms would need to honor them when generating RDF or turning RDF back into the internal representation, again running with the appropriate processing mode.

Additional logic around step 9 of the Object to RDF Conversion Algorithm
Additional logic under step 2.4 of the RDF to Object Conversion algorithm when the useNativeTypes flag is true.

Otherwise, this change should be fairly transparent. IMO, this is the primary motivation for an extended profile.
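
To make the two serialization paths concrete, here is a small sketch under stated assumptions: `TypedLiteral`, `serialize`, and the mode names `"extended"` and `"basic"` are hypothetical, and the extended form assumes a `%TAG !xsd!` directive bound to the XSD namespace. It is not taken from any implementation.

```python
from dataclasses import dataclass

XSD = "http://www.w3.org/2001/XMLSchema#"


@dataclass(frozen=True)
class TypedLiteral:
    """Hypothetical internal representation of a datatyped scalar."""
    lexical: str
    datatype: str  # full datatype IRI


def serialize(literal, mode):
    """Return the form a literal would take in the given (assumed) mode."""
    if mode == "extended":
        if not literal.datatype.startswith(XSD):
            raise ValueError("this sketch only handles XSD datatypes")
        # Tagged scalar; quoting of the lexical form is ignored here.
        return f"!xsd!{literal.datatype[len(XSD):]} {literal.lexical}"
    if mode == "basic":
        # Ordinary JSON-LD value object.
        return {"@value": literal.lexical, "@type": literal.datatype}
    raise ValueError(f"unknown mode: {mode}")


created = TypedLiteral("2022-05-18", XSD + "date")
print(serialize(created, "extended"))  # !xsd!date 2022-05-18
print(serialize(created, "basic"))
```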

rob-metalinkage · 2022-08-07T23:20:05Z

So what is actually in play here is a profile of YAML itself - the profile for which JSON-LD translations are lossless, so we don't need a profile of YAML-LD, but YAML-LD is an extension of a "YAML-JSON-compatible" profile. Such a profile could be implicit - or made explicit if multiple YAML/JSON conversions are defined. Another reason to make it explicit would be to validate if a given YAML document is compatible with YAML-LD before defining the YAML-LD extended syntax for that YAML schema.

gkellogg · 2022-08-07T23:55:35Z

I guess in my mind, the "YAML-JSON-compatible" profile is analogous to YAML using the JSON schema. This does not depend on explicit tags, but implicitly associates the values with tag:yaml.org,2002:null, tag:yaml.org,2002:bool, tag:yaml.org,2002:int, and tag:yaml.org,2002:float.

I think something like a "YAML-XSD-compatible" profile might require the use of a tag namespace such as suggested by @VladimirAlexiev: %TAG !xsd! tag:http://www.w3.org/2001/XMLSchema:, so a tagged value such as !xsd!dateTime 2022-05-18T01:12:23 would parse to a native DateTime literal, and the JSON-LD internal representation would be extended to support the various literal types from XSD.

If running in "extended", or "YAML-XSD-compatible" mode, a %TAG definition such as above would be legitimate. If not running in that mode, a processor may reject the input or use Postel's law and parse it, but it should not be emitted unless the profile is set accordingly.

In my mind, this and alias nodes are the primary things that would be enabled by an extended mode.

If a processor sees some other %TAG definition (or definitions outside of some pre-defined set) it should probably fail to process the document, which then acts as an extension point for processors to eventually support more values for %TAG in the future, but for RDF purposes, anything beyond the XSD set

Given this, I think we may be about ready to define the processing modes more completely.

rob-metalinkage · 2022-08-08T06:41:13Z

I'm thinking here about statements about conformance - :myresource dct:conformsTo - how do I know if a yaml resource is "YAML using the JSON schema." (the same holds true for the identifiers for YAML-LD and JSON-LD.)

The general use case is to be able to determine what an API supports in terms of interoperability of data payloads. Can anyone orient me to where this is being defined or discussed? I can see inline directives such as https://yaml.org/spec/1.2.2/#681-yaml-directives, @context where a URI is referenced, and $schema directives, but not where such things are registered. We have something related in IANA profiles on media types for encodings, but what about information content profiles?

Is identification of the profile out-of-band using resolvable identifiers (i.e. not in syntax-specific directives using syntax-specific keywords and versioning) a factor in defining processing modes?

gkellogg · 2022-08-14T00:29:00Z

I've looked into this some more as part of trying to implement extended support for XSD scalar values in YAML. IMO, the appropriate %TAG value would be something like the following:

%TAG ! http://www.w3.org/2001/XMLSchema#

This would allow values such as !date 2022-08-08, which would expand as !<http://www.w3.org/2001/XMLSchema#date> "2022-08-08" and be a natural way to capture "2022-08-08"^^<http://www.w3.org/2001/XMLSchema#date>. However, I'm stymied by a bug in LibYAML, which Ruby and many other languages rely on for parsing YAML (yaml/libyaml#253), where # is not accepted as a URI character (really ns-uri-char). So far, the LibYAML team has been unresponsive, and the library shows very little activity in the last couple of years. Of course, we could hack this with some other URI, but that doesn't seem appropriate for this group.

Other YAML tools show similar issues, I think largely due to the fact that the YAML spec only uses the tag scheme in its examples. Until this issue is resolved, I think we need to defer an extended mode for YAML-LD that would involve interpreting XSD datatype scalar values. The spec recommends the use of tag: (oddly), and if we were to go there, we would probably want to introduce something like %TAG ! tag:www.w3.org,2022:xsd/ but that seems quite arbitrary.

An example file I've been working with to exercise this variation is the following:

%YAML 1.2
%TAG ! http://www.w3.org/2001/XMLSchema#
---
"@context":
  "@vocab": http://xmlns.com/foaf/0.1/
name: !string Gregg Kellogg
homepage: https://greggkellogg.net/
depiction: http://www.gravatar.com/avatar/42f948adff3afaa52249d963117af7c8
date: !date 2022-08-08

(Note: the use of a specific tag handle shouldn't be significant. In this case, it's using the primary tag handle, but it could just as well be the secondary tag handle (!!) or a named tag handle (!xsd!) for our processing model.)
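
The point about tag handles can be illustrated without any YAML library. The sketch below is a hand-written approximation of shorthand resolution, not a parser: `resolve_tag` is a hypothetical helper, `handles` plays the role of the `%TAG` directives, and percent-decoding of the suffix is ignored.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"


def resolve_tag(shorthand, handles):
    """Expand a tag shorthand such as '!xsd!date' into a full tag URI.

    `handles` maps tag handles ('!', '!!', '!xsd!') to prefixes, the way
    %TAG directives would declare them.
    """
    if shorthand.startswith("!<") and shorthand.endswith(">"):
        return shorthand[2:-1]  # verbatim tag, already a full URI
    # A second '!' closes a secondary ('!!') or named ('!xsd!') handle;
    # otherwise the primary handle '!' applies.
    end = shorthand.find("!", 1)
    handle = shorthand[: end + 1] if end != -1 else "!"
    if handle not in handles:
        raise ValueError(f"undeclared tag handle {handle!r}")
    return handles[handle] + shorthand[len(handle):]


# The same datatype URI results whichever handle carries the XSD prefix.
print(resolve_tag("!date", {"!": XSD}))
print(resolve_tag("!!date", {"!!": XSD}))
print(resolve_tag("!xsd!date", {"!xsd!": XSD}))
```

All three calls yield http://www.w3.org/2001/XMLSchema#date, so a processing model keyed on the resolved URI would not need to care which handle the author chose.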

If we are to support XSD types, we probably want to white-list allowed datatype URIs to include most XSD types, in addition to tag:yaml.org,2002:str, tag:yaml.org,2002:null, tag:yaml.org,2002:int, tag:yaml.org,2002:float, and tag:yaml.org,2002:bool which would map more directly to the JSON-LD Internal Representation.
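
A white-list check of this kind might look as follows. This is only a sketch: `ALLOWED` and `check_tag` are hypothetical names, and the XSD entries are an illustrative subset rather than a proposed final list.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"
YAML_CORE = "tag:yaml.org,2002:"

# Assumed allow-list: the YAML tags named above plus a sample of XSD datatypes.
ALLOWED = {YAML_CORE + name for name in ("str", "null", "int", "float", "bool")} | {
    XSD + name
    for name in ("string", "boolean", "integer", "decimal", "double",
                 "date", "dateTime", "gYear")
}


def check_tag(tag_uri):
    """Return the resolved tag URI if it is allowed, otherwise raise."""
    if tag_uri not in ALLOWED:
        raise ValueError(f"unsupported tag: {tag_uri}")
    return tag_uri


print(check_tag(XSD + "date"))
```

Rejecting anything outside the set keeps unknown tags as an explicit extension point instead of silently treating them as strings.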

See also yaml/yaml-spec#268 (comment).

gkellogg · 2022-08-14T00:34:35Z

is it at all feasible to write "foo"@en in YAML rather than a separate @language field?

No, I don't believe it is, however, we could consider using a datatype form such as defined for the i18n namespace:

@prefix i18n: <https://www.w3.org/ns/i18n#> .

[ ex:title "foo"^^i18n:en ] .

Although it's defined to allow a combination of language and base direction, it can be used for just language or just base direction. Of course, we would need to define that literal values using an i18n datatype consisting of only a language would be translated to language-tagged literals, and vice versa.
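
As a sketch of that translation, assuming the `language_direction` layout of i18n datatype IRIs (for example `i18n:en`, `i18n:ar-EG_rtl`, or `i18n:_rtl`): the helper names `split_i18n` and `to_value_object` are hypothetical.

```python
I18N = "https://www.w3.org/ns/i18n#"


def split_i18n(datatype):
    """Split an i18n datatype IRI into (language, direction); either may be None."""
    if not datatype.startswith(I18N):
        raise ValueError("not an i18n datatype")
    # Language tags use hyphens, so the first underscore separates the direction.
    language, _, direction = datatype[len(I18N):].partition("_")
    return (language or None, direction or None)


def to_value_object(lexical, datatype):
    """Turn an i18n-typed literal into a JSON-LD style value object."""
    language, direction = split_i18n(datatype)
    value = {"@value": lexical}
    if language:
        value["@language"] = language
    if direction:
        value["@direction"] = direction
    return value


print(to_value_object("foo", I18N + "en"))  # {'@value': 'foo', '@language': 'en'}
```

A language-only datatype therefore round-trips to the same information as "foo"@en.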

VladimirAlexiev · 2022-09-01T14:48:15Z

@gkellogg

I agree that the "YAML-JSON-compatible" profile should use the YAML JSON schema
- with a warning that it may mangle numbers (then people come complaining "why is my 12.3 converted to "1.230000005e2"^^xsd:float?")
I like !date 2022-08-08 better than !xsd!date 2022-08-08
I like your extended suggestion YAML-LD IRI tags #79 but how do we tag URLs? Do we just mandate !id in our "YAML XSD Schema"?

VladimirAlexiev · 2022-09-01T16:02:42Z

onlineyamltools.com allows # but then complains with:
Error: YAMLException: unknown tag !<http://www.w3.org/2001/XMLSchema#string> at line 6, column 28

Trying with explicit xsd tag gives the same error:

%YAML 1.2
%TAG !xsd! http://www.w3.org/2001/XMLSchema#
---
name: !xsd!string Gregg Kellogg

This tool can only use the "YAML JSON schema" builtin tags (and supports timestamp, although that has been deprecated).
As expected, it can mangle numbers:

%YAML 1.2
%TAG ! tag:yaml.org,2002:
---
name:   !str Gregg Kellogg
int:    !int 123
bigint: !int 123456789012345678901231                             # -> 1.2345678901234569e+23  ouch!
bigint: 123456789012345678901231                                  # -> 1.2345678901234569e+23  ouch!
float:  !float 1.235609853907835079889067406870964870956870967908 # -> 1.235609853907835
date:   !timestamp 2022-08-08                                     # -> 2022-08-08T00:00:00.000Z

gkellogg · 2022-09-01T21:59:53Z

My implementation needed to use a lower-level parser that just transforms YAML to the Representation Graph without further interpretation. In Ruby Psych, this is done via Psych.parse_stream. That level shouldn't place constraints on any specific schema.

gkellogg · 2022-09-13T16:42:15Z

Discussed at TPAC F2F

VladimirAlexiev · 2022-09-28T16:36:19Z

Beyond XSD: let's not forget custom datatypes, eg:

geo:wktLiteral, geo:gmlLiteral, and 5-10 more new ones in GeoSPARQL 1.1 (eg geo:geoJson)
cdt:ucum, eg !cdt!ucum 1.20 m is equal to (though not identical to) !cdt!ucum 120 cm
see LINDT units of measure w3c/sparql-dev#129
the tentative rdf:JSON and rdf:YAML
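
The "equal though not identical" remark about cdt:ucum can be sketched as a comparison in value space versus lexical space. The unit table below is a made-up two-entry stand-in, not UCUM itself, and `ucum_value` is a hypothetical helper.

```python
from decimal import Decimal

# Hypothetical, tiny unit table: conversion factor to metres.
TO_METRES = {"m": Decimal("1"), "cm": Decimal("0.01")}


def ucum_value(lexical):
    """Map a lexical form such as '1.20 m' to a length in metres."""
    number, unit = lexical.split()
    return Decimal(number) * TO_METRES[unit]


a, b = "1.20 m", "120 cm"
print(a == b)                          # False: the lexical forms differ
print(ucum_value(a) == ucum_value(b))  # True: the values are equal
```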

gkellogg · 2022-09-30T17:36:59Z

This was discussed on [2022-09-28](https://json-ld.org/minutes/2022-09-28/#16).

Pierre-Antoine Champin: The devil is in the details, and in the bnodes :-D ✪

Vladimir Alexiev: I think we should use YAML tags in the form that datatypes are used for RDF. ✪

... JSON-LD is more verbose, and the YAML syntax is more concise.

... In many cases the context will relieve you of this need, but there are cases where the graph is heterogeneous

... May be a problem with parsers.

... This also relates to YAML schemas, and how to attach types.

... YAML had a schema including dates, but have backed up.

... My proposal would be that the WG will declare a %TAG !xsd! ...

... But, implementers will need to use a better parser that supports tags.

... This is also important for numbers.

... We had trouble in xxx group, where the number would be mis-interpreted.

... Then we need to look at a YAML parsers matrix to determine how widely available it is.

Gregg Kellogg: The current "spec" refers to a basic profile, which doesn't include tags but only basic YAML values ✪

... and an Extended profile that includes XSD datatypes, and tags for URLs (is it absolute, or relative...)

... Gregg has an implementation that uses the YAML parse tree.

... Also in JSON-LD (discussion between Gregg and Antoine at TPAC), there is a movement towards handling more datatypes, and not mangling literals with default treatment of numbers

Distiller: http://rdf.greggkellogg.net/distiller?command=serialize ✪

Vladimir Alexiev: What about URLs? ✪

... In a heterogeneous dataset, the same field could contain either a string or a resource.

... can we have a single tag !id or !uri that would handle absolute, relative and CURIEs?

Gregg Kellogg: We want to explore some more use cases of URLs before deciding ✪

Vladimir Alexiev: Can we decide this issue? ✪

... let's not forget custom datatypes, eg geo:wktLiteral, geo:gmlLiteral, 5-10 more in GeoSPARQL 1.1, and the tentative rdf:JSON and rdf:YAML

Gregg Kellogg: Questions of quoting: is !xsd!integer '123' the same as !xsd!integer 123 and same as 123, or different? ✪

Niklas Lindström: Author: someone!tag-key => as if author was defined in the context with "`@type`": <tag-key>; then if e.g. someone!uri was encountered, *and* uri is defined as an alias of "`@id`", this is short for {"`@id`": "someone"} ✪

... the tag comes before the value, eg !tag-key someone

https://github.com/type -> `@type` ✪

https://github.com/id -> `@id` ✪

Gregg Kellogg: Tags should be declared in %TAG not in context, else we'll go against the grain of YAML ✪

TallTed · 2022-09-30T18:03:37Z

@gkellogg -- Several unfenced @ entities are in the last several lines of the bot-posted conversation #17 (comment) causing more unintended alerts to be fired in their direction.... Maybe the bot can be tweaked to codefence such entities going forward?

gkellogg · 2022-09-30T18:15:35Z

Sorry, must have been unfenced on IRC. I’ll fix them later

TallTed · 2022-10-01T22:33:22Z

Yeah, I'm sure they were unfenced on IRC. There's no consistent value to fencing there.

Weirdly, now that they're single-backtick fenced here, those backticks are showing as part of the text instead of being interpreted as markdown -- so, for instance, we now see (bold added here to help with clarity) {"`@id`": "someone"}, where we'd expect to see {"@id": "someone"}.

I suspect this won't be a quick or easy fix, but it should be raised with the folks running the (now several!) IRC/log-to-GitHub bots.

gkellogg · 2022-10-02T00:49:32Z

Well, I handle the irc log to HTML for these minutes, which were inserted here. Perhaps could detect some bare keywords, but you’re right that the result in the comment is wrongly interpreted, but that seems like a GH issue.

TallTed · 2022-10-02T02:35:19Z

I'd suggest wrapping the larger element including the @, so {"@id": "someone"}, which makes overall sense anyway, the larger element being code.

VladimirAlexiev mentioned this issue Jun 1, 2022

YAML-LD UCRs #2

Closed

14 tasks

gkellogg added the UCR Issue on Use Case/Recommendation label Jun 4, 2022

gkellogg closed this as completed Jun 4, 2022

gkellogg reopened this Jun 4, 2022

gkellogg added a commit that referenced this issue Jul 2, 2022

Add remaining Use Case Issues.

89edbba

Addresses #7, #8, #11, #12, #13, #17, #19, #31, and #35.

gkellogg mentioned this issue Jul 3, 2022

Add remaining Use Case Issues. #37

Draft

This was referenced Jul 4, 2022

File signature #7

Closed

YAML-LD context and frame #44

Open

ioggstream added this to the -future milestone Jul 5, 2022

gkellogg mentioned this issue Aug 1, 2022

Convert JSON-LD to YAML-LD using standard YAML libraries #12

Open

This was referenced Aug 6, 2022

Remove timestamp examples from the 1.2 spec yaml/yaml-spec#268

Open

Defining various interoperability profiles #35

Open


gkellogg added the spec Issue on specification label Aug 17, 2022

gkellogg mentioned this issue Aug 26, 2022

Spec progress #78

Merged

gkellogg removed the spec Issue on specification label Oct 4, 2022

anatoly-scherbakov mentioned this issue Feb 17, 2023

Leave YAML-LD Extended profile out of bounds of the Community Report #88

Open
