File signature #7

nichtich · 2022-05-27T06:04:36Z

As a data consumer
I want an indicator to tell me that a file is probably YAML-LD
So that I know when to expect YAML-LD

Strictly checking whether a YAML document is valid YAML-LD requires following the full specification. Nevertheless, some kind of magic file number would be useful. As suggested here, a YAML global tag could be used for this purpose (see RFC 4151):

!<tag:json-ld.org,2022>
$context: http://schema.org/
$type: Person
name: Pierre-Antoine Champin

YAML processors will raise an "unknown tag" error when trying to process the document without knowledge of YAML-LD. It can still be parsed as valid YAML, but there is no default mapping to JSON. This is not a bug, but a feature.

pchampin · 2022-05-27T14:36:19Z

I want an indicator to tell me that a file is probably YAML-LD

Assuming a specific content-type gets registered, would this answer your issue?
Or do you want the indicator to be in the content itself?

ioggstream · 2022-05-27T22:18:07Z

@nichtich some comments:

If we want to rely on YAML, we should consider that files could be normalized like that:

%YAML 1.2
---
!<tag:json-ld.org,2022>
$context: http://schema.org/
$type: Person
name: Pierre-Antoine Champin

This means that we cannot rely on magic numbers / file signatures.
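To make the point concrete, here is a minimal Python sketch of a fixed-offset signature check. The helper name `has_fixed_signature` is hypothetical, and the tag is the one proposed earlier in this thread, not anything standardized.

```python
# Hypothetical fixed-offset check: the signature must be the very first bytes.
SIGNATURE = b"!<tag:json-ld.org,2022>"  # tag proposed in this thread, not standardized


def has_fixed_signature(data: bytes) -> bool:
    """Return True only if the file starts with the signature bytes."""
    return data.startswith(SIGNATURE)


plain = b"!<tag:json-ld.org,2022>\n$context: http://schema.org/\n$type: Person\n"
normalized = b"%YAML 1.2\n---\n" + plain  # same document after normalization

print(has_fixed_signature(plain))       # True
print(has_fixed_signature(normalized))  # False
```

Both byte strings describe the same document, yet only the first one passes the check.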

I am reasoning on tags...

nichtich · 2022-05-28T05:55:05Z

we should consider that files could be normalized like that: [...]

There are several more ways to push down the start of actual content in a YAML file. YAML syntax is a beast.

Or do you want the indicator to be in the content itself?

Text-based file formats rarely have traditional fixed-position magic file numbers, but applications should be able to scan the first lines of a YAML file to detect whether it's meant to be YAML-LD.
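A rough sketch of such a scan, in Python with only the standard library. The function name `looks_like_yaml_ld` and the 20-line limit are assumptions for illustration, the tag is the one proposed in this thread, and the code is a heuristic rather than a YAML parser.

```python
import itertools

TAG = "!<tag:json-ld.org,2022>"  # tag proposed in this thread, not standardized


def looks_like_yaml_ld(text: str, max_lines: int = 20) -> bool:
    """Heuristically check whether the first content line carries the tag.

    Skips blank lines, comments, %-directives and '---' markers. Other
    legal ways of delaying the start of content are not handled.
    """
    for line in itertools.islice(text.splitlines(), max_lines):
        stripped = line.strip()
        if not stripped or stripped.startswith(("#", "%")):
            continue
        if stripped == "---" or stripped.startswith("--- "):
            # The tag may follow the marker on the same line: "--- !<...>"
            stripped = stripped[3:].strip()
            if not stripped or stripped.startswith("#"):
                continue
        # First real content: it either starts with the tag or it does not.
        return stripped.startswith(TAG)
    return False


normalized = "%YAML 1.2\n---\n!<tag:json-ld.org,2022>\n$context: http://schema.org/\n"
print(looks_like_yaml_ld(normalized))           # True
print(looks_like_yaml_ld("name: plain YAML\n"))  # False
```

The scan tolerates the directive and document-start marker, but it is only as reliable as its list of things to skip.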

By the way, another solution to this requirement is to state that a YAML-LD document must include $schema as its first key.

ioggstream · 2022-05-28T12:54:34Z

YAML syntax is a beast

This is not going to change :)

YAML-LD document must include $schema as its first key

YAML + JSON-LD + JSON-Schema will not be simpler than YAML + JSON-LD ;)

gkellogg · 2022-05-28T16:13:40Z

JSON-LD does not have a way to verify that the file is, in fact, JSON-LD if retrieved as application/json unless a describedby link relation points to a context. Otherwise, an embedded @context, which can appear anywhere, is useful. While adding a magic number may be a best practice, I don't think it should be required in order to treat the data as JSON-LD; that keeps us in line with general JSON-LD principles. At most, I'd say that files retrieved as application/ld+yaml SHOULD include a magic number (whatever is settled on), but may not, depending on circumstances.

ioggstream · 2022-05-28T22:42:41Z

Probably the only way to include a magic number in YAML is to start a document with a comment, e.g. #?i-am-yaml-ld.

IMHO it's not interoperable, as comments are not preserved by all parsers.

I agree with @gkellogg when he says that JSON-LD relies on other information to determine whether a JSON document is Linked Data. I think that, once parsed, we should take a similar approach.

anatoly-scherbakov · 2022-05-29T07:40:59Z

I would like to voice the following counter arguments to the introduction of special tags and headers.

Historical example: HTML

HTML had a required doctype header before HTML 5; everyone was copy-pasting that header (or generating it with their IDE), but ultimately I do not believe it was very informative.

With modern HTML, the header has been reduced to just <!DOCTYPE html>. But does even that provide much more information than can be inferred from the existence of the <html> root tag?

Duck typing

I am a proponent of conciseness. If the machine can interpret this file as YAML-LD, then it is YAML-LD. If it cannot do that, it will yield an error message.

If a YAML file is loadable into an RDF graph (possibly with an external context), it is YAML-LD.

Distinction between YAML and YAML-LD does not exist

The versatility of JSON-LD and, consequently, YAML-LD is rooted in the fact that a JSON or a YAML document managed by not-LD-aware software can be interpreted as a Linked Data document, even if you do not have control over its content. You just need to supply the right context.

For instance, I am interpreting GitHub API output as JSON-LD, and consuming it into an RDF graph, without any meaningful changes to the document itself. The same might apply to YAML data and configuration files. Just supply the proper context, and the file starts making real sense.

So how do you distinguish a YAML file from a YAML-LD file? You don't. All YAML is potentially YAML-LD if you have the proper context ready.
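As a toy illustration of supplying a context from the outside, here is a short Python sketch. The helpers `with_context` and `expand_keys`, the sample data, and the context are all hypothetical, and the key mapping is a deliberately simplified stand-in, not the JSON-LD expansion algorithm.

```python
def with_context(document: dict, context: dict) -> dict:
    """Return a copy of a plain document with an @context attached."""
    return {"@context": context, **document}


def expand_keys(document: dict) -> dict:
    """Toy expansion: map top-level keys to IRIs using a flat @context.

    Keys without a mapping are dropped. Real JSON-LD expansion handles far
    more (nesting, @vocab, type coercion, remote contexts).
    """
    context = document.get("@context", {})
    return {
        context[key]: value
        for key, value in document.items()
        if key in context
    }


# Plain data, e.g. parsed from a YAML file that knows nothing about Linked Data.
plain = {"name": "Pierre-Antoine Champin", "homepage": "https://example.org/"}

# Hypothetical external context mapping the keys to schema.org IRIs.
context = {"name": "http://schema.org/name", "homepage": "http://schema.org/url"}

print(expand_keys(with_context(plain, context)))
```

The source data is untouched; only the externally supplied context gives its keys a Linked Data meaning.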

Summary

I'd argue against mandatory tags. They would limit the interoperability of the standard and the tooling around it, and add syntactic noise and magic that non-technical domain experts would have to deal with when writing YAML-LD. I think we should not burden them with that.

nichtich · 2022-05-30T09:17:00Z

Distinction between YAML and YAML-LD does not exist

If this were the case, there would be no need to define YAML-LD: just use an existing YAML2JSON conversion and use the result as JSON-LD. If, however, interpreting YAML-LD requires processing YAML-LD documents in any special way not covered by the default YAML2JSON mapping, I would want to know whether a document requires this additional processing step.

anatoly-scherbakov · 2022-05-30T19:46:34Z

@nichtich, with the idea of the $-context (#11, as per @gkellogg), the special conversion might be omitted; we'd only need the default one.

However, my opinion expressed above is about a slightly different thing. I meant that almost any valid YAML file can be interpreted as YAML-LD, and thus there is no need to specially mark some YAML files as YAML-LD with a special header, comment, or tag.

VladimirAlexiev · 2022-05-31T09:36:10Z

I side with @pchampin and @anatoly-scherbakov: if we come up with a signature, it should be recommended but not mandatory.

juusoautiosalo · 2022-06-22T19:51:29Z

Having browsed through the issues in this repository, it seems that the following design principle has been established:

Any valid JSON-LD document can be converted to a valid YAML-LD document with a generic YAML2JSON converter.

It is also my understanding that JSON-LD does not have a file signature, so I think it cannot be mandatory for YAML-LD either.

(I'm new here and have not formed an opinion if it should be recommended or supported.)

gkellogg · 2022-06-22T21:01:17Z

Yes, that seems to be the emerging consensus. A profile allowing more use of YAML features may also be supported eventually, but the base YAML-LD profile is likely limited to simple conversion of the parsed JSON, as described in #12.

VladimirAlexiev · 2022-07-04T08:06:37Z

@nichtich I claim in #17 "The tag: URI scheme is recommended by the YAML people but is not mandatory, so I'd rather follow TimBL's principles of using resolvable URLs:".

Then #17 (comment) gives a detailed example.

So if we adopt a "signaling" tag, would you agree that instead of

!<tag:json-ld.org,2022>

we use something like

!<https://w3c.github.io/yaml-ld-syntax/>

ioggstream · 2022-07-04T10:41:53Z

@gkellogg I propose to close as "wontfix" the "File signature" issue: like JSON-LD, we really need to inspect the content to understand whether it's YAML-LD.

A forced solution could just create clashes with future YAML versions (we're building upon YAML). I don't know how to make it work, for example, with files that contain multiple YAML documents, e.g.

# First document in foo.yaml
---
first: file
...
# second document, same file: foo.yaml
---
"I am": the second document
...
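A small Python sketch of the multi-document problem. It assumes documents are separated by lines that are exactly `---`, uses the tag proposed in this thread, and adds that tag to the first document purely for illustration. The splitter `split_documents` is hypothetical and is not a YAML parser.

```python
TAG = "!<tag:json-ld.org,2022>"  # tag proposed in this thread, not standardized


def split_documents(stream: str) -> list:
    """Naively split a YAML stream into lists of content lines per document.

    Assumption: documents are separated by lines that are exactly '---'.
    Comments, blank lines and '...' end markers are dropped. Many legal
    corner cases of YAML are ignored.
    """
    chunks, current = [], []
    for line in stream.splitlines():
        if line.rstrip() == "---":
            chunks.append(current)
            current = []
        else:
            current.append(line)
    chunks.append(current)

    documents = []
    for chunk in chunks:
        content = [
            line for line in chunk
            if line.strip()
            and not line.lstrip().startswith("#")
            and line.strip() != "..."
        ]
        if content:
            documents.append(content)
    return documents


stream = """\
# First document in foo.yaml
---
!<tag:json-ld.org,2022>
first: file
...
# second document, same file: foo.yaml
---
"I am": the second document
...
"""

for number, document in enumerate(split_documents(stream), start=1):
    print(number, document[0].startswith(TAG))
# 1 True
# 2 False
```

A signature found at the top of the file says nothing about the second document, so a file-level check cannot answer the question for the whole stream.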

VladimirAlexiev · 2022-07-04T14:12:37Z

@ioggstream I propose to define a short, useful, but optional piece of advice to put in the Internet Media Type section.

E.g. https://www.w3.org/TR/turtle/#sec-mediaReg:

Magic number(s): Turtle documents may have the strings @prefix or @base (case sensitive) or the strings 'PREFIX' or 'BASE' (case insensitive) near the beginning of the document.
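For illustration, a hint of this kind could be checked roughly as follows in Python. The function name `may_be_turtle` and the 1024-character window are assumptions, since the registration text only says "near the beginning".

```python
import re

# Case-insensitive keywords, matched as whole words to avoid hits like "database".
_KEYWORDS = re.compile(r"\b(?:PREFIX|BASE)\b", re.IGNORECASE)


def may_be_turtle(text: str, window: int = 1024) -> bool:
    """Return True if the head of the text contains one of the hint strings.

    This is only a hint: a match does not prove the text is Turtle, and a
    miss does not prove it is not.
    """
    head = text[:window]
    # "@prefix" and "@base" are case sensitive in the registration text.
    # The case-insensitive search below also catches them; both checks are
    # kept to mirror the wording of the registration.
    if "@prefix" in head or "@base" in head:
        return True
    return _KEYWORDS.search(head) is not None


print(may_be_turtle("@prefix schema: <http://schema.org/> ."))  # True
print(may_be_turtle("name: Pierre-Antoine Champin"))             # False
```

The window size decides what "near" means, which is exactly the kind of looseness a generic file-type detector cannot rely on.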

ioggstream · 2022-07-05T14:11:08Z

@VladimirAlexiev imho magic numbers need to be reliable. They are used and implemented by generic tools like the file command or by operating systems for file hinting / launching external programs.

I briefly scraped the media type registrations, and across ~896 application/* media types, the word "near" is used once (for sparql-query).

YAML does not provide a magic number, and if I were to provide one in YAML-LD, I'd just say "See YAML".

My 2¢, R.

nichtich added the UCR Issue on Use Case/Recommendation label May 27, 2022

gkellogg added a commit that referenced this issue Jul 2, 2022

Add remaining Use Case Issues.

89edbba

Addresses #7, #8, #11, #12, #13, #17, #19, #31, and #35.

gkellogg mentioned this issue Jul 3, 2022

Add remaining Use Case Issues. #37

Draft

ioggstream added this to the -00 milestone Jul 5, 2022

ioggstream added a commit that referenced this issue Jul 5, 2022

Fix: #7. Same as YAML.

c5df0dc

ioggstream mentioned this issue Jul 5, 2022

Fix: #7. Same as YAML. #48

Merged

gkellogg closed this as completed in #48 Jul 5, 2022

gkellogg pushed a commit that referenced this issue Jul 5, 2022

Fix: #7. Same as YAML.

c0c9fd6

anatoly-scherbakov mentioned this issue Jul 5, 2022

Implement Best Practices section #50

Merged
