File signature #7

Closed
nichtich opened this issue May 27, 2022 · 16 comments · Fixed by #48
Labels
UCR Issue on Use Case/Recommendation

@nichtich

As a data consumer
I want an indicator to tell me that a file is probably YAML-LD
So that I know when to expect YAML-LD

Strictly checking whether a YAML document is valid YAML-LD requires following the full specification. Nevertheless, some kind of magic file number would be useful. As suggested here, a YAML global tag could be used for this purpose (see RFC 4151):

!<tag:json-ld.org,2022>
$context: http://schema.org/
$type: Person
name: Pierre-Antoine Champin

YAML processors will raise an "unknown tag" error when trying to process the document without knowledge of YAML-LD. It can still be parsed as valid YAML, but there is no default mapping to JSON. This is not a bug, but a feature.

@nichtich nichtich added the UCR Issue on Use Case/Recommendation label May 27, 2022
@pchampin
Contributor

pchampin commented May 27, 2022

I want an indicator to tell me that a file is probably YAML-LD

Assuming a specific content-type gets registered, would this answer your issue?
Or do you want the indicator to be in the content itself?

@ioggstream
Contributor

@nichtich some comments:

  1. If we want to rely on YAML, we should consider that files could be normalized like that:
%YAML 1.2
---
!<tag:json-ld.org,2022>
$context: http://schema.org/
$type: Person
name: Pierre-Antoine Champin

This means that we cannot rely on magic numbers / file signatures.

  2. I am reasoning on tags...
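
To illustrate the first point, here is a minimal sketch of a classic fixed-offset magic-number test. It assumes the signature is the global tag proposed at the top of this thread; the function name is made up for the illustration. The same content passes the test in its plain form and fails once a %YAML directive and a document start marker are put in front of it.

```python
# Hypothetical signature: the global tag proposed earlier in this thread.
SIGNATURE = b"!<tag:json-ld.org,2022>"


def has_fixed_offset_signature(data: bytes) -> bool:
    """Classic magic-number test: the signature must sit at byte offset 0."""
    return data.startswith(SIGNATURE)


plain = (
    b"!<tag:json-ld.org,2022>\n"
    b"$context: http://schema.org/\n"
    b"$type: Person\n"
    b"name: Pierre-Antoine Champin\n"
)
# Same content with a directive and a document start marker in front.
normalized = b"%YAML 1.2\n---\n" + plain

print(has_fixed_offset_signature(plain))       # True
print(has_fixed_offset_signature(normalized))  # False
```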

@nichtich
Author

we should consider that files could be normalized like that: [...]

There are several more ways to push down the start of actual content in a YAML file. YAML syntax is a beast.

Or do you want the indicator to be in the content itself?

Text-based file formats rarely have traditional fixed-position magic file numbers, but applications should be able to scan the first lines of a YAML file to detect whether it's meant to be YAML-LD.
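
A rough sketch of such a scan, using only the Python standard library. The tag, the 20-line budget, and the function name are assumptions made for the illustration, not anything agreed in this thread. It skips blank lines, comments, directives, and a bare document start marker, then checks whether the first content line begins with the tag. It is a heuristic, not a YAML parser, and it does not cover every way YAML can push content down.

```python
TAG = "!<tag:json-ld.org,2022>"  # hypothetical signalling tag from this thread
MAX_LINES = 20                   # arbitrary scan budget (an assumption)


def looks_like_yaml_ld(text: str, max_lines: int = MAX_LINES) -> bool:
    """Heuristically report whether the first content line starts with TAG."""
    for line in text.splitlines()[:max_lines]:
        stripped = line.strip()
        # Skip blank lines, comments, and directives such as "%YAML 1.2".
        if not stripped or stripped.startswith(("#", "%")):
            continue
        # A document start marker may stand alone on its line ...
        if stripped == "---":
            continue
        # ... or be followed by the tag on the same line.
        if stripped.startswith("--- "):
            stripped = stripped[4:].lstrip()
        return stripped.startswith(TAG)
    return False


normalized = "%YAML 1.2\n---\n!<tag:json-ld.org,2022>\n$type: Person\n"
print(looks_like_yaml_ld(normalized))              # True
print(looks_like_yaml_ld("name: just some YAML"))  # False
```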

By the way, another solution to this requirement is to state that a YAML-LD document must include $schema as its first key.

@ioggstream
Contributor

YAML syntax is a beast

This is not going to change :)

YAML-LD document must include $schema as its first key

YAML + JSON-LD + JSON-Schema will not be simpler than YAML + JSON-LD ;)

@gkellogg
Member

JSON-LD does not have a way to verify that a file is, in fact, JSON-LD if it is retrieved as application/json, unless a describedby link relation points to a context. Otherwise, an embedded @context, which can appear anywhere, is useful. While adding a magic number may be a best practice, I don't think it should be required in order to treat the data as JSON-LD; that would not be in line with general JSON-LD principles. At most, I'd say that files retrieved as application/ld+yaml SHOULD include a magic number (whatever is settled on), but may omit it depending on circumstances.

@ioggstream
Contributor

Probably the only way to include a magic number in YAML is to start a document with a comment, e.g. #?i-am-yaml-ld.

IMHO it's not interoperable, as comments are not preserved by all parsers.

I agree with @gkellogg when he says that JSON-LD relies on other information to determine whether a JSON document is LD. I think that, once parsed, we should take a similar approach.

@anatoly-scherbakov
Contributor

I would like to voice the following counter arguments to the introduction of special tags and headers.

Historical example: HTML

HTML had a required doctype header before HTML 5; everyone was copy-pasting that header (or generating it with their IDE), but ultimately I do not believe it was very informative.

With modern HTML, the header has been reduced to just <!DOCTYPE html>. But even that: does it provide much more information than might be extracted from the existence of the <html> root tag?

Duck typing

I am a proponent of conciseness. If the machine can interpret this file as YAML-LD, then it is YAML-LD. If it cannot do that it will yield an error message.

If a YAML file is loadable into an RDF graph (possibly with an external context) — it is YAML-LD.

Distinction between YAML and YAML-LD does not exist

The versatility of JSON-LD and, consequently, YAML-LD is rooted in the fact that a JSON or a YAML document managed by not-LD-aware software can be interpreted as a Linked Data document, even if you do not have control over its content. You just need to supply the right context.

For instance, I am interpreting GitHub API output as JSON-LD and consuming it into an RDF graph, without any meaningful changes to the document itself. The same might apply to YAML data and configuration files. Just supply the proper context, and the file starts making real sense.

So how do you distinguish a YAML file from a YAML-LD file? You don't. All YAML is potentially YAML-LD if you have the proper context ready.
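
A minimal sketch of "just supply the proper context", assuming the YAML (or JSON) file has already been parsed into a plain dict by some ordinary parser. The context, the helper name, and the example.org URL are made up for the illustration; the result would still have to go through a real JSON-LD processor to become an RDF graph.

```python
import json

# Hypothetical external context mapping plain keys to schema.org IRIs.
EXTERNAL_CONTEXT = {
    "name": "http://schema.org/name",
    "homepage": {"@id": "http://schema.org/url", "@type": "@id"},
}


def with_context(document: dict, context: dict) -> dict:
    """Return a copy of document with an @context entry placed first."""
    if "@context" in document:
        return dict(document)  # already carries its own context
    return {"@context": context, **document}


# Stand-in for the result of parsing an ordinary, LD-unaware YAML file.
plain = {"name": "Pierre-Antoine Champin", "homepage": "https://example.org/"}

linked = with_context(plain, EXTERNAL_CONTEXT)
print(json.dumps(linked, indent=2))
```

The original document is left untouched; only the copy handed to the JSON-LD processor gains the context.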

Summary

I'd vote against mandatory tags. They will limit the interoperability of the standard and the tooling around it, and add syntactic noise and magic that non-technical domain experts will have to deal with when writing YAML-LD. I don't think we should burden them with that.

@nichtich
Author

Distinction between YAML and YAML-LD does not exist

If this were the case, there would be no need to define YAML-LD: just use an existing YAML2JSON conversion and use the result as JSON-LD. If, however, interpreting YAML-LD requires processing documents in any special way not covered by the default YAML2JSON mapping, I would rather know whether a document requires this additional processing step.

@anatoly-scherbakov
Contributor

@nichtich with the idea of the $-context (#11 as per @gkellogg) the special conversion might be omitted, we'd only need the default one.

However, my opinion expressed above is about a slightly different thing. I meant that almost any valid YAML file can be interpreted as YAML-LD, and thus there is no need to specially mark some YAML files as YAML-LD with a special header, comment, or tag.

@VladimirAlexiev
Contributor

I side with @pchampin and @anatoly-scherbakov: if we come up with a signature, it should be recommended but not mandatory.

@juusoautiosalo

Having browsed through the issues in this repository, it seems that the following design principle has been established:

Any valid JSON-LD document can be converted to a valid YAML-LD document with a generic YAML2JSON converter.

It is also my understanding that JSON-LD does not have a file signature, so I think one cannot be mandatory for YAML-LD either.

(I'm new here and have not formed an opinion if it should be recommended or supported.)

@gkellogg
Member

Yes, that seems to be the emerging consensus. A profile allowing more use of YAML features may also be supported eventually, but the base YAML-LD profile is likely limited to simple conversion of the parsed JSON, as described in #12.

gkellogg added a commit that referenced this issue Jul 2, 2022
@VladimirAlexiev
Contributor

VladimirAlexiev commented Jul 4, 2022

@nichtich I claim in #17 "The tag: URI scheme is recommended by the YAML people but is not mandatory, so I'd rather follow TimBL's principles of using resolvable URLs:".

Then #17 (comment) gives a detailed example.

So if we adopt a "signaling" tag, would you agree that instead of

!<tag:json-ld.org,2022>

we use something like

!<https://w3c.github.io/yaml-ld-syntax/>

@ioggstream
Contributor

@gkellogg I propose to close as "wontfix" the "File signature" issue: like JSON-LD, we really need to inspect the content to understand whether it's YAML-LD.

A forced solution could just create clashes with future YAML versions (we're building upon YAML). I don't know how to make it work for example with files that contain multiple yaml documents, e.g.

# First document in foo.yaml
---
first: file
...
# second document, same file: foo.yaml
---
"I am": the second document
...
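
A rough sketch of why a single file-level signature is awkward here: the stream above holds two documents, and a check at the top of the file says nothing about the second one. The splitter below is a deliberately naive assumption-laden illustration (the function names are made up). It only recognises marker lines that consist of exactly "---" or "...", drops comment-only chunks, and is no substitute for a real YAML parser.

```python
def _has_content(lines: list[str]) -> bool:
    """True if any line is neither blank nor a comment."""
    return any(l.strip() and not l.lstrip().startswith("#") for l in lines)


def split_documents(stream: str) -> list[str]:
    """Naively split a YAML stream on bare '---' and '...' marker lines."""
    documents: list[str] = []
    current: list[str] = []
    for line in stream.splitlines():
        if line.rstrip() in ("---", "..."):
            # Close the chunk collected so far, ignoring comment-only chunks.
            if _has_content(current):
                documents.append("\n".join(current))
            current = []
        else:
            current.append(line)
    if _has_content(current):
        documents.append("\n".join(current))
    return documents


STREAM = """\
# First document in foo.yaml
---
first: file
...
# second document, same file: foo.yaml
---
"I am": the second document
...
"""

for index, document in enumerate(split_documents(STREAM), start=1):
    print(index, repr(document))
```

Any signature scheme would have to say whether it applies per file or per document, and a per-document check would need to run on each chunk separately.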

@VladimirAlexiev
Contributor

VladimirAlexiev commented Jul 4, 2022

@ioggstream I propose to define a short, useful, but optional piece of advice to put in the Internet Media Type section.

E.g. https://www.w3.org/TR/turtle/#sec-mediaReg:

Magic number(s): Turtle documents may have the strings @prefix or @base (case sensitive) or the strings 'PREFIX' or 'BASE' (case insensitive) near the beginning of the document.
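
For comparison, a sketch of what a sniffer built on that Turtle wording might look like. The 512-byte window is an assumption standing in for "near the beginning", and the function name is made up; the registration itself does not define either.

```python
import re

# 'PREFIX' or 'BASE' as whole words, matched case-insensitively.
_SPARQL_STYLE = re.compile(r"\b(?:PREFIX|BASE)\b", re.IGNORECASE)


def may_be_turtle(data: bytes, window: int = 512) -> bool:
    """Heuristic hint only: look for Turtle keywords in the first bytes."""
    head = data[:window].decode("utf-8", errors="replace")
    # '@prefix' and '@base' are matched case-sensitively.
    if "@prefix" in head or "@base" in head:
        return True
    return _SPARQL_STYLE.search(head) is not None


print(may_be_turtle(b"@prefix ex: <http://example.org/> .\n"))  # True
print(may_be_turtle(b"name: Pierre-Antoine Champin\n"))         # False
print(may_be_turtle(b"base: value\n"))                          # True
```

The last call shows the cost of such a loose hint: an ordinary YAML mapping with a key named base is reported as possibly Turtle.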

@ioggstream
Contributor

ioggstream commented Jul 5, 2022

@VladimirAlexiev imho magic numbers need to be reliable. They are used and implemented by generic tools like the file command or by operating systems for file hinting / launching external programs.

I briefly scraped the media type registrations, and out of ~896 application/* media types, the word "near" is used only once (for sparql-query).

YAML does not provide a magic number, and if I were to provide one in YAML-LD, I'd just say "See YAML".

My 2¢, R.

@ioggstream ioggstream added this to the -00 milestone Jul 5, 2022
ioggstream added a commit that referenced this issue Jul 5, 2022
gkellogg pushed a commit that referenced this issue Jul 5, 2022