n-triples documents that cannot be converted to RDF graphs #33

Open
pfps opened this issue Jun 27, 2023 · 3 comments
Labels
test:missing-coverage Test-suite related: missing coverage

Comments

@pfps
Contributor

pfps commented Jun 27, 2023

The RDF N-Triples specification defines language tags as

LANGTAG ::= "@" [a-zA-Z]+ ("-" [a-zA-Z0-9]+)*

But RDF Concepts defines language tags as follows:

if and only if the [datatype IRI](https://w3c.github.io/rdf-concepts/spec/#dfn-datatype-iri) is http://www.w3.org/1999/02/22-rdf-syntax-ns#langString, a non-empty language tag as defined by [[BCP47](https://w3c.github.io/rdf-concepts/spec/#bib-bcp47)]. The language tag MUST be well-formed according to [section 2.2.9](https://www.rfc-editor.org/rfc/rfc5646#section-2.2.9) of [[BCP47](https://w3c.github.io/rdf-concepts/spec/#bib-bcp47)].

The pointer into BCP47 ends up at a grammar that is considerably more restrictive.

What happens if the language tag in an n-triples document does not conform to this grammar?

This problem might affect other surface syntaxes for RDF.
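
A minimal sketch of the gap described above, assuming only one BCP47 rule: RFC 5646 limits every subtag to at most 8 characters. The function names are hypothetical, and the length check is a necessary condition for well-formedness, not a full BCP47 validator.

```python
import re

# N-Triples LANGTAG terminal, written without the leading "@".
NT_LANGTAG = re.compile(r"[a-zA-Z]+(?:-[a-zA-Z0-9]+)*")


def matches_ntriples_langtag(tag: str) -> bool:
    """True if the tag is accepted by the N-Triples LANGTAG production."""
    return NT_LANGTAG.fullmatch(tag) is not None


def subtags_within_bcp47_length(tag: str) -> bool:
    """True if every subtag has 1 to 8 characters (necessary, not sufficient)."""
    return all(1 <= len(subtag) <= 8 for subtag in tag.split("-"))


for tag in ["en", "en-US", "abcdefghijk", "en-abcdefghijk"]:
    print(tag, matches_ntriples_langtag(tag), subtags_within_bcp47_length(tag))
```

All four tags are accepted by the LANGTAG production, but `abcdefghijk` and `en-abcdefghijk` each contain an 11-character subtag, so they cannot be well-formed BCP47 tags.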

@gkellogg
Member

RDF EBNF grammars have always used a simple terminal production for matching LANGTAG; this does not change the requirement in RDF Concepts that language tags be valid according to BCP47. There are other cases where the EBNF terminal productions are permissive (e.g., IRIREF) and can accept tokens that would not be valid when interpreted according to the requirements of RDF Concepts. IIRC, the SPARQL grammar has the same provisions.

There are tests that look for bad IRIs and languages, but they could be better. In particular, a test that had a language tag that was accepted by the grammar but was invalid according to BCP47 would be good to have. There are IRI (URI) tests that pass through the grammar, but are expected to be detected as illegal.
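
The IRI case can be sketched in the same way. The example below assumes the IRIREF terminal from the N-Triples grammar and uses one secondary rule, that N-Triples IRIs must be absolute, approximated here by requiring a scheme prefix. `classify_iriref` and its return labels are hypothetical names, and this is not a full RFC 3987 check.

```python
import re

# IRIREF terminal from the N-Triples grammar: "<", then any characters other
# than controls, space, <, >, ", {, }, |, ^, ` and \ (or a UCHAR escape), then ">".
IRIREF = re.compile(
    r'<(?:[^\x00-\x20<>"{}|^`\\]|\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8})*>'
)

# A scheme followed by ":"; used as a rough test for an absolute IRI.
SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*:")


def classify_iriref(token: str) -> str:
    """Separate grammar failures from tokens that only fail the secondary check."""
    if IRIREF.fullmatch(token) is None:
        return "syntax-error"
    if SCHEME.match(token[1:-1]) is None:
        return "relative-iri"
    return "ok"


for token in ["<http://example.org/s>", "<s>", "<http://example.org/a b>"]:
    print(token, classify_iriref(token))
```

Here `<s>` passes the grammar and is caught only by the secondary check, which is the kind of input a negative test would need to cover.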

gkellogg added the test:missing-coverage label Jun 27, 2023
@afs
Contributor

afs commented Jun 27, 2023

BCP47 is not immutable. It tracks the latest RFC.

> What happens if the language tag in an n-triples document does not conform to this grammar?

Like any syntax deviation, it is out of scope.

A similar situation occurs with the IRIREF rule, or with illegal Unicode sequences in strings.
This is a known design choice. These external standards are not fixed; they change, with new restrictions as well as additions.

Replicating the full grammars would make the specs unwieldy even when possible.
An implementation is expected to apply secondary checks to conform.
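
A sketch of such a secondary check for language tags, run after the grammar has accepted the token. The regular expression is transcribed from the `langtag` and `privateuse` rules of the RFC 5646 ABNF; it deliberately omits the grandfathered tags, so a tag such as `i-klingon` is reported as ill-formed here. `check_language_tag` and its return labels are hypothetical.

```python
import re

# Stage 1: N-Triples LANGTAG terminal, without the leading "@".
NT_LANGTAG = re.compile(r"[a-zA-Z]+(?:-[a-zA-Z0-9]+)*")

# Stage 2: RFC 5646 "langtag" and "privateuse" rules (grandfathered tags omitted).
BCP47_LANGTAG = re.compile(
    r"""
    (?:
        (?:[a-z]{2,3}(?:-[a-z]{3}){0,3}|[a-z]{4}|[a-z]{5,8})  # language, optional extlang
        (?:-[a-z]{4})?                                        # script
        (?:-(?:[a-z]{2}|[0-9]{3}))?                           # region
        (?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*              # variants
        (?:-[0-9a-wy-z](?:-[a-z0-9]{2,8})+)*                  # extensions
        (?:-x(?:-[a-z0-9]{1,8})+)?                            # private-use suffix
      |
        x(?:-[a-z0-9]{1,8})+                                  # private-use-only tag
    )
    """,
    re.VERBOSE | re.IGNORECASE,
)


def check_language_tag(tag: str) -> str:
    """Return 'syntax-error', 'ill-formed', or 'ok' for a tag without its '@'."""
    if NT_LANGTAG.fullmatch(tag) is None:
        return "syntax-error"  # rejected by the N-Triples grammar itself
    if BCP47_LANGTAG.fullmatch(tag) is None:
        return "ill-formed"    # accepted by the grammar, rejected by the secondary check
    return "ok"


for tag in ["en", "zh-cmn-Hans-CN", "abcdefghijk", "en_US"]:
    print(tag, check_language_tag(tag))
```

Keeping the two stages separate lets a parser report a grammar error differently from a tag that is syntactically accepted but not well-formed, and the second stage can be updated when BCP47 moves to a newer RFC.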

There are practical obstacles to that. Programming language libraries do not always track the latest specs, preferring backwards compatibility.

URI syntax is a moving target: cf. RFC 6874, or the URN changes in RFC 8141, which invalidate syntax that was legal under RFC 2141.

For language tags there are also the special cases allowed by RFC 3066 and continued in RFC 4646 and RFC 5646.

@afs
Contributor

afs commented Jun 27, 2023

> IIRC, the SPARQL grammar has the same provisions.

https://www.w3.org/TR/sparql10-query/#rLANGTAG
