n-triples documents that cannot be converted to RDF graphs #33

pfps · 2023-06-27T22:33:58Z

The RDF N-Triples specification defines language tags as

LANGTAG ::= "@" [a-zA-Z]+ ("-" [a-zA-Z0-9]+)*

But RDF Concepts specifies language tags as follows:

if and only if the [datatype IRI](https://w3c.github.io/rdf-concepts/spec/#dfn-datatype-iri) is http://www.w3.org/1999/02/22-rdf-syntax-ns#langString, a non-empty language tag as defined by [[BCP47](https://w3c.github.io/rdf-concepts/spec/#bib-bcp47)]. The language tag MUST be well-formed according to [section 2.2.9](https://www.rfc-editor.org/rfc/rfc5646#section-2.2.9) of [[BCP47](https://w3c.github.io/rdf-concepts/spec/#bib-bcp47)].

The pointer into BCP47 ends up at a grammar that is considerably more restrictive.
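To make the gap concrete, here is a minimal Python sketch that compares the two grammars. `NT_LANGTAG` transcribes the LANGTAG production quoted above. `BCP47_LANGTAG` is my own simplified transcription of the `langtag` and `privateuse` productions from RFC 5646; it is an assumption of this sketch, not an official regex, and it deliberately omits the grandfathered tags. The function names are hypothetical.

```python
import re

# LANGTAG terminal from the N-Triples grammar quoted above.
NT_LANGTAG = re.compile(r"@[a-zA-Z]+(-[a-zA-Z0-9]+)*")

# Simplified transcription of the RFC 5646 langtag / privateuse productions.
# Assumption: grandfathered tags are NOT covered here.
BCP47_LANGTAG = re.compile(
    r"""
    (?:
        (?:[a-z]{2,3}(?:-[a-z]{3}){0,3}|[a-z]{4}|[a-z]{5,8})  # language, optional extlang
        (?:-[a-z]{4})?                                        # script
        (?:-(?:[a-z]{2}|[0-9]{3}))?                           # region
        (?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*              # variants
        (?:-[0-9a-wy-z](?:-[a-z0-9]{2,8})+)*                  # extensions
        (?:-x(?:-[a-z0-9]{1,8})+)?                            # trailing private use
      |
        x(?:-[a-z0-9]{1,8})+                                  # private-use-only tag
    )
    """,
    re.IGNORECASE | re.VERBOSE,
)


def nt_accepts(token: str) -> bool:
    """True if the token (including the leading '@') matches LANGTAG."""
    return NT_LANGTAG.fullmatch(token) is not None


def bcp47_well_formed(tag: str) -> bool:
    """True if the tag (without '@') matches the simplified BCP47 grammar."""
    return BCP47_LANGTAG.fullmatch(tag) is not None


# A tag the N-Triples grammar accepts but the stricter grammar rejects:
# the second subtag is longer than eight characters.
print(nt_accepts("@en-abcdefghijk"), bcp47_well_formed("en-abcdefghijk"))
```

A one-letter tag such as `@a` falls into the same gap: it matches LANGTAG, but the simplified BCP47 grammar requires at least two letters for the primary language subtag.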

What happens if the language tag in an n-triples document does not conform to this grammar?

This problem might affect other surface syntaxes for RDF.


gkellogg · 2023-06-27T23:22:41Z

RDF EBNF grammars have always used a simple terminal production for matching LANGTAG; this does not change the requirement in RDF Concepts that language tags be valid according to BCP47. There are other cases where the EBNF terminal productions are permissive (e.g., IRIREF) and can accept tokens that would not be valid when interpreted according to the requirements of RDF Concepts. IIRC, the SPARQL grammar has the same provisions.

There are tests that look for bad IRIs and languages, but they could be better. In particular, a test that had a language tag that was accepted by the grammar but was invalid according to BCP47 would be good to have. There are IRI (URI) tests that pass through the grammar, but are expected to be detected as illegal.
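As a rough illustration of how such test inputs could be picked, the Python sketch below filters candidate tokens down to those that match the LANGTAG terminal yet break one necessary condition of BCP47, namely that no subtag is longer than eight characters. The helper name, the candidate list, and the example IRIs are all hypothetical; real test-suite entries would need to be chosen and reviewed separately.

```python
import re

# LANGTAG terminal from the N-Triples grammar.
NT_LANGTAG = re.compile(r"@[a-zA-Z]+(-[a-zA-Z0-9]+)*")


def grammar_ok_but_too_long(token: str) -> bool:
    """True if token matches LANGTAG but some subtag exceeds 8 characters."""
    if NT_LANGTAG.fullmatch(token) is None:
        return False
    return any(len(subtag) > 8 for subtag in token[1:].split("-"))


# Hypothetical candidates; "@en_US" is rejected by the grammar itself.
candidates = ["@en", "@en-US", "@abcdefghi", "@en-abcdefghijk", "@en_US"]
negative_tags = [c for c in candidates if grammar_ok_but_too_long(c)]

# Wrap each tag in an otherwise unremarkable N-Triples statement.
negative_tests = [
    f'<http://example/s> <http://example/p> "chat"{tag} .'
    for tag in negative_tags
]

for line in negative_tests:
    print(line)
```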

afs · 2023-06-27T23:24:05Z

BCP47 is not immutable. It tracks the latest RFC.

> What happens if the language tag in an n-triples document does not conform to this grammar?

Like any syntax deviation, it is out of scope.

A similar situation occurs with the IRIREF rule, and with illegal Unicode sequences in strings.
This is a known design choice. These external standards are not fixed; they change over time, adding restrictions as well as new features.

Replicating the full grammars would make the specs unwieldy even when possible.
An implementation is expected to apply secondary checks to conform.
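One way to picture this two-stage arrangement is the Python sketch below: a permissive grammar match first, then a separate, replaceable check on the language tag. Everything here is an assumption made for illustration: the exception classes, the function names, and the default check, which only enforces the BCP47 limit of eight characters per subtag rather than full well-formedness.

```python
import re
from typing import Callable, Tuple

# Permissive grammar stage: a quoted string followed by a LANGTAG.
LITERAL_WITH_LANG = re.compile(
    r'"((?:[^"\\]|\\.)*)"@([a-zA-Z]+(?:-[a-zA-Z0-9]+)*)'
)


class GrammarError(ValueError):
    """The token does not match the surface grammar."""


class SecondaryCheckError(ValueError):
    """The token matches the grammar but fails a secondary check."""


def subtags_within_length(tag: str) -> bool:
    """Partial check only: every subtag has at most 8 characters."""
    return all(len(subtag) <= 8 for subtag in tag.split("-"))


def parse_lang_literal(
    token: str,
    check: Callable[[str], bool] = subtags_within_length,
) -> Tuple[str, str]:
    """Return (lexical form, language tag) or raise one of the errors above."""
    match = LITERAL_WITH_LANG.fullmatch(token)
    if match is None:
        raise GrammarError(f"not a language-tagged literal: {token!r}")
    lexical, tag = match.group(1), match.group(2)
    if not check(tag):
        raise SecondaryCheckError(f"language tag fails secondary check: {tag!r}")
    return lexical, tag


print(parse_lang_literal('"chat"@fr'))
```

Passing the check as a parameter keeps the grammar stage stable while the stricter validation can be swapped out as the external standard, or the library implementing it, changes.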

There are practicality issues for that. Programming language libraries do not always track the latest specs, preferring backwards-compatibility.

URI syntax is a moving target: cf. RFC 6874, or the URN changes in RFC 8141, which invalidate syntax that was legal under RFC 2141.

For language tags, there are also the special cases allowed by RFC 3066 and carried forward in RFC 4646 and RFC 5646.

afs · 2023-06-27T23:32:09Z

> IIRC, the SPARQL grammar has the same provisions.

https://www.w3.org/TR/sparql10-query/#rLANGTAG

pfps mentioned this issue Jun 27, 2023

Support for base direction #32

Closed

gkellogg added the test:missing-coverage Test-suite related: missing coverage label Jun 27, 2023
