ShExC valueSetValue/exclusions examples inconsistent with example #42

Open
gkellogg opened this issue Aug 29, 2021 · 2 comments

Comments

@gkellogg
Collaborator

As noted in https://lists.w3.org/Archives/Public/public-shex/2021Aug/0001.html:

In ShEx 2.0, the productions were defined as follows:

[49]    valueSetValue         ::= iriRange | literalRange | languageRange | '.' exclusion+
[50]    exclusion             ::= '-' (iri | literal | LANGTAG) '~'?

In ShEx 2.1, they were updated to the following:

[49]    valueSetValue         ::= iriRange | literalRange | languageRange | exclusion+
[50]    exclusion             ::= '.' '-' (iri | literal | LANGTAG) '~'?

However, the note on [49] still says "If '.' matches and exclusion matches one or more times", which doesn't make sense in this context. Also, the third ValuesConstraint example has a '.' only at the beginning:

ex:EmployeeShape {
  foaf:mbox [ . - <mailto:engineering->~ - <mailto:sales->~ ]
}

It looks like the changes were made in error. In any case, the new grammar does not accept valid 2.0 input such as the example above, so it is not backward-compatible with 2.0.

@ericprud

ericprud commented Sep 23, 2021

While we're dealing with this, I think we can get a bit more sanity checking by requiring that the exclusions be homogeneous. As a counterexample, consider:

  foo:code [. # any RDF term...
    - 'a'~ - 'e'~  # ... except strings starting with 'a' or 'e'
    - @en-UK~ - @fr~ # ... or British or French RDF langStrings (regardless of region, script, etc.)
  ]

Would it permit this?:

<s> foo:code <http://a.example> .

The grammar would imply that it does, but in ShExJ we see that exclusions are typed, e.g. LiteralStemRange and LanguageStemRange in:

      { "type": "TripleConstraint",
        "predicate": "...code",
        "valueExpr": {
          "type": "NodeConstraint",
          "values": [
            { "type": "LiteralStemRange",
              "stem": { "type": "Wildcard" },
              "exclusions": [
                { "type": "LiteralStem", "stem": "a" },
                { "type": "LiteralStem", "stem": "e" }
              ] },
            { "type": "LanguageStemRange",
              "stem": { "type": "Wildcard" },
              "exclusions": [
                "en-UK",
                "fr"
              ] }
          ] } }
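
To make the question concrete, here is a minimal Python sketch of how the two typed ranges above might be evaluated against a node. It is not taken from any ShEx implementation: the `Term` class, the function names, and the rule that a typed wildcard range only admits terms of its own kind are assumptions that follow the reading in this comment, and language exclusions are treated as stems with a simple case-insensitive prefix match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """Hypothetical, simplified RDF term."""
    kind: str       # "iri" or "literal"
    value: str      # IRI string or lexical form
    lang: str = ""  # language tag, empty if none


def literal_stem_range(term, excluded_stems):
    # Assumption: a wildcard LiteralStemRange admits only literals,
    # minus those whose lexical form starts with an excluded stem.
    if term.kind != "literal":
        return False
    return not any(term.value.startswith(s) for s in excluded_stems)


def language_stem_range(term, excluded_tags):
    # Assumption: a wildcard LanguageStemRange admits only
    # language-tagged literals, minus those whose tag equals an
    # excluded tag or extends it with a "-" subtag.
    if term.kind != "literal" or not term.lang:
        return False
    lang = term.lang.lower()
    return not any(
        lang == t.lower() or lang.startswith(t.lower() + "-")
        for t in excluded_tags
    )


def in_value_set(term):
    # A value set is a disjunction: matching either range is enough.
    return (literal_stem_range(term, ["a", "e"])
            or language_stem_range(term, ["en-UK", "fr"]))
```

Under these assumptions, `in_value_set(Term("iri", "http://a.example"))` is false, because neither typed range admits an IRI, whereas the untyped ShExC reading would let it through.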

With homogeneous exclusions, we can reflect the ShExJ. You could still state the above, but you'd need two terms in the valueSet:

  foo:code [
    . -'a'~ -'e'~ # any string, except one starting with 'a' or 'e'
    . -@en-UK~ -@fr~ # none of them Britishisms, and nothing French
  ]

Here's the grammar that ShExJS uses (which passes the tests):

valueSetValue: iriRange | literalRange | languageRange
    | '.' (iriExclusion+ | literalExclusion+ | languageExclusion+)

iriRange: iri ('~' iriExclusion*)?

iriExclusion: '-' iri '~'?

literalRange: literal ('~' literalExclusion*)?

literalExclusion: '-' literal '~'?

languageRange:
      LANGTAG ('~' languageExclusion*)?
    | '@' '~' languageExclusion*

languageExclusion: '-' LANGTAG '~'?

Which lines up with https://github.com/shexSpec/grammar/blob/master/ShExDoc.g4#L149-L161.
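
The homogeneity rule in the `'.' (iriExclusion+ | literalExclusion+ | languageExclusion+)` alternative can be sketched as a small checker. This is only an illustration in Python, not the ShExJS or ANTLR parser: the regular expression is a deliberately simplified stand-in for the real `iri`, `literal` and `LANGTAG` tokens (no prefixed names, escapes or datatypes), and the function names are made up for this example.

```python
import re

# Simplified, hypothetical token patterns; real ShExC lexing is richer.
EXCLUSION = re.compile(r"""
    -\s*(?:
        (?P<iri><[^<>\s]*>)
      | (?P<literal>'[^']*')
      | (?P<language>@[A-Za-z]+(?:-[A-Za-z0-9]+)*)
    )\s*~?
""", re.VERBOSE)


def exclusion_kinds(term):
    """Return the kind of each exclusion in a '. -x~ -y~' value set term."""
    body = term.strip()
    if not body.startswith("."):
        raise ValueError("expected a term starting with '.'")
    pos, kinds = 1, []
    while True:
        # Skip whitespace between exclusions.
        while pos < len(body) and body[pos].isspace():
            pos += 1
        if pos == len(body):
            break
        match = EXCLUSION.match(body, pos)
        if match is None:
            raise ValueError(f"unrecognised exclusion at offset {pos}")
        kinds.append(match.lastgroup)  # name of the alternative that matched
        pos = match.end()
    if not kinds:
        raise ValueError("'.' must be followed by at least one exclusion")
    return kinds


def is_homogeneous(term):
    """True if every exclusion in the term has the same kind."""
    return len(set(exclusion_kinds(term))) == 1
```

With this sketch, `. - <mailto:engineering->~ - <mailto:sales->~` and `. -'a'~ -'e'~` are homogeneous, while the mixed term `. - 'a'~ - 'e'~ - @en-UK~ - @fr~` from the counterexample is not and would have to be split into two value set terms.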

PROPOSE: adopt the ANTLR productions for valueSetValue.

@gkellogg
Collaborator Author

That seems reasonable, although I'll need to implement it for myself to be sure.
