Resolve current bug in UCO that does not require globally unique IDs for all class objects #430

Closed
14 tasks done
sbarnum opened this issue Aug 4, 2022 · 34 comments · Fixed by #467

Comments

@sbarnum
Contributor

sbarnum commented Aug 4, 2022

Background

The following excerpted portion of the UCO Design Document (https://unifiedcyberontology.org/resources/uco_design_document.html) provides a summary overview of the various types of classes in UCO and how they work together.

"In the UCO RDFS/OWL/SHACL ontology, classes are defined for any relevant domain concept as well as for any structured concept characterizing some aspect of a domain concept. These are structured concept classes that specify into UcoObject classes, Facet classes and other classes. UcoObject and Facet classes therefore are structured concept classes, however, UcoObject classes and Facet classes are disjoint from each other. Moreover, Facet classes inhere in UcoObject classes; this implies that for a facet concept to exist, it is dependent on the existence of the UcoObject concept that bears the facet. For example, when destroying a red car, the car as bearer for the red color is removed and with it, its red color disappears. Note that the reverse is not true; UcoObjects are not existentially dependent on facets, and, thus, cannot inhere in them. Note further that, although the example suggests that facets are compulsary for UcoObject concepts, this is not the case.
Domain concept classes (e.g., File, Action, Identity, Location, Device, etc.) are defined as subclasses of the UcoObject class. Facet classes characterize a particular pattern of properties that potentially apply for more than one domain class; a color, weight, an address and alike (described in Section 5 below) represent characteristics that apply not only for cars, but also for houses, persons, books and what have you.
Domain concept classes represent the things whereas facet classes represent the thing’s characteristics. The disjointness between them follows from the fact that the thing can never be the same as its characteristics.
All objects in UCO must specify a globally unique identifier (discussed in Section 4 below) and an assertion of the class type of the object."

The last line of the above excerpt is very important and highlights an overlooked bug in the current and past implementations of UCO.
Currently, only UcoObject specifically codifies the core:id and core:type properties providing/requiring a globally unique identifier for each instance of the class.
Without such a codification and requirement, subclasses of core:Facet or any other structured classes (core:ExternalReference, marking:GranularMarking, observable:MimePartType, etc) in UCO are simply treated as blank nodes with a locally (NOT globally) defined ID.

From the W3C wiki page (https://www.w3.org/wiki/BlankNodes) on blank nodes:

You can identify BlankNodes locally with a NodeId. that ID can be used to talk about the node inside your particular file/store of information, but you can't use it to ID the node externally.

This means that UCO content within a single file, or produced within a single, uniform store of information, has the potential to hang together coherently. As soon as you attempt to merge or blend graphs from different files or information stores (a critical, fundamental purpose of UCO), the graph falls apart: without globally unique IDs, non-UcoObject class objects lose coherence with the UcoObject they are part of. Local NodeIds are typically assigned by RDF processors following similar or identical algorithms for each set of content leading to a certainty of ID conflicts in merged content.

This is a critical bug that needs to be addressed.

Requirements

Requirement 1

Every individual instance of a UCO class must have a globally unique id

Requirement 2

Merged graphs of UCO content from different files, information stores or producers must maintain relational graph integrity where non-UcoObject class objects maintain unique and coherent relation to the UcoObjects they are an inherent part of.

Risk / Benefit analysis

Benefits

Content blended from multiple UCO graphs (a fundamental purpose of UCO) will be possible.

Risks

Increases each non-UcoObject class object by one property.
Existing examples will need to be updated.

Competencies demonstrated

Competency 1

Maintain integrity of UCO content in merged graphs from multiple origins

Competency Question 1.1

Query a UcoObject containing inherent embedded class content (e.g. a File observable object containing a FileFacet with property content)

Result 1.1

Return the full UcoObject with all of the embedded (FileFacet) content with accuracy and integrity

Competency Question 1.2

Query a merged graph for multiple UcoObjects (from different origin graphs) containing inherent embedded class content.

Result 1.2

Return the full UcoObjects with all of the embedded (FileFacet) content with accuracy and integrity

Solution suggestion

  • Create new core:ClassBase class in the core namespace
  • Move SHACL property shapes for core:id and core:type from the core:UcoObject class to the core:ClassBase class
  • Modify core:UcoObject to be subclass of core:ClassBase
  • Modify core:Facet to be subclass of core:ClassBase
  • Modify all classes in UCO with no superclass other than owl:Thing to be a subclass of core:ClassBase
  • Modify all SHACL property shapes for ObjectProperties to utilize sh:nodeKind sh:IRI rather than sh:nodeKind sh:BlankNodeOrIRI
  • Modify the disjoint statement between core:UcoObject and core:Facet to include all of the other sibling direct subclasses of core:ClassBase
[]
    a owl:AllDisjointClasses ;
    owl:members (
        array:ArrayOfAction
        tool:BuildConfigurationType
        # ... there are actually quite a lot ...
        core:Facet
        core:UcoObject
        # ...
    ) ;
    .

This proposed solution of utilizing a defined common base class for all UCO classes to specify the required globally unique ID for all classes is cleaner than simply adding core:id and core:type to each of the non-UcoObject classes in UCO. It is also easier to maintain and provides better coherence to the UCO class tree and cleans up much of the current messiness in the class hierarchy.

Examples

This simple example is from the same Section 3 of the UCO Design Document as the excerpt quoted in the Background section above:

{
  "@graph": [
    {
      "@id": "kb:person-952c09ff-5a38-483b-9dcf-6d8f0b27dfac",
      "@type": "identity:Person",
      "core:objectCreatedTime": {
        "@type": "xsd:dateTime",
        "@value": "2017-06-25T12:12:12.12Z"
      },
      "core:name": "John Smith",
      "core:hasFacet": [
        {
          "@id": "kb:5ecfbe78-e7c7-4b23-97fd-5ede9cc32123",
          "@type": "identity:SimpleNameFacet",
          "identity:givenName": "John",
          "identity:familyName": "Smith"
        }
      ]
    },
    {
      "@id": "kb:relationship-cecfbe8c-8357-4105-b448-b491177fedf2",
      "@type": "core:Relationship",
      "core:kindOfRelationship": "located-at",
      "core:source": "kb:person-952c09ff-5a38-483b-9dcf-6d8f0b27dfac",
      "core:target": "kb:location-7044bee0-d5d2-45f3-bb5d-2ced42bfd3f4"
    },
    {
      "@id": "kb:location-7044bee0-d5d2-45f3-bb5d-2ced42bfd3f4",
      "@type": "location:Location",
      "core:hasFacet": [
        {
          "@id": "kb:69e9fe37-f2ee-435b-998f-7b1b0d60a405",
          "@type": "location:SimpleAddressFacet",
          "location:locality": "New York City",
          "location:region": "New York",
          "location:country": "USA",
          "location:street": "5th Ave"
        }
      ]
    }
  ]
}

Coordination

  • Tracking in Jira ticket OC-152 and OC-200
  • Administrative review completed
  • Requirements to be discussed in OC meeting, 2022-08-16
  • Requirements Review vote occurred, passing, on 2022-08-16
  • Requirements development phase completed.
  • Solution announced to OCs on 2022-08-24
  • Solutions Approval to be discussed in OC meeting, 2022-08-25
  • Issue 470 resolved.
  • Solutions Approval vote occurred, passing, on 2022-08-25
  • Solutions development phase completed.
  • Implementation for UCO merged into develop
  • Implementation for CASE merged into develop
  • Milestone linked
  • Documentation logged in pending UCO release page
  • Documentation logged in pending CASE release page
@ajnelson-nist
Contributor

I believe this proposal is strategically wrong and will file two proposals correcting underlying issues.

In short, core:id and core:type must be deleted due to conflicts with core RDF.

@ajnelson-nist
Contributor

Looking again, I now think only the parts of this proposal pertaining to core:id and core:type are wrong, on account of my belief that core:id and core:type are wrong to include in UCO at all. I am drafting those proposals still.

However, there is another piece that I think is missing from your solution suggestion. We allow sh:nodeKind sh:BlankNodeOrIRI on all of our object properties. I think this proposal is supposed to include instead using sh:nodeKind sh:IRI on most, if not all, of the object properties' shapes.

Last, I remember we had discussed this before in Jira, and I had asked you for an example and you might not have gotten a notice of the Jira comment. How would you represent a file that has a hash? I think that is going to be an essential sanity-check.

@ajnelson-nist
Contributor

@sbarnum : Also, if the top-most class in UCO would now be core:ClassBase, we should expand the disjoint statement between core:UcoObject and core:Facet to cover the other sibling subclasses of core:ClassBase. E.g., this axiom should now be included in the core: namespace:

[]
    a owl:AllDisjointClasses ;
    owl:members (
        array:ArrayOfAction
        tool:BuildConfigurationType
        # ... there are actually quite a lot ...
        core:Facet
        core:UcoObject
        # ...
    ) ;
    .

It's actually a bit of a surprise when looking at what Protege displays as subclasses of owl:Thing.

@sbarnum
Contributor Author

sbarnum commented Aug 5, 2022

@ajnelson-nist Good catch on changing sh:nodeKind sh:BlankNodeOrIRI to sh:nodeKind sh:IRI on ObjectProperty SHACL shapes.
I had missed that implication.

Here is an example of a file with a hash:

{
  "@id": "kb:file-a0a69ece-da9c-4256-a9a8-5dec82a4ad1f",
  "@type": "uco-observable:File",
  "uco-core:hasFacet": [
    {
      "@id": "kb:ContentDataFacet-1e54fa5e-1399-476c-8aa7-00781b8c12db",
      "@type": "uco-observable:ContentDataFacet",
      "uco-observable:hash": [
        {
          "@id": "kb:hash-87c24a7f-a0d2-41a3-a726-0521a5c7bc8c",
          "@type": "uco-types:Hash",
          "uco-types:hashMethod": {
            "@type": "uco-vocabulary:HashNameVocab",
            "@value": "SHA256"
          },
          "uco-types:hashValue": {
            "@type": "xsd:hexBinary",
            "@value": "e5ca3be56f66200a1bb2262e948ac08dbc672bc8033c1ada743787b0c667dea6"
          }
        }
      ]
    }
  ]
}

@sbarnum
Contributor Author

sbarnum commented Aug 5, 2022

I have no objections to expanding the disjoint statement to include all classes that only have owl:Thing as a superclass (i.e. add in all of the classes that are neither subclasses of UcoObject or Facet).

@ajnelson-nist
Contributor

FYI, the observable:hash snippet has an error - the literals (@value-bearing) must not have @id.
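
The rule under discussion (every node object carries an @id, while @value-bearing literals do not) can be checked mechanically. The following is a minimal sketch using only the Python standard library; the function name nodes_missing_id and the path notation are assumptions made for illustration, not part of UCO or of any JSON-LD library, and the sample document deliberately omits the @id on the Hash node.

```python
import json


def nodes_missing_id(value, path="$"):
    """Yield paths of node objects that lack an "@id".

    Objects carrying "@value" are literals and are skipped, because
    literals must not have an "@id". A wrapper object holding "@graph"
    is not treated as a node.
    """
    if isinstance(value, list):
        for index, item in enumerate(value):
            yield from nodes_missing_id(item, f"{path}[{index}]")
    elif isinstance(value, dict):
        if "@value" in value:
            return  # literal: no identifier expected
        if "@graph" not in value and "@id" not in value:
            yield path
        for key, child in value.items():
            if key != "@context":
                yield from nodes_missing_id(child, f"{path}.{key}")


# Illustrative input: the Hash node has no "@id", so it would be a blank node.
EXAMPLE = json.loads("""
{
  "@id": "kb:file-a0a69ece-da9c-4256-a9a8-5dec82a4ad1f",
  "@type": "uco-observable:File",
  "uco-core:hasFacet": [
    {
      "@id": "kb:ContentDataFacet-1e54fa5e-1399-476c-8aa7-00781b8c12db",
      "@type": "uco-observable:ContentDataFacet",
      "uco-observable:hash": [
        {
          "@type": "uco-types:Hash",
          "uco-types:hashMethod": {
            "@type": "uco-vocabulary:HashNameVocab",
            "@value": "SHA256"
          }
        }
      ]
    }
  ]
}
""")

MISSING = list(nodes_missing_id(EXAMPLE))
```

In an RDF toolchain the same requirement would more naturally be expressed with sh:nodeKind sh:IRI; this walker only shows what the requirement means at the JSON level.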

@sbarnum
Contributor Author

sbarnum commented Aug 5, 2022

I very fundamentally disagree with the assertion to remove core:id and core:type properties.
I have added a comment to the related CP explaining why.
All of the rationale I have seen to date for removing them is based on a presumption that JSON-LD and other RDF serializations are the only way to serialize UCO. This has not been the case since the beginning of UCO and CASE. JSON-LD is the default serialization but UCO should support any other serialization as well.

@sbarnum
Contributor Author

sbarnum commented Aug 5, 2022

FYI, the observable:hash snippet has an error - the literals (@value-bearing) must not have @id.

Oops. I got id happy. LOL.

I will fix it.
thanks

@sbarnum
Contributor Author

sbarnum commented Aug 5, 2022

I fixed the example to remove my extraneously added ids.

@sbarnum
Contributor Author

sbarnum commented Aug 5, 2022

I updated the CP to include the changes to the ObjectProperty SHACL shapes' sh:nodeKind and the class disjoint statement.

@sbarnum
Contributor Author

sbarnum commented Aug 5, 2022

I realized that our JSON-LD context should contain the following:

"core:id": "@id",
"core:type": "@type",

Rather than

"id": "@id",
"type": "@type",

In this way the plain json cleanly aligns to the ontology as expected and the context does the work of mapping those properties to @id and @type.

We can also add any documentation we want to the json-ld context file outside of the "context" definition object that documents details of our json-ld serialization. The processor will simply ignore the extra content.

I am going to make the above change to the json-ld context proposal.

@ajnelson-nist
Contributor

"core:id": "@id",
"core:type": "@type",

That breaks JSON-LD if core:id and core:type are owl:DatatypePropertys.

@ajnelson-nist
Contributor

All of the rationale I have seen to date for removing them is based on a presumption that JSON-LD and other RDF serializations are the only way to serialize UCO. This has not been the case since the beginning of UCO and CASE. JSON-LD is the default serialization but UCO should support any other serialization as well.

In terms of what UCO has committed to developing technologically for 1.0.0, JSON-LD is in scope, and we are trying very hard for JSON that is not JSON-LD. Other non-RDF syntaxes have not been presented as specific use cases.

@ajnelson-nist
Contributor

"core:id": "@id",
"core:type": "@type",

That breaks JSON-LD if core:id and core:type are owl:DatatypePropertys.

Further, @type must always be interpreted as rdf:type, and @id must always be interpreted as a node identifier. I don't think you appreciate that you are proposing completely breaking RDF functionality of JSON-LD with these properties.

@ajnelson-nist
Contributor

Re:

        {
          "@id": "kb:hash-87c24a7f-a0d2-41a3-a726-0521a5c7bc8c",
          "@type": "uco-types:Hash",
          "uco-types:hashMethod": {
            "@type": "uco-vocabulary:HashNameVocab",
            "@value": "SHA256"
          },
          "uco-types:hashValue": {
            "@type": "xsd:hexBinary",
            "@value": "e5ca3be56f66200a1bb2262e948ac08dbc672bc8033c1ada743787b0c667dea6"
          }
        }

This @id causes me some stomach pain as a developer. A UUID for every hash algorithm-value pair? I am aware of some systems that do indexing at potentially the level of every JSON @type-bearing (non-@value-bearing) object, so I appreciate that this might be necessary. I'd really hate to make another object that stores that same hash algorithm-value pair, though. The index load would feel pretty gross.

On the brighter side, if types:Hash objects could be shared, we might actually get query-time benefits from letting users use indexing on these types:Hash nodes' identifiers. Requiring UUIDv4s would keep UCO at its current level of being able to compute matching hash values: only by full comparison of the hash string value and method.

As a summary effect: I would like observable:hash to be protected from being an owl:InverseFunctionalProperty, perhaps with something like this update, changing the comment from:

Hash values of the data.

To:

A hash value of the data. As part of UCO OWL modeling, this property is intentionally neither an owl:FunctionalProperty, nor an owl:InverseFunctionalProperty.

May we expand the scope of this proposal to include this revision to observable:hash?

@sbarnum
Contributor Author

sbarnum commented Aug 5, 2022

I think I may have discovered the root of our disconnect.

I just noticed that types:Identifier is currently only defined as a generic rdfs:Datatype with no further detail.
This was never the intention.
It was always intended to be a Datatype constraining the value of xsd:string with a regex for our agreed form of IRI value for an object identifier. We discussed this at length a few years back and I could have sworn we added it in to the definition of types:Identifier but it is obviously not there now. I don't know if we never finished that work or if it got put in and then pulled out at some point.
At a minimum the defined constraint on string should be a regex for an IRI. More specifically it should constrain it to the UCO identifier pattern we developed that ensured global uniqueness and simply supported linked-data. It was "-" (this is the pattern we use in examples) where the UUID was at least v4 but eventually we would like to support v5 for autogeneration based on semantically relevant content of the object (this v5 approach would handle the hash reuse issue you describe above).
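
The UUIDv5 idea mentioned above can be sketched with Python's standard uuid module. This is only an illustration under stated assumptions: the namespace UUID, the kb prefix, the "hash-" label, and the choice of which fields feed the name are all hypothetical, since the thread does not define them.

```python
import uuid

# Hypothetical namespace, derived for illustration only. UCO does not
# define a namespace UUID in this thread.
EXAMPLE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "http://example.org/kb/")


def hash_node_id(hash_method: str, hash_value: str, prefix: str = "kb") -> str:
    """Derive a deterministic identifier from a hash method and value.

    The same (method, value) pair always yields the same identifier, so
    two producers describing the same hash would arrive at one shared node.
    """
    # Normalize case so "sha256"/"SHA256" and upper/lower hex agree.
    name = f"{hash_method.upper()}:{hash_value.lower()}"
    return f"{prefix}:hash-{uuid.uuid5(EXAMPLE_NAMESPACE, name)}"


SHA256_VALUE = "e5ca3be56f66200a1bb2262e948ac08dbc672bc8033c1ada743787b0c667dea6"
FIRST = hash_node_id("SHA256", SHA256_VALUE)
SECOND = hash_node_id("sha256", SHA256_VALUE.upper())
```

Because uuid5 is a pure function of the namespace and name, FIRST and SECOND are identical, whereas uuid4 would produce a new identifier on every call.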

I think the issue is we need to complete the definition of types:Identifier as described above. Once that is done, I believe the rest of this CP should work unless I am completely missing something.

At that point types:Identifier is a string with particular value constraints.
core:id has range of types:Identifier so is a string with particular value constraints (which ensure it is a valid IRI identifier).
core:type already has a range of xsd:string and is defined as "The explicitly-defined type of characterization of a concept."

"core:id": "@id",
"core:type": "@type",

in the json-ld context simply changes

"uco-core:id": "kb:hash-87c24a7f-a0d2-41a3-a726-0521a5c7bc8c",
"uco-core:type": "uco-types:Hash",

to

"@id": "kb:hash-87c24a7f-a0d2-41a3-a726-0521a5c7bc8c",
"@type": "uco-types:Hash",

The value strings do not change at all. They are valid values of the core:id and core:type properties including core:id being range of types:Identifier. They are also valid in json-ld as the value of @id is a valid node identifier and the value of @type is a valid rdf:type string identifier.

Am I missing some other dimension to this, or was the root of our disconnect the fact that types:Identifier is currently incompletely defined?

@ajnelson-nist
Contributor

You are still not understanding that trying to use this will break JSON-LD:

"core:id": "@id",
"core:type": "@type",

Please test that.

@sbarnum
Contributor Author

sbarnum commented Aug 8, 2022

You are correct that

"core:id": "@id",
"core:type": "@type",

are invalid.
I forgot that left-side keys in the context cannot be prefixed entities.
I would not categorize this as breaking json-ld but it is definitely invalid syntax for the context and would throw errors in the json-ld processor. I changed the json-ld context CP back to the way it was.

I still have not seen any convincing argumentation/evidence that the presence of core:id and core:type "break" anything. What I have seen is an assertion that they may be confusing in regards to the bindings for these concepts to RDF which I would agree with.

The remaining challenge I see is how we express the requirements for these concepts/properties if we remove them.
It is true that for RDF serializations the required rdf:type (@type in json-ld) is implicitly linked to the class of which the object is an individual, and that the subjects of RDF triples are inherently IRI identifiers. And if we modify the sh:nodeKind for all ObjectProperties in UCO to be sh:IRI then we implicitly require object IDs to be IRIs and not blank nodes.

For other serializations these requirements and linkages are not implicit and we need a way to convey them.
Further, without any explicit representation for an id property in the ontology how do we express the desired IRI formatting constraint for UCO identifiers?

While RDF/JSON-LD are the specific minimally targeted and fully supported serializations for 1.0.0 there is a significant difference between the intention to fully support other serializations and simply to not make decisions that block them. It has always been a fundamental principle of UCO that our serialization support is inclusive not exclusive. For 1.0.0 we are not going to fully flesh out serialization support beyond RDF/JSON-LD but we need to make sure we do not presume that these will be the only serializations for UCO and make design/implementation decisions that prevent other serializations from being practical.

If we can identify how we can do the following without the core:id and core:type properties then I am okay with removing them for 1.0.0:

  • assert that all objects in any serialization must have a globally unique identifier and an explicit assertion of type tied to a UCO class
  • assert the desired IRI formatting for UCO identifiers

@ajnelson-nist
Contributor

ajnelson-nist commented Aug 8, 2022

Re:

assert that all objects in any serialization must have a globally unique identifier and an explicit assertion of type tied to a UCO class

I feel this is an impossible requirement to satisfy a priori. I know of no enumeration of serialization formats broken out by whether they have an elementary structure of a node identifier or not. XML outside of RDF doesn't. YAML...I don't know.

For the targeted support serializations, which are based on RDF, core:id is a hindrance. It is a repetition of the "Subject" position of a triple. I admit RDF seems to have danced around not using the term "ID", and instead using "The subject of a triple." But in the RDF serialization, usage of core:id seems it should be actively discouraged, as it can only repeat, as a string serialization, the RDF-structural node identifier.

Re:

assert the desired IRI formatting for UCO identifiers

You can say "Desired," but it would be a complete information siloing act to say "Required." If you require a format for node identifiers, UCO is incompatible with every application that predates UCO, where rdf:Resources serve as ideas. How would you, for instance, say that this IRI (which has a label familiar to this community) is also a UCO identity:Organization?

<http://www.wikidata.org/entity/Q2464882>
    rdfs:label "Netherlands Forensic Institute"@en .

(Edit: I'd initially copied the URL instead of the concept IRI. Now fixed here and below.)

If this next block of Turtle is invalid UCO because of that yet-unspecified types:Identifier, then UCO is an information silo and fails semantic web interoperability.

<http://www.wikidata.org/entity/Q2464882>
    a uco-identity:Organization ;
    rdfs:label "Netherlands Forensic Institute"@en .

I do not think it would be helpful for UCO to attempt prescribing any type of format for concept IRIs. I'd omitted removing the types:Identifier datatype in the core:id proposal, but if it is a more-harm-than-good concept to retain, I would also suggest deleting it.

Last, re: core:type - I believe this does not sufficiently differ from rdf:type to merit retaining. It also demonstrates, to me, a UCO willingness to invent, and re-invent, rather than adopt, which looks particularly fragmentative when what's being re-invented is a part of a specification already adopted as a foundational technology (RDF). Further, your YAML illustration made it seem likely to me other serializations of UCO would also need to support namespacing because of UCO's use of namespaces to house concepts with matching basenames (aka fragments, e.g. startTime in both core: and action:). If so, the rdf: prefix is just as available for use as UCO's several, so core:type appears, again, moot and incompatibly typed versus core RDF.

For RDF-based applications, I think this proposal's requirements on nodes bearing non-blank identifiers can be satisfied with sh:nodeKind sh:IRI being used in place of sh:nodeKind sh:BlankNodeOrIRI.

@ajnelson-nist
Contributor

@sbarnum , something else you should be aware of: Some JSON-LD serializers are likely to make every node that has an @id key into a "Top"-level (that is, not nested) JSON object in the @graph array. So, this proposal has an additional risk, that JSON-LD examples that are programmatically generated (such as some CASE examples) may be significantly more difficult to read by eye, due to Facets being at potentially far-flung regions of the file compared to their housing UcoObject.

I'm not actually 100% sure whether there is a technical solution to this yet, or if the problem has non-standard workarounds, but there is a specification that tries to say when some objects, even with @ids, should nest in one another. That standard is JSON-LD Framing, but it is currently only an Editor's Draft.
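
To make the readability risk concrete, here is a simplified illustration of what happens when every @id-bearing node is promoted to the top level. It is a sketch only: flatten and flatten_graph are hypothetical helpers written for this comment, not the flattening algorithm of the JSON-LD specification or of any particular serializer, and the short identifiers are placeholders.

```python
def flatten(node, top_level):
    """Promote nested "@id"-bearing objects to top_level, leaving references."""
    if isinstance(node, list):
        return [flatten(item, top_level) for item in node]
    if not isinstance(node, dict) or "@value" in node:
        return node  # strings, numbers, and literals stay in place
    flat = {key: flatten(child, top_level) for key, child in node.items()}
    if "@id" in flat and len(flat) > 1:
        top_level.append(flat)
        return {"@id": flat["@id"]}  # reference left where the node used to be
    return flat  # blank node or bare reference: keep as-is


def flatten_graph(nodes):
    """Return a flat list of nodes for a list of (possibly nested) nodes."""
    top_level = []
    for node in nodes:
        remainder = flatten(node, top_level)
        if "@id" not in remainder:
            top_level.append(remainder)  # a top-level blank node stays top-level
    return top_level


NESTED = [
    {
        "@id": "kb:file-1",
        "@type": "uco-observable:File",
        "uco-core:hasFacet": [
            {
                "@id": "kb:facet-1",
                "@type": "uco-observable:ContentDataFacet",
                "uco-observable:hash": [
                    {"@id": "kb:hash-1", "@type": "uco-types:Hash"}
                ],
            }
        ],
    }
]

FLAT = flatten_graph(NESTED)
```

After flattening, the File, its Facet, and the Hash are three sibling objects linked only by {"@id": ...} references, which is the "far-flung regions" effect described above.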

@ajnelson-nist
Contributor

Also, there is a slight error in some of the motivation for this proposal:

Local NodeIds are typically assigned by RDF processors following similar or identical algorithms for each set of content leading to a certainty of ID conflicts in merged content.

This is incorrect if remaining in the context of RDF processors sending data between one another. If a blank node is loaded, the RDF processor must generate a process-local identifier on reading. These two files would not cause a conflict if loaded into the same graph instance:

_:x rdfs:comment "I am node x." .
_:x rdfs:comment "I am node x." .

Yes, they are the same content to the eye, but the engine will assign a new (typically skolemized) random-ish identifier in place of _:x. The length of the total graph will be two distinct triples, not one repeated.

I believe there is next to no risk of ID conflicts when merging content. That said, there are other detractors to using blank nodes, because even when you see their name serialized like _:x in a file, you can't write code within an RDF engine to say "Describe _:x", so there is still good reason to require non-blank identifiers.
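
The loading behavior described here can be modeled in a few lines. This is a toy sketch of the scoping rule, assuming triples are plain tuples of strings; load_file is a hypothetical helper and does not reproduce how rdflib or any other engine is implemented.

```python
import uuid


def load_file(triples):
    """Return the triples with blank-node labels replaced by fresh ones.

    Labels such as "_:x" are scoped to a single load: the same label maps
    to the same fresh node within one call, and to a different node in
    every other call.
    """
    fresh = {}

    def relabel(term):
        if term.startswith("_:"):
            if term not in fresh:
                fresh[term] = f"_:b{uuid.uuid4().hex}"
            return fresh[term]
        return term

    return {(relabel(s), p, relabel(o)) for s, p, o in triples}


FILE_A = [("_:x", "rdfs:comment", '"I am node x."')]
FILE_B = [("_:x", "rdfs:comment", '"I am node x."')]

# Merging is a set union of the independently loaded files.
MERGED = load_file(FILE_A) | load_file(FILE_B)
SUBJECTS = {s for s, _, _ in MERGED}
```

Under this model the merged graph holds two triples with two distinct subjects, matching the two-triple outcome described above.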

@ajnelson-nist
Contributor

@sbarnum - while reviewing UCO's Jira backlog, I came across OC-200 that runs through a whole list of things (many in the observable namespace) that have no parent class.

Rather than enumerate those classes here, I believe the solution of this proposal needs to incorporate the following SPARQL query into CI, failing CI if there are any finds other than your proposed top-level class.

PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?nClass
WHERE {
    ?nClass a owl:Class .
    FILTER NOT EXISTS {
        ?nClass rdfs:subClassOf ?nOtherClass .
    }
}

That query should be run against the monolithic build of UCO (a temporary artifact of the CI workflow under /tests), after deleting (from an in-memory copy) all triples of the form x rdfs:subClassOf owl:Thing .
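
The logic of that check, including the step that first drops subClassOf owl:Thing triples, can be sketched without an RDF engine. The sketch below assumes triples are tuples of prefixed names; classes_without_superclass and the sample triples are hypothetical and only model the intended CI rule.

```python
def classes_without_superclass(triples):
    """List owl:Class subjects with no rdfs:subClassOf triple.

    Triples of the form (x, rdfs:subClassOf, owl:Thing) are discarded
    first, so a class whose only parent is owl:Thing is still reported.
    """
    kept = [
        t for t in triples
        if not (t[1] == "rdfs:subClassOf" and t[2] == "owl:Thing")
    ]
    classes = {s for s, p, o in kept if p == "rdf:type" and o == "owl:Class"}
    with_parent = {s for s, p, _ in kept if p == "rdfs:subClassOf"}
    return sorted(classes - with_parent)


# Illustrative triples only; not an excerpt of the UCO monolithic build.
TRIPLES = [
    ("core:ClassBase", "rdf:type", "owl:Class"),
    ("core:UcoObject", "rdf:type", "owl:Class"),
    ("core:UcoObject", "rdfs:subClassOf", "core:ClassBase"),
    ("observable:MimePartType", "rdf:type", "owl:Class"),
    ("observable:MimePartType", "rdfs:subClassOf", "owl:Thing"),
]

FOUND = classes_without_superclass(TRIPLES)
# CI would pass only if the proposed top-level class is the sole find.
CI_PASSES = FOUND == ["core:ClassBase"]
```

Here observable:MimePartType is reported alongside the top-level class, so the modeled CI check fails until it is given a UCO superclass.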

@ajnelson-nist
Contributor

Also, a style matter, more artistic opinion than technical issue:

core:ClassBase feels like a term borrowed heavily from object-oriented programming, and awkward as a top-level class vs. core:UcoObject. May we borrow a name pattern from OWL, and call UCO's top-level class core:UcoThing instead of ClassBase?

@ajnelson-nist
Contributor

As a further argument for core:UcoThing over core:ClassBase: Verbalizing.

"Here in my graph, I have X, a UCO types hash, which is also a UCO core class base, which is also an OWL thing."

Versus:

"Here in my graph, I have X, a UCO types hash, which is also a UCO core UCO thing, which is also an OWL thing."

@sbarnum sbarnum mentioned this issue Aug 15, 2022
11 tasks
@sbarnum
Contributor Author

sbarnum commented Aug 16, 2022

I agree on having a CI SPARQL check to ensure all classes have defined superclasses.

I also do not object to core:UcoThing.

@sbarnum
Contributor Author

sbarnum commented Aug 16, 2022

I state with the certainty of experience that blank nodes WILL cause integrity issues when merged into a graph store.

Unique IRIs are required for all objects.

@ajnelson-nist
Contributor

@sbarnum : you made a few claims in yesterday's meeting, about blank node behaviors, that did not agree with my understanding of some specification---I assume RDF's---and how blank nodes behave when consumed by multiple tools. That is one of the key motivators for this proposal, and your citation chain currently stops at "[your] experience."

Part of the solution for this proposal will be implementing this query as part of a SHACL-SPARQL constraint:

SELECT ?nThing
WHERE {
    ?nThing a/rdfs:subClassOf* uco-core:UcoObject .
    FILTER (
        ! REGEX (
            STR(?nThing),
            "[0-9a-f]{8}-[0-9a-f]{4}-[0-5][0-9a-f]{3}-[0-9a-f]{4}-[0-9a-f]{12}$",
            "i"
        )
    )
}

(That will be adapted to use uco-core:UcoThing. I gave the query above to @gwebb-case for his assistance with our examples' UUID review.)
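
For reviewing what the FILTER accepts and rejects outside a SPARQL engine, the same pattern can be tried with Python's re module. This is a sketch for experimentation: ends_with_uuid is a hypothetical helper, and the example.org IRI is made up for the test.

```python
import re

# Same pattern as in the query, with re.IGNORECASE standing in for the "i" flag.
UUID_SUFFIX = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-5][0-9a-f]{3}-[0-9a-f]{4}-[0-9a-f]{12}$",
    re.IGNORECASE,
)


def ends_with_uuid(iri: str) -> bool:
    """Return True when the IRI string ends with a UUID-shaped suffix."""
    return UUID_SUFFIX.search(iri) is not None


# The query reports nodes for which this function would return False.
ACCEPTED = ends_with_uuid(
    "http://example.org/kb/hash-87c24a7f-a0d2-41a3-a726-0521a5c7bc8c"
)
REJECTED = ends_with_uuid("http://www.wikidata.org/entity/Q2464882")
```

Note that the pattern only constrains the version nibble to 0 through 5; it does not distinguish UUIDv4 from other versions.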

I believe this is a pretty significantly CPU-expensive query to compute, and person-expensive query to review when a use case justifies using an IRI form that does not end with UUIDs. I would strongly prefer its usage be justified by more than "Your experience."

Can you please provide, for the understanding of users downstream who come to UCO complaining about the runtime or log-volume of this review rule:

  1. The section of the RDF or RDFS spec that you've seen tools use to collide blank nodes.
  2. If possible, a technology demonstration of some tool that collides blank node identifiers, using these two graph files:
_:x <http://www.w3.org/2000/01/rdf-schema#comment> "I am anonymous-node x." .
_:x <http://www.w3.org/2000/01/rdf-schema#comment> "I am ANOTHER anonymous-node x." .

I had expected any RDF 1.1-conformant tool that loads those two files would have two independent subjects with one comment each, not one subject with two comments. I haven't seen rdflib or rdf-toolkit do this.

@ajnelson-nist ajnelson-nist added this to the UCO 1.0.0 milestone Aug 17, 2022
ajnelson-nist added commits that referenced this issue between Aug 31 and Sep 1, 2022, in the following repositories:

  • ucoProject/UCO (4 commits; one patch reverts a word change from 51f3723)
  • casework/CASE (3 commits; one also references ucoProject/UCO#467)
  • casework/CASE-Archive (1 commit)
  • casework/CASE-Examples (2 commits)
  • casework/casework.github.io (3 commits)
  • ucoProject/ucoproject.github.io (1 commit)
  • casework/CASE-Utilities-Python (4 commits; one also references [ONT-295] Release CASE 1.0.0)

Several of these commits note that a follow-on patch will regenerate Make-managed files. The commits reference #430 (ucoProject/UCO#430) and carry the trailer Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>.
References:
* ucoProject/UCO#430
* [ONT-295] Release CASE 1.0.0

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/CASE-Utilities-Python that referenced this issue Sep 1, 2022
A follow-on patch will regenerate Make-managed files.

References:
* ucoProject/UCO#430
* [ONT-295] Release CASE 1.0.0

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/CASE-Utilities-Python that referenced this issue Sep 1, 2022
References:
* ucoProject/UCO#430
* [ONT-295] Release CASE 1.0.0

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/CASE-Corpora that referenced this issue Sep 2, 2022
One potential bug has been flagged with this shape, implemented in UCO
Issue 406:
`uco-owl:ObjectProperty-shacl-constraints-shape`

The `sh:PropertyShape` raising the bug has been given an IRI in order to
link a deactivation rationale.

A new shapes file `debug.ttl` has been added to disable that shape until
a test is written to confirm the CASE-Corpora shape is correct.

`Facet`s that were blank nodes have been given IRIs, per the
implementation of UCO Issue 430.  New `sh:Info`-severity violations are
reported for some URLs treated in the "URL as an `rdfs:Resource`" manner,
which will not be given UUID endings.  `case_validate` is called with
`--allow-warnings`, but is intended to be called with `--allow-infos`;
that will have to wait for `case-utils` Issue 70 to resolve.

Imports of CASE and UCO ontologies now use their `owl:versionIRI`s,
implemented in UCO Issue 437.

A follow-on patch will regenerate Make-managed files.

References:
* casework/CASE-Utilities-Python#70
* ucoProject/UCO#406
* ucoProject/UCO#430
* ucoProject/UCO#437

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/CASE-Implementation-PyPI-Exifread that referenced this issue Dec 19, 2022
This is expected to trigger a CI failure from at least usage of blank
nodes for UCO concepts, disallowed with the release of UCO 1.0.0.

References:
* ucoProject/UCO#430

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>