UCO's Dictionary class should enforce key uniqueness #602

Closed
17 tasks done
ajnelson-nist opened this issue May 7, 2024 · 14 comments · Fixed by #603, #607 or #618

Comments

@ajnelson-nist
Contributor

ajnelson-nist commented May 7, 2024

Background

The definition of types:Dictionary reads (emphasis added):

A dictionary is list of (term/key, value) pairs with each term/key existing no more than once.

UCO does not currently test this key-uniqueness within the encoding in the ontology. SHACL provides a mechanism, via SHACL-SPARQL, to encode this uniqueness constraint in the ontology.[1] UCO adopted SHACL in version 0.7.0, but this definition predates version 0.7.0.

With a key-uniqueness enforcement mechanism, UCO can serve a role in detecting repeated dictionary keys in data flows. E.g., UCO can assist with detecting some instances of Common Weakness Enumeration 694 (CWE-694), "Use of Multiple Resources with Duplicate Identifier," if scoping "Resource" to "Value within a key-value store represented as a dictionary data structure." (This example is from a non-exhaustive review of the CWE dictionary.)

Requirements

Requirement 1

UCO must enforce the uniqueness of dictionary keys in types:Dictionary, or some subclass of types:Dictionary.

Requirement 2

UCO must clarify whether a dictionary that repeats a key-value pair across two or more entries is considered conformant.

Requirement 3

(Added 2024-06-04.)

UCO must support an explicit validation mechanism to validate dictionary entry-keys' uniqueness, in an opt-in manner.

Requirement 4

(Added 2024-06-04.)

UCO must support a mechanism to report when a dictionary violates the expectation of entry-key uniqueness.

Requirement 5

(Added 2024-06-04.)

UCO must support the ability to report what key in a dictionary was found to be repeated, without also requiring a disclosure of the dictionary's key-values. Note that this entails being able to share an empty dictionary, similar to Issue 599.

Requirement 6

(Added 2024-06-04.)

If a dictionary has a repeated key reported, the dictionary must be reported as a dictionary violating the entry-key uniqueness expectation.

Risk / Benefit analysis

Benefits

Adding uniqueness enforcement to types:Dictionary would enable UCO's SHACL validation to catch data oddities in line with the types:Dictionary class's specification.

Risks

It is possible UCO tooling will encounter subject data for ingest into a graph, where a purported unique-key dictionary does not have unique keys. If UCO were to implement a SHACL-SPARQL query confirming key uniqueness right now on types:Dictionary, this would leave a coverage gap on subject data that, by some specification, is a dictionary, but happens to not follow key uniqueness. In some contexts this can be significant information, so there may be a need to add nuance to the implementation around key-uniqueness.

One possible solution is specializing the UCO Dictionary with two subclasses: ProperDictionary and ImproperDictionary, borrowing "Proper" from "proper subset" (subset where there exists a non-member of the subset in the superset) and "proper interval" (interval of non-0 length).

  • A proper dictionary would be known to, and/or required to, have unique keys;
  • an improper dictionary would be known to have some repeated key;
  • a types:Dictionary not further subclassed would be left to the UCO consumer to ultimately test.

Adding ProperDictionary and ImproperDictionary would carry a further risk, that existing child classes of Dictionary would not necessarily automatically acquire proper-ness or improper-ness. Multi-typing would need to be used instead, e.g.:

{
    "@context": {
        "kb": "http://example.org/kb/",
        "types": "https://ontology.unifiedcyberontology.org/uco/types/"
    },
    "@id": "kb:controlled-dictionary-1",
    "@type": [
        "types:ControlledDictionary",
        "types:ProperDictionary"
    ]
}

Last, it might not always be possible to check with SHACL validation that something asserted to be a types:ImproperDictionary has a repeated key. In partial-data-sharing scenarios, other SHACL constraints in the types: namespace would require disclosure of the entire DictionaryEntry object for at least two of the key-repetitions, and this might not be universally desirable.
A new owl:DatatypeProperty types:repeatsKey on types:ImproperDictionary might assist with partial-data sharing issues.

(Added 2024-06-04.)

The proposed types:repeatsKey carries an implication that it is being used on a dictionary that is a types:ImproperDictionary. If used, types:ImproperDictionary should be entailed for the sake of class-based data review mechanisms (i.e., searches in graphs for types:ImproperDictionary). types:repeatsKey should be added with an RDFS domain declaration of types:ImproperDictionary, and this domain assertion should be included in SHACL validation. (Practices to do this are already enacted in the UCO OWL review shapes.)

Competencies demonstrated

Competency 1

A configuration file, deliberately non-conformant to its specification that it provide unique keys, is fed through a content-posting ecosystem, where a security tool tests resources with a "last-read-wins" key-value parser, and the ecosystem's consumers primarily use a consumer tool with a "first-read-wins" key-value parser:

# ...
resource_name: supply_chain_file_1234
retrieval_url: http://example.org/file-1.dat
retrieval_url: http://example.org/file-2.dat
# ...

If transcribed into UCO (1.3.0) without checking for key-uniqueness, this would be the resulting graph:

{
    "@context": {
        "kb": "http://example.org/kb/",
        "types": "https://ontology.unifiedcyberontology.org/uco/types/"
    },
    "@id": "kb:Dictionary-7b9a4526-8a61-4d3d-a83f-f188f2a1e3e9",
    "@type": "types:Dictionary",
    "types:entry": [
        {
            "@id": "kb:DictionaryEntry-274fb580-b752-4da9-817b-03297c08b969",
            "@type": "types:DictionaryEntry",
            "types:key": "resource_name",
            "types:value": "supply_chain_file_1234"
        },
        {
            "@id": "kb:DictionaryEntry-c9ebf792-f3a8-4015-b893-da01b73c5184",
            "@type": "types:DictionaryEntry",
            "types:key": "retrieval_url",
            "types:value": "http://example.org/file-2.dat"
        },
        {
            "@id": "kb:DictionaryEntry-e40d21a0-e34c-41fb-becb-3a3a70831749",
            "@type": "types:DictionaryEntry",
            "types:key": "retrieval_url",
            "types:value": "http://example.org/file-1.dat"
        }
    ]
}

For UCO consumers that are JSON-based, and not JSON-LD-based, the DictionaryEntry order from (pseudo-)random UUIDs could affect downstream results.
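
The parser differential behind this competency can be illustrated with a minimal Python sketch. Both parsers below are toy stand-ins written for this example, not code from any real security or consumer tool, and the "key: value" line format is assumed from the snippet above.

```python
# Toy parsers for "key: value" lines; neither is taken from a real tool.
CONFIG_TEXT = """\
resource_name: supply_chain_file_1234
retrieval_url: http://example.org/file-1.dat
retrieval_url: http://example.org/file-2.dat
"""


def parse_pairs(text):
    """Yield (key, value) pairs, skipping blank lines and # comments."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        yield key.strip(), value.strip()


def first_read_wins(text):
    result = {}
    for key, value in parse_pairs(text):
        result.setdefault(key, value)  # keep the earliest value seen
    return result


def last_read_wins(text):
    result = {}
    for key, value in parse_pairs(text):
        result[key] = value  # overwrite with the latest value seen
    return result


# The security tool (last-read-wins) and the consumer tool (first-read-wins)
# resolve the same file to different URLs.
scanned = last_read_wins(CONFIG_TEXT)["retrieval_url"]
fetched = first_read_wins(CONFIG_TEXT)["retrieval_url"]
print(scanned)  # http://example.org/file-2.dat
print(fetched)  # http://example.org/file-1.dat
```

In this sketch the resource that gets tested is not the resource that gets retrieved, which is the kind of discrepancy a key-uniqueness check is meant to surface.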

Competency Question 1.1

If a tool reads this into a types:Dictionary, what would happen against the current UCO specification (1.3.0)?

Result 1.1

Per the definition in types:Dictionary, this SHOULD raise some kind of data validation error, but the responsible tester is not designated.

Competency Question 1.2

How would an ingest-to-UCO process represent that the source file had a repeated key, in the UCO graph?

Result 1.2

There is not currently a specification on how to handle this, or whether it would be appropriate to store, say, only the "proper" Dictionary keys.

If the ProperDictionary / ImproperDictionary strategy is selected for implementation, graph-populating programs could start creating Dictionary objects and specialize them after parsing source-data into ProperDictionary or ImproperDictionary as appropriate.

If the types:repeatsKey property is accepted, that property could be used with the Dictionary (/ ImproperDictionary) object to record a part of the malformed data more likely to be desired to share.

(Added 2024-06-04.)

The types:repeatsKey property should only be used on types:ImproperDictionary.
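
As a rough illustration of the partial-data-sharing case, the following Python sketch reduces a fully populated dictionary node to a shareable report that names the repeated key and discloses no entries. The function name shareable_report and the identifier kb:Dictionary-1 are hypothetical, and the node layout is assumed to follow the JSON-LD shown in Competency 1.

```python
from collections import Counter


def shareable_report(dictionary_node):
    """Return a reduced node that names repeated keys but carries no entries.

    Returns None when no key repeats.
    """
    entries = dictionary_node.get("types:entry", [])
    counts = Counter(entry["types:key"] for entry in entries)
    repeated = sorted(key for key, count in counts.items() if count > 1)
    if not repeated:
        return None
    report = {
        "@id": dictionary_node["@id"],
        "@type": "types:ImproperDictionary",
        "types:repeatsKey": repeated,
    }
    if "@context" in dictionary_node:
        report["@context"] = dictionary_node["@context"]
    return report


full_node = {
    "@id": "kb:Dictionary-1",  # hypothetical identifier
    "@type": "types:Dictionary",
    "types:entry": [
        {"types:key": "resource_name", "types:value": "supply_chain_file_1234"},
        {"types:key": "retrieval_url", "types:value": "http://example.org/file-2.dat"},
        {"types:key": "retrieval_url", "types:value": "http://example.org/file-1.dat"},
    ],
}
report = shareable_report(full_node)
print(report["types:repeatsKey"])  # ['retrieval_url']
```

The resulting node is an "empty" dictionary in the sense of Requirement 5: it has no types:entry members, only the type and the repeated key.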

Solution suggestion

This SHACL-SPARQL constraint would test for repeated instances of keys.

[]
	a sh:SPARQLConstraint ;
	sh:message "A key in a dictionary can appear no more than once."@en ;
	sh:select """
		PREFIX types: <https://ontology.unifiedcyberontology.org/uco/types/>
		SELECT $this ?value
		WHERE {
			$this
				types:entry/types:key ?value ;
				.
		}
		GROUP BY $this ?value
		HAVING (COUNT(?value) > 1)
	""" ;
	.

Note: This would also reject a dictionary where a key-value pair is repeated. Requirement 2 will inform whether this should be adjusted.
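
To make the grouping logic concrete outside of a SHACL engine, here is a small Python sketch that mimics the query's effect over JSON-LD-style nodes. It is an approximation for illustration only, not how a SHACL validator evaluates the constraint; the function name repeated_keys and the sample identifiers are hypothetical.

```python
from collections import Counter


def repeated_keys(dictionary_nodes):
    """Return sorted (dictionary @id, key) pairs where a key occurs more than once."""
    hits = []
    for node in dictionary_nodes:
        counts = Counter(entry["types:key"] for entry in node.get("types:entry", []))
        hits.extend((node["@id"], key) for key, count in counts.items() if count > 1)
    return sorted(hits)


graph = [
    {
        "@id": "kb:Dictionary-a",  # repeats an identical key-value pair
        "types:entry": [
            {"types:key": "color", "types:value": "red"},
            {"types:key": "color", "types:value": "red"},
        ],
    },
    {
        "@id": "kb:Dictionary-b",  # all keys unique
        "types:entry": [
            {"types:key": "color", "types:value": "red"},
            {"types:key": "size", "types:value": "large"},
        ],
    },
]
hits = repeated_keys(graph)
print(hits)  # [('kb:Dictionary-a', 'color')]
```

The first dictionary is reported even though both entries carry the same value, because the count is taken over entries per key and the values are never compared.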

Where this constraint is attached depends on whether the ProperDictionary / ImproperDictionary subclasses strategy is adopted. "Backwards-compatibility" below means data that raise no SHACL sh:Violation-severity results today would only, at worst, raise sh:Warning-severity results until UCO 2.0.0.

  • If not adopting the new subclasses: For conformance checking of UCO's current definition, the SPARQL constraint would be added to types:Dictionary.
    • For backwards-compatibility matters, the constraint would raise sh:Warning-severity validation results for UCO < 2.0.0, sh:Violation-level for UCO 2.0.0.
    • The (English) definition text for types:Dictionary would not change.
  • If the new subclasses are adopted, the constraint would go onto types:ProperDictionary.
    • The (English) definition text of types:Dictionary would need to change to suggest use of the subclasses in order to confirm conformance with key-uniqueness.
    • Again for backwards-compatibility matters, some thought needs to be given on whether the SPARQL constraint should be repeated in types:Dictionary with a sh:Warning severity, in order to alert about data that is contrary to the definition text's set expectations.

(Added 2024-06-04.)

After discussion from the 2024-05-30 meeting, the new, disjoint dictionary subclasses and the repeatsKey property were implemented in a PR superseding the original PR.

On addition of the dictionary subclasses, the SPARQL constraint checking for repeated keys in plain types:Dictionary objects ended up appearing to be more permanent than originally anticipated. There is no reason based on backwards-compatibility to remove the shape that merely warns of repeated keys. Unfortunately, there is not a way to specify in SHACL that the constraint should only run on types:Dictionary objects that are not also types:ProperDictionary objects (a class that carries its own SHACL constraint of higher severity), unless OWL entailment is set as an operational requirement. General entailment requirements on users are purposefully left out of scope of this proposal, save for one special-purpose detail on repeatsKey.

It is fair to discuss whether UCO should always review all dictionaries for key uniqueness. The shape performing this review is given its own IRI, so it is possible for users to use sh:deactivated to deactivate the shape when their operations are otherwise prepared to address key repetitions (such as through some process that always assigns the proper or improper dictionary type).

repeatsKey induced Requirement 5, enabling "empty" dictionaries for partial data-sharing scenarios. It also induced a data safety review mechanism, adding an rdfs:domain declaration with an accompanying test that the domain is satisfied, whether through explicit typing (i.e. hard-coded assignment of types:ImproperDictionary), or through entailment (whether RDFS entailment or OWL entailment).

types:repeatsKey
	a owl:DatatypeProperty ;
	rdfs:label "repeatsKey"@en ;
	rdfs:comment "A key found to be repeated in multiple dictionary entries within one dictionary."@en ;
	rdfs:domain types:ImproperDictionary ;
	rdfs:range xsd:string ;
	.

types:repeatsKey-subjects-shape
	a sh:NodeShape ;
	sh:class types:ImproperDictionary ;
	sh:targetSubjectsOf types:repeatsKey ;
	.

This is a deviation from UCO generally avoiding usage of rdfs:domain. repeatsKey is offered as a property with sufficient gravity that its presence should ensure review mechanisms for handling improper dictionaries are triggered.
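
The effect of types:repeatsKey-subjects-shape can be approximated in a few lines of Python. This sketch checks explicit typing only; it does not model the RDFS or OWL entailment route mentioned above, under which the rdfs:domain declaration would itself supply the type. The function name and sample identifiers are hypothetical.

```python
def repeats_key_subject_errors(nodes):
    """Return @ids of nodes that use types:repeatsKey without the explicit
    type types:ImproperDictionary."""
    errors = []
    for node in nodes:
        if "types:repeatsKey" not in node:
            continue
        types = node.get("@type", [])
        if isinstance(types, str):
            types = [types]  # JSON-LD allows a single string or a list
        if "types:ImproperDictionary" not in types:
            errors.append(node["@id"])
    return errors


nodes = [
    {"@id": "kb:Dictionary-1", "@type": "types:ImproperDictionary", "types:repeatsKey": "retrieval_url"},
    {"@id": "kb:Dictionary-2", "@type": "types:Dictionary", "types:repeatsKey": "retrieval_url"},
    {"@id": "kb:File-1", "@type": ["observable:File"], "types:repeatsKey": "retrieval_url"},
    {"@id": "kb:Dictionary-3", "@type": "types:Dictionary"},
]
errors = repeats_key_subject_errors(nodes)
print(errors)  # ['kb:Dictionary-2', 'kb:File-1']
```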

Coordination

  • Administrative review completed, proposal announced to Ontology Committees (OCs) on 2024-05-07
  • Requirements to be discussed in OC meeting, 2024-05-30
  • Requirements update announced to OCs on 2024-06-04
  • Requirements to be discussed in OC meeting, 2024-06-25
  • Requirements Review vote occurred, passing, on 2024-06-25
  • Requirements development phase completed.
  • Solution announced to OCs on 2024-06-27
  • Solutions Approval to be discussed in OC meeting, 2024-07-16
  • Solutions Approval vote occurred, passing, on 2024-07-16
  • Solutions development phase completed.
  • Backwards-compatible implementation merged into develop for the next release
  • develop state with backwards-compatible implementation merged into develop-2.0.0
  • Backwards-incompatible implementation merged into develop-2.0.0 (N/A)
  • Milestone linked
  • Documentation logged in pending release page
  • Prerelease publication: CASE develop branch updated to track UCO's updated develop branch
  • Prerelease publication: CASE develop-2.0.0 branch updated to track UCO's updated develop-2.0.0 branch

Footnotes

  1. There does not appear to be a mechanism to do this with SHACL property shapes or constraint components. sh:disjoint comes close, but instead separates predicates in triples, not objects.

@ajnelson-nist
Contributor Author

It seems possible to enforce this in OWL, but in a manner that does not appear (in the opinion of the proposer) to be practical because of impositions on graph-populators. Some impositions are possibly mitigated by Requirement 2. This sketch should be considered an aside and not otherwise part of this proposal.

The implementation seems possible by the following, needing (1) and at least one of (2a) or (2b):

(1) UCO specification: UCO adds the following definitions to types.ttl. This uses owl:hasKey to uniquely identify any types:DictionaryEntry, with the (OWL-)key components being the entry's (dictionary-)key and the dictionary in which the entry inheres.

types:DictionaryEntry
	owl:hasKey (
		[
			owl:inverseOf types:entry ;
		]
		types:key
	) ;
	.

Either 2a or 2b would then let an OWL reasoner conclude the data graph is inconsistent if a dictionary entry were repeated.

(2a) UCO specification: UCO restricts types:value to having exactly one value. (This is consistent with the current SHACL specification, and I think could be added today independent of this whole Issue.)

types:DictionaryEntry
	rdfs:subClassOf [
		a owl:Restriction ;
		owl:onProperty types:value ;
		owl:cardinality "1"^^xsd:nonNegativeInteger ;
	] ;
	.

Then if a types:Dictionary repeats a key between two types:DictionaryEntrys with different types:values, the data graph would be flagged inconsistent by an OWL reasoner. However, a repeated key-value pair would be accepted.

(2b) User participation: A UCO graph-populator adds, for each uco-types:Dictionary in the data graph, a x owl:differentFrom y triple for all (x, y) pairs of DictionaryEntrys associated with the dictionary.
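
To give a sense of the burden (2b) places on a graph-populator, this Python sketch generates the pairwise assertions for one dictionary. Triples are shown as plain tuples rather than through any RDF library, and the entry identifiers are made up for the example.

```python
from itertools import combinations


def different_from_triples(entry_ids):
    """Return one (x, 'owl:differentFrom', y) tuple per unordered pair of entries.

    owl:differentFrom is symmetric, so one direction per pair is enough.
    """
    return [(x, "owl:differentFrom", y) for x, y in combinations(entry_ids, 2)]


entries = ["kb:entry-1", "kb:entry-2", "kb:entry-3", "kb:entry-4"]
triples = different_from_triples(entries)
print(len(triples))  # 6
```

The number of assertions grows as n(n-1)/2 for a dictionary of n entries, so four entries need 6 triples and one hundred entries need 4950.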

The reason for (2a) and (2b) is that owl:hasKey's role is in identity resolution, and it would induce SameIndividual conclusions when the key-components match - that is, two types:DictionaryEntry nodes would be considered the same graph-individual if they were on the same dictionary ([ owl:inverseOf types:entry ]) and had the same key (types:key). The different-individuals assertions (owl:differentFrom) of (2b) would let an OWL reasoner declare inconsistencies even for repeated key-value pairs. The open-world nature of OWL leaves asserting all pairs of individuals are different to the data graph populator.
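
The interplay of (1), (2a), and (2b) can be walked through with a toy Python simulation. This is not an OWL reasoner and makes no claim about how any reasoner is implemented; it only mimics the three rules described above on tuples of (entry, dictionary, key, value), with all identifiers invented for the example.

```python
from collections import defaultdict


def check_with_has_key(entries, different_pairs=frozenset()):
    """Toy consistency check.

    entries: list of (entry_id, dictionary_id, key, value) tuples.
    different_pairs: frozensets {x, y} asserted owl:differentFrom, as in (2b).
    Returns a list of human-readable inconsistency messages.
    """
    merged = defaultdict(list)
    for entry_id, dictionary_id, key, value in entries:
        # (1) owl:hasKey: same dictionary and same key means same individual.
        merged[(dictionary_id, key)].append((entry_id, value))

    problems = []
    for (dictionary_id, key), members in merged.items():
        ids = [entry_id for entry_id, _ in members]
        values = {value for _, value in members}
        # (2a) the merged individual may have exactly one types:value.
        if len(values) > 1:
            problems.append(f"{dictionary_id}/{key}: {len(values)} distinct values")
        # (2b) merged individuals must not also be asserted different.
        for i, x in enumerate(ids):
            for y in ids[i + 1:]:
                if frozenset((x, y)) in different_pairs:
                    problems.append(f"{dictionary_id}/{key}: {x} merged with {y} but asserted different")
    return problems


conflicting = [("e1", "d1", "retrieval_url", "file-1"), ("e2", "d1", "retrieval_url", "file-2")]
same_pair = [("e1", "d1", "color", "red"), ("e2", "d1", "color", "red")]

print(len(check_with_has_key(conflicting)))                             # 1
print(len(check_with_has_key(same_pair)))                               # 0
print(len(check_with_has_key(same_pair, {frozenset(("e1", "e2"))})))    # 1
```

The middle case shows why (2a) alone accepts a repeated key-value pair: the two entries collapse into one individual with a single value, and nothing contradicts that until the owl:differentFrom assertion of (2b) is present.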

@ajnelson-nist ajnelson-nist added this to the UCO 1.4.0 milestone May 7, 2024
ajnelson-nist added a commit that referenced this issue May 7, 2024
A follow-on patch will regenerate Make-managed files.

References:
* #602

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit that referenced this issue May 7, 2024
References:
* #602

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
@ajnelson-nist ajnelson-nist linked a pull request May 7, 2024 that will close this issue
13 tasks
@ajnelson-nist
Contributor Author

PR #603 has been posted to illustrate the solution above that does not add ProperDictionary + ImproperDictionary. Feedback is welcome on whether the ProperDictionary + ImproperDictionary strategy should be adopted instead.

@ajnelson-nist
Contributor Author

The solution sketched so far also includes one odd-looking matter, where the SPARQL constraint is within an anonymous sh:NodeShape. This is due to shape severity scoping - sh:severity does not work within a sh:SPARQLConstraint. If the constraint fails (i.e., the sh:select finds a match), then the entire attached shape fails. The anonymous node shape is linked by rdfs:seeAlso so the shape will render on the generated ontology documentation page for types:Dictionary.

ajnelson-nist added a commit that referenced this issue May 8, 2024
… class

This applies a practice being tried in Issue 602.

A follow-on patch will regenerate Make-managed files.

References:
* #596
* #602

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
@vulnmaster

@ajnelson-nist I reviewed this issue and the associated PR. The validation proposed in your PR makes sense to me so that dictionary key/value pairs are not duplicated. But, I am not following the logic for the possible ProperDictionary and ImproperDictionary subclasses. I am having trouble thinking of a real world use case for an ImproperDictionary subclass. Might you have an example you can share?

@ajnelson-nist
Contributor Author

@vulnmaster : I see ProperDictionary and ImproperDictionary as defensive coding mechanisms and assumption-checkers.

Competency 1 used an example configuration file snippet, which I'll reduce further for discussion:

# ...
resource_name: supply_chain_file_1234
retrieval_url: http://example.org/file-1.dat
# ...

Suppose it's not documented, but instead assumed, that this configuration format functions like a dictionary, because the vast majority of the time the configuration-keyword retrieval_url only appears once in files of this format.

Suppose then that a tool developer writes a UCO-based implementation that routes that assumed dictionary through the types:ProperDictionary class that I suggested, but not by using a dictionary object in their programming language or by checking for key uniqueness. Instead, the tool turns each key-value line into a types:DictionaryEntry, dumps the in-memory object to the graph, and moves on. SHACL validation would then catch any repeated key.

types:ProperDictionary could be used as a safer-programming mechanism, as well as a transmittable guarantee that UCO's SHACL validation anywhere between graph generation and graph consumption would catch a repeated key.

types:ProperDictionary and types:ImproperDictionary could also be used as a late-bound designation in parser programming. As a programming pattern, a graph object can be instantiated with @type types:Dictionary, populated with a running check on key-uniqueness, and then the @type-set of the object can be augmented or simplified to types:ProperDictionary or types:ImproperDictionary as appropriate.
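
A minimal Python sketch of that late-binding pattern follows. The DictionaryBuilder class is a hypothetical helper written for this illustration, not part of any UCO tooling. It replaces the @type at the end (the "simplified" variant); a multi-typed variant would append to a type list instead.

```python
class DictionaryBuilder:
    """Hypothetical helper: collect entries, track repeats, bind the subclass last."""

    def __init__(self, node_id):
        self.node = {"@id": node_id, "@type": "types:Dictionary", "types:entry": []}
        self._seen = set()
        self._repeated = set()

    def add(self, key, value):
        if key in self._seen:
            self._repeated.add(key)  # running key-uniqueness check
        self._seen.add(key)
        self.node["types:entry"].append(
            {"@type": "types:DictionaryEntry", "types:key": key, "types:value": value}
        )

    def finalize(self):
        """Specialize the node once parsing is complete and return it."""
        if self._repeated:
            self.node["@type"] = "types:ImproperDictionary"
            self.node["types:repeatsKey"] = sorted(self._repeated)
        else:
            self.node["@type"] = "types:ProperDictionary"
        return self.node


builder = DictionaryBuilder("kb:Dictionary-1")  # hypothetical identifier
builder.add("resource_name", "supply_chain_file_1234")
builder.add("retrieval_url", "http://example.org/file-1.dat")
builder.add("retrieval_url", "http://example.org/file-2.dat")
node = builder.finalize()
print(node["@type"])             # types:ImproperDictionary
print(node["types:repeatsKey"])  # ['retrieval_url']
```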

Some general use cases for types:ImproperDictionary are:

  • Detecting programming errors with multi-key assignment in graph dictionaries
  • Detecting specification errors in input data
    • Further, detecting attempted exploits of non-deterministic parsing, like differential behaviors on first-read-wins vs. last-read-wins parsers

I don't think I can share a specific example.

types:ImproperDictionary can house the proposed property types:repeatsKey, I think a bit more cleanly than types:Dictionary by itself would if proper and improper dictionaries weren't also adopted.

@ajnelson-nist
Contributor Author

The conclusion at the end of last week's call was that the ProperDictionary and ImproperDictionary classes should be demonstrated in a PR. I'll add that soon and update the requirements in this proposal.

ajnelson-nist added a commit that referenced this issue Jun 4, 2024
… property

A follow-on patch will regenerate Make-managed files.

References:
* #602

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit that referenced this issue Jun 4, 2024
References:
* #602

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit that referenced this issue Jun 4, 2024
This has analagous rationale to UCO Issue 599, as well as supporting the
data-sharing use case where a dictionary key repetition is wished to be
shared without sharing other members of the dictionary.

No effects were observed on Make-managed files.

References:
* #599
* #602

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
@ajnelson-nist
Contributor Author

After discussion from last week's call, this proposal has been updated and a new PR has been filed.

Updated proposal sections (look for "Added 2024-06-04"):

  • Added requirements 3--6
  • Risks
  • Result 1.2
  • Solution suggestion

New PR:

Test-coverage is documented with an addition to tests/examples/README.md.

@ajnelson-nist
Contributor Author

There is still a question remaining on requirement 2: Should a repeated key-value pair trigger the same warning as a repeated key?

My current feeling is, yes, it should. If the committee believes a key-value pairing is a "key" (as in owl:hasKey), there are some significantly larger questions about owl:sameAs entailment that we would need to address UCO-wide. At the moment, I think the operational benefit of this proposal is in catching instances where a repeated key-value pair is inserted into a types:Dictionary, because this is more likely to reveal a data or programming quirk than matters of identity resolution in the semantic web.

ajnelson-nist added a commit to casework/CASE-Archive that referenced this issue Jun 5, 2024
No effects were observed on Make-managed files.

References:
* ucoProject/UCO#602

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/CASE-Archive that referenced this issue Jun 5, 2024
No effects were observed on Make-managed files.

References:
* ucoProject/UCO#602

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/CASE-Corpora that referenced this issue Jun 5, 2024
No effects were observed on Make-managed files.

References:
* ucoProject/UCO#602

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/CASE-Examples that referenced this issue Jun 5, 2024
No effects were observed on Make-managed files.

References:
* ucoProject/UCO#602

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/casework.github.io that referenced this issue Jun 5, 2024
Note this tests the implementation as of UCO PR 603.  607 will be tested
in a later patch.

A follow-on patch will regenerate Make-managed files.

References:
* ucoProject/UCO#602
* ucoProject/UCO#603

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/casework.github.io that referenced this issue Jun 5, 2024
References:
* ucoProject/UCO#602
* ucoProject/UCO#603

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/CASE-Archive that referenced this issue Jun 5, 2024
No effects were observed on Make-managed files.

References:
* ucoProject/UCO#602
* ucoProject/UCO#607

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/CASE-Archive that referenced this issue Jun 5, 2024
No effects were observed on Make-managed files.

References:
* ucoProject/UCO#602
* ucoProject/UCO#607

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/casework.github.io that referenced this issue Jun 5, 2024
A follow-on patch will regenerate Make-managed files.

References:
* ucoProject/UCO#602
* ucoProject/UCO#607

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/casework.github.io that referenced this issue Jun 5, 2024
References:
* ucoProject/UCO#602
* ucoProject/UCO#607

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/CASE-Examples that referenced this issue Jun 5, 2024
No effects were observed on Make-managed files.

References:
* ucoProject/UCO#602
* ucoProject/UCO#607

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/CASE-Corpora that referenced this issue Jun 5, 2024
No effects were observed on Make-managed files.

References:
* ucoProject/UCO#602
* ucoProject/UCO#607

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to ucoProject/ucoproject.github.io that referenced this issue Jun 5, 2024
References:
* ucoProject/UCO#602

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
@ajnelson-nist
Contributor Author

Because it seemed some of the effects of this proposal were unclear going into the last OCs call, the release documentation for this proposal has also been drafted. Please see UCO website PR 82 for the current draft based on yesterday's changes.

@sbarnum
Contributor

sbarnum commented Jun 25, 2024

I think I support this adjustment to the CP though I am a bit confused by some of what I am seeing in the PASS and FAIL test cases.

My understanding of what is being proposed is that:

  • if a data graph producer explicitly asserts a dictionary as ProperDictionary then the validator would throw an error if any key is defined more than once with different values (and I think also if any key is defined more than once even with the same value).
  • if a data graph producer explicitly asserts a dictionary as ImproperDictionary then the validator would throw no errors or warnings for any duplicated keys or key/values but would check that repeatsKey only appears on ImproperDictionary objects?
  • if a data graph producer does not explicitly assert a dictionary as either ProperDictionary or ImproperDictionary but rather simply asserts it as Dictionary, then the validator would throw warnings if any key is defined more than once with different values (and I think also if any key is defined more than once even with the same value)

If this is the intent, then I fully support it.

Looking at the test cases it looked like some of the variations would not function as described above. Maybe my brain is just fuzzy from having a migraine most of the day?

@ajnelson-nist
Contributor Author

I think I support this adjustment to the CP though I am a bit confused by some of what I am seeing in the PASS and FAIL test cases.

My understanding of what is being proposed is that:

  • if a data graph producer explicitly asserts a dictionary as ProperDictionary then the validator would throw an error if any key is defined more than once with different values (and I think also if any key is defined more than once even with the same value).

Yes to both. We have a chance to alter the specification with respect to the parenthetical, but if we don't, a repeated key-value pair is considered to have the same impact as a repeated key with different values.

  • if a data graph producer explicitly asserts a dictionary as ImproperDictionary then the validator would throw no errors or warnings for any duplicated keys or key/values [...]

Correct.

[...] but would check that repeatsKey only appears on ImproperDictionary objects?

Almost.

The types:ImproperDictionary class+shape has a property shape to constrain usage of repeatsKey (that it's only used with string-literal values typed xsd:string, which is consistent with all current DictionaryEntry subclasses).

Separately, types:repeatsKey-subjects-shape is a shape that independently confirms the only subjects using that predicate are types:ImproperDictionary, and an error is raised if it is used on anything else (such as a types:Dictionary that is not also a types:ImproperDictionary, or an observable:File).

  • if a data graph producer does not explicitly assert a dictionary as either ProperDictionary or ImproperDictionary but rather simply asserts it as Dictionary, then the validator would throw warnings if any key is defined more than once with different values (and I think also if any key is defined more than once even with the same value)

Yes, and yes.

If this is the intent, then I fully support it.

Looking at the test cases it looked like some of the variations would not function as described above. Maybe my brain is just fuzzy from having a migraine most of the day?

Condolences on the lousy day. But yes, you read correctly except for my one remark in the middle.
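
For readers skimming the thread, the three cases confirmed in this exchange can be summarized as a small Python sketch. It assumes a single explicitly asserted type with no multi-typing or entailment, and it maps "error" to sh:Violation and "warning" to sh:Warning, which is an assumption made for this illustration. The function name expected_severity is hypothetical.

```python
def expected_severity(asserted_type, keys):
    """Return 'sh:Violation', 'sh:Warning', or None for a dictionary's entry keys.

    A repeated key-value pair counts the same as a repeated key, so only
    the keys are inspected.
    """
    has_repeat = len(set(keys)) < len(keys)
    if not has_repeat:
        return None
    if asserted_type == "types:ProperDictionary":
        return "sh:Violation"
    if asserted_type == "types:ImproperDictionary":
        return None  # repeats are expected here
    if asserted_type == "types:Dictionary":
        return "sh:Warning"
    raise ValueError(f"not a dictionary type covered by this sketch: {asserted_type}")


keys = ["resource_name", "retrieval_url", "retrieval_url"]
print(expected_severity("types:ProperDictionary", keys))    # sh:Violation
print(expected_severity("types:ImproperDictionary", keys))  # None
print(expected_severity("types:Dictionary", keys))          # sh:Warning
```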

@ajnelson-nist
Contributor Author

There is still a question remaining on requirement 2: Should a repeated key-value pair trigger the same warning as a repeated key?

My current feeling is, yes, it should. If the committee believes a key-value pairing is a "key" (as in owl:hasKey), there are some significantly larger questions about owl:sameAs entailment that we would need to address UCO-wide. At the moment, I think the operational benefit of this proposal is in catching instances where a repeated key-value pair is inserted into a types:Dictionary, because this is more likely to reveal a data or programming quirk than matters of identity resolution in the semantic web.

On today's call, I will suggest our default answer to Requirement 2 is as I noted above: A repeated key/value pair will be considered the same way as a repeated key.

@sbarnum
Contributor

sbarnum commented Jun 25, 2024

I think I support this adjustment to the CP though I am a bit confused by some of what I am seeing in the PASS and FAIL test cases.
My understanding of what is being proposed is that:

  • if a data graph producer explicitly asserts a dictionary as ProperDictionary then the validator would throw an error if any key is defined more than once with different values (and I think also if any key is defined more than once even with the same value).

Yes to both. We have a chance to alter the specification with respect to the parenthetical, but if we don't, a repeated key-value pair is considered to have the same impact as a repeated key with different values.

  • if a data graph producer explicitly asserts a dictionary as ImproperDictionary then the validator would throw no errors or warnings for any duplicated keys or key/values [...]

Correct.

[...] but would check that repeatsKey only appears on ImproperDictionary objects?

Almost.

The types:ImproperDictionary class+shape has a property shape to constrain usage of repeatsKey (that it's only used with string-literal values typed xsd:string, which is consistent with all current DictionaryEntry subclasses).

Ah. I read right past that I guess but have no issue at all with the constraint to xsd:string

Separately, types:repeatsKey-subjects-shape is a shape that independently confirms the only subjects using that predicate are types:ImproperDictionary, and an error is raised if it is used on anything else (such as a types:Dictionary that is not also a types:ImproperDictionary, or an observable:File).

  • if a data graph producer does not explicitly assert a dictionary as either ProperDictionary or ImproperDictionary but rather simply asserts it as Dictionary, then the validator would throw warnings if any key is defined more than once with different values (and I think also if any key is defined more than once even with the same value)

Yes, and yes.

If this is the intent, then I fully support it.
Looking at the test cases it looked like some of the variations would not function as described above. Maybe my brain is just fuzzy from having a migraine most of the day?

Condolences on the lousy day. But yes, you read correctly except for my one remark in the middle.
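
For readers skimming the thread, the three confirmed cases can be summarized in a small Python sketch. This only models the decision described above and is not UCO's actual validation code. The function name `key_uniqueness_result`, the "error"/"warning" labels, and the input shape (a set of asserted class names plus one key per dictionary entry) are assumptions for illustration. A node asserted as both ProperDictionary and ImproperDictionary is not modeled.

```python
from collections import Counter


def key_uniqueness_result(asserted_types, entry_keys):
    """Return (severity, repeated_keys) for one dictionary node.

    entry_keys holds one key per dictionary entry, so a repeated
    key/value pair shows up as a repeated key, the same as a key
    repeated with different values.
    """
    repeated = sorted(k for k, n in Counter(entry_keys).items() if n > 1)
    if not repeated:
        return (None, [])
    if "types:ImproperDictionary" in asserted_types:
        # Explicit opt-out: repeated keys are not reported.
        return (None, [])
    if "types:ProperDictionary" in asserted_types:
        # Explicit opt-in: repeated keys are errors.
        return ("error", repeated)
    # Plain types:Dictionary: repeated keys are warnings.
    return ("warning", repeated)
```

For example, `key_uniqueness_result({"types:Dictionary"}, ["a", "a", "b"])` reports a warning for key "a", while the same keys on a node also typed types:ProperDictionary report an error.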

sbarnum commented Jun 25, 2024

Given the comments above confirming my interpretation of the intended approach (including around repeatsKey), I would vote in the affirmative for a Requirements Review vote.

ajnelson-nist added a commit to casework/CASE that referenced this issue Jul 26, 2024
No effects were observed on Make-managed files.

References:
* ucoProject/UCO#602

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
@ajnelson-nist ajnelson-nist linked a pull request Jul 26, 2024 that will close this issue
ajnelson-nist added a commit to casework/CASE-Archive that referenced this issue Jul 26, 2024
No effects were observed on Make-managed files.

References:
* ucoProject/UCO#602

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/CASE-Corpora that referenced this issue Jul 29, 2024
No effects were observed on Make-managed files.

References:
* ucoProject/UCO#602

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/CASE-Examples that referenced this issue Jul 29, 2024
No effects were observed on Make-managed files.

References:
* ucoProject/UCO#602

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/casework.github.io that referenced this issue Jul 29, 2024
A follow-on patch will regenerate Make-managed files.

References:
* ucoProject/UCO#602

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/casework.github.io that referenced this issue Jul 29, 2024
References:
* ucoProject/UCO#602

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/CASE-Archive that referenced this issue Aug 6, 2024
No effects were observed on Make-managed files.

References:
* ucoProject/UCO#602
* ucoProject/UCO#618

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/CASE that referenced this issue Aug 6, 2024
No effects were observed on Make-managed files.

References:
* ucoProject/UCO#602
* ucoProject/UCO#618

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/CASE-Corpora that referenced this issue Aug 7, 2024
No effects were observed on Make-managed files.

References:
* ucoProject/UCO#602

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/CASE-Examples that referenced this issue Aug 7, 2024
No effects were observed on Make-managed files.

References:
* ucoProject/UCO#602

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/casework.github.io that referenced this issue Aug 9, 2024
A follow-on patch will regenerate Make-managed files.

References:
* ucoProject/UCO#602

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/casework.github.io that referenced this issue Aug 9, 2024
References:
* ucoProject/UCO#602

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/CASE-Examples that referenced this issue Aug 9, 2024
No effects were observed on Make-managed files.

References:
* ucoProject/UCO#602

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/CASE-Corpora that referenced this issue Aug 9, 2024
No effects were observed on Make-managed files.

References:
* ucoProject/UCO#602

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>