-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
UCO's Dictionary class should enforce key uniqueness #602
Comments
It seems possible to enforce this in OWL, but in a manner that does not appear (in the opinion of the proposer) to be practical because of impositions on graph-populators. Some impositions are possibly mitigated by Requirement 2. This sketch should be considered an aside and not otherwise part of this proposal. The implementation seems possible by the following, needing (1) and at least one of (2a) or (2b): (1) UCO specification: UCO adds the following definitions to types:DictionaryEntry
owl:hasKey (
[
owl:inverseOf types:entry ;
]
types:key
) ;
. Either 2a or 2b would then let an OWL reasoner conclude the data graph is inconsistent if a dictionary entry were repeated. (2a) UCO specification: UCO restricts types:DictionaryEntry
rdfs:subClassOf [
a owl:Restriction ;
owl:onProperty types:value ;
owl:cardinality "1"^^xsd:nonNegativeInteger ;
] ;
. Then if a (2b) User participation: A UCO graph-populator adds, for each The reason for (2a) and (2b) is |
A follow-on patch will regenerate Make-managed files. References: * #602 Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
References: * #602 Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
PR #603 has been posted to illustrate the solution above that does not add |
The solution sketched so far also includes one odd-looking matter, where the SPARQL constraint is within an anonymous |
@ajnelson-nist I reviewed this issue and the associated PR. The validation proposed in your PR makes sense to me so that dictionary key/value pairs are not duplicated. But, I am not following the logic for the possible ProperDictionary and ImproperDictionary subclasses. I am having trouble thinking of a real world use case for an ImproperDictionary subclass. Might you have an example you can share? |
@vulnmaster : I see Competency 1 used an example configuration file snippet, which I'll reduce further for discussion:
Suppose it's not documented, but instead assumed, that this configuration format functions like a dictionary, because the vast majority of the time the configuration-keyword Suppose then that a tool developer writes a UCO-based implementation that routes that assumed dictionary through the
Some general use cases for
I don't think I can share a specific example.
|
The conclusion at the end of last week's call was that the |
… property A follow-on patch will regenerate Make-managed files. References: * #602 Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
References: * #602 Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
This has analagous rationale to UCO Issue 599, as well as supporting the data-sharing use case where a dictionary key repetition is wished to be shared without sharing other members of the dictionary. No effects were observed on Make-managed files. References: * #599 * #602 Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
After discussion from last week's call, this proposal has been updated and a new PR has been filed. Updated proposal sections (look for "Added 2024-06-04"):
New PR: Test-coverage is documented with an addition to |
There is still a question remaining on requirement 2: Should a repeated key-value pair trigger the same warning of a repeated key? My current feeling is, yes, it should. If the committee believes a key-value pairing is a "key" (as in |
No effects were observed on Make-managed files. References: * ucoProject/UCO#602 Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
No effects were observed on Make-managed files. References: * ucoProject/UCO#602 Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
No effects were observed on Make-managed files. References: * ucoProject/UCO#602 Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
No effects were observed on Make-managed files. References: * ucoProject/UCO#602 Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
Note this tests the implementation as of UCO PR 603. 607 will be tested in a later patch. A follow-on patch will regenerate Make-managed files. References: * ucoProject/UCO#602 * ucoProject/UCO#603 Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
References: * ucoProject/UCO#602 * ucoProject/UCO#603 Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
No effects were observed on Make-managed files. References: * ucoProject/UCO#602 * ucoProject/UCO#607 Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
No effects were observed on Make-managed files. References: * ucoProject/UCO#602 * ucoProject/UCO#607 Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
A follow-on patch will regenerate Make-managed files. References: * ucoProject/UCO#602 * ucoProject/UCO#607 Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
References: * ucoProject/UCO#602 * ucoProject/UCO#607 Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
No effects were observed on Make-managed files. References: * ucoProject/UCO#602 * ucoProject/UCO#607 Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
No effects were observed on Make-managed files. References: * ucoProject/UCO#602 * ucoProject/UCO#607 Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
References: * ucoProject/UCO#602 Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
Because it seemed some of the effects of this proposal were unclear going into the last OCs call, the release documentation for this proposal has also been drafted. Please see UCO website PR 82 for the current draft based on yesterday's changes. |
I think I support this adjustment to the CP though I am a bit confused by some of what I am seeing in the PASS and FAIL test cases. My understanding of what is being proposed is that:
If this is the intent, then I fully support it. Looking at the test cases it looked like some of the variations would not function as described above. Maybe my brain is just fuzzy from having a migraine most of the day? |
Yes to both. We have a chance to alter the specification with respect to the parenthetical, but if we don't, a repeated key-value pair is considered to have the same impact as a repeated key with different values.
Correct.
Almost. The Separately,
Yes, and yes.
Condolences on the lousy day. But yes, you read correctly except for my one remark in the middle. |
On today's call, I will suggest our default answer to Requirement 2 is as I noted above: A repeated key/value pair will be considered the same way as a repeated key. |
Ah. I read right past that I guess but have no issue at all with the constraint to xsd:string
|
Given the comments above confirming my interpretation of the intended approach (including around repeatsKey) I would vote in the affirmative for a Requirments Review vote. |
No effects were observed on Make-managed files. References: * ucoProject/UCO#602 Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
No effects were observed on Make-managed files. References: * ucoProject/UCO#602 Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
No effects were observed on Make-managed files. References: * ucoProject/UCO#602 Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
No effects were observed on Make-managed files. References: * ucoProject/UCO#602 Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
No effects were observed on Make-managed files. References: * ucoProject/UCO#602 Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
A follow-on patch will regenerate Make-managed files. References: * ucoProject/UCO#602 Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
References: * ucoProject/UCO#602 Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
No effects were observed on Make-managed files. References: * ucoProject/UCO#602 * ucoProject/UCO#618 Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
No effects were observed on Make-managed files. References: * ucoProject/UCO#602 * ucoProject/UCO#618 Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
No effects were observed on Make-managed files. References: * ucoProject/UCO#602 Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
No effects were observed on Make-managed files. References: * ucoProject/UCO#602 Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
A follow-on patch will regenerate Make-managed files. References: * ucoProject/UCO#602 Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
References: * ucoProject/UCO#602 Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
No effects were observed on Make-managed files. References: * ucoProject/UCO#602 Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
No effects were observed on Make-managed files. References: * ucoProject/UCO#602 Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
Background
The definition of
types:Dictionary
reads (emphasis added):UCO does not currently test this key-uniqueness within the encoding in the ontology. SHACL provides a mechanism, via SHACL-SPARQL, to encode this uniqueness constraint in the ontology.1 UCO adopted SHACL in version 0.7.0, but this definition predates version 0.7.0.
With a key-uniqueness enforcement mechanism, UCO can serve a role in detecting repeated dictionary keys in data flows. E.g., UCO can assist with detecting some instances of Common Weakness Enumeration 694 (CWE-694), "Use of Multiple Resources with Duplicate Identifier," if scoping "Resource" to "Value within a key-value store represented as a dictionary data structure." (This example is from a non-exhaustive review of the CWE dictionary.)
Requirements
Requirement 1
UCO must enforce the uniqueness of dictionary keys in
types:Dictionary
, or some subclass oftypes:Dictionary
.Requirement 2
UCO must clarify whether a dictionary that repeats a key-value pair across two or more entries is considered conformant.
Requirement 3
(Added 2024-06-04.)
UCO must support an explicit validation mechanism to validate dictionary entry-keys' uniqueness, in an opt-in manner.
Requirement 4
(Added 2024-06-04.)
UCO must support a mechanism to report when a dictionary violates the expectation of entry-key uniqueness.
Requirement 5
(Added 2024-06-04.)
UCO must support the ability to report what key in a dictionary was found to be repeated, without also requiring a disclosure of the dictionary's key-values. Note that this entails being able to share an empty dictionary, similar to Issue 599.
Requirement 6
(Added 2024-06-04.)
If a dictionary has a repeated key reported, the dictionary must be reported as a dictionary violating the entry-key uniqueness expectation.
Risk / Benefit analysis
Benefits
Adding uniqueness enforcement to
types:Dictionary
would enable UCO's SHACL validation to catch data oddities in line with thetypes:Dictionary
class's specification.Risks
It is possible UCO tooling will encounter subject data for ingest into a graph, where a purported unique-key dictionary does not have unique keys. If UCO were to implement a SHACL-SPARQL query confirming key uniqueness right now on
types:Dictionary
, this leaves a coverage gap on subject data that, by some specification, is a dictionary, but happens to not follow key uniqueness. In some contexts this can be significant information, so there may be need to add nuance to the implementation around key-uniqueness.One possible solution is specializing the UCO
Dictionary
with two subclasses:ProperDictionary
andImproperDictionary
, borrowing "Proper" from "proper subset" (subset where there exists a non-member of the subset in the superset) and "proper interval" (interval of non-0 length).types:Dictionary
not further subclassed would be left to the UCO consumer to ultimately test.Adding
ProperDictionary
andImproperDictionary
would carry a further risk, that existing child classes ofDictionary
would not necessarily automatically acquire proper-ness or improper-ness. Multi-typing would need to be used instead, e.g.:Last, it might not always be possible to check with SHACL validation that something asserted to be a
types:ImproperDictionary
has a repeated key. In partial-data-sharing scenarios, other SHACL constraints in thetypes:
namespace would require disclosure of the entireDictionaryEntry
object for at least two of the key-repetitions, and this might not be universally desirable.A new
owl:DatatypeProperty
types:repeatsKey
ontypes:ImproperDictionary
might assist with partial-data sharing issues.(Added 2024-06-04.)
The proposed
types:repeatsKey
carries an implication that it is being used on a a dictionary that is atypes:ImproperDictionary
. If used,types:ImproperDictionary
should be entailed for the sake of class-based data review mechanisms (i.e., searches in graphs fortypes:ImproperDictionary
).types:repeatsKey
should be added with an RDFS domain declaration oftypes:ImproperDictionary
, and this domain assertion should be included in SHACL validation. (Practices to do this are already enacted in the UCO OWL review shapes.)Competencies demonstrated
Competency 1
A configuration file, deliberately non-conformant to its specification that it provide unique keys, is fed through a content-posting ecosystem, where a security tool tests resources with a "last-read-wins" key-value parser, and the ecosystem's consumers primarily use a consumer tool with a "first-read-wins" key-value parser:
If transcribed into UCO (1.3.0) without checking for key-uniqueness, this would be the resulting graph:
For UCO consumers that are JSON-based, and not JSON-LD-based, the
DictionaryEntry
order from (pseudo-)random UUIDs could affect downstream results.Competency Question 1.1
If a tool reads this into a
types:Dictionary
, what would happen against the current UCO specification (1.3.0)?Result 1.1
Per the definition in
types:Dictionary
, this SHOULD raise some kind of data validation error, but the responsible tester is not designated.Competency Question 1.2
How would an ingest-to-UCO process represent that the source file had a repeated key, in the UCO graph?
Result 1.2
There is not currently a specification on how to handle this, or whether it would be appropriate to store, say, only the "proper"
Dictionary
keys.If the
ProperDictionary
/ImproperDictionary
strategy is selected for implementation, graph-populating programs could start creatingDictionary
objects and specialize them after parsing source-data intoProperDictionary
orImproperDictionary
as appropriate.If the
types:repeatsKey
property is accepted, that property could be used with theDictionary
(/ImproperDictionary
) object to record a part of the malformed data more likely to be desired to share.(Added 2024-06-04.)
The
types:repeatsKey
property should only be used ontypes:ImproperDictionary
.Solution suggestion
This SHACL-SPARQL constraint would test for repeated instances of keys.
Note: This would also reject a dictionary where a key-value pair is repeated. Requirement 2 will inform whether this should be adjusted.
Where this constraint is attached depends on whether the
ProperDictionary
/ImproperDictionary
subclasses strategy is adopted. "Backwards-compatibility" below means data that raise no SHACLsh:Violation
-severity results today would only, at worst, raisesh:Warning
-severity results until UCO 2.0.0.types:Dictionary
.sh:Warning
-severity validation results for UCO < 2.0.0,sh:Violation
-level for UCO 2.0.0.types:Dictionary
would not change.types:ProperDictionary
.types:Dictionary
would need to change to suggest use of the subclasses in order to confirm conformance with key-uniqueness.types:Dictionary
with ash:Warning
severity, in order to alert about data that is contrary to the definition text's set expectations.(Added 2024-06-04.)
After discussion from the 2024-05-30 meeting, the new, disjoint dictionary subclasses and the
repeatsKey
were implemented in a PR superseding the original PR.On addition of the dictionary subclasses, the SPARQL constraint checking for repeated keys in plain
types:Dictionary
s ended up appearing to be more permanent than originally anticipated. There is no reason based on backwards-compatibility to remove the shape that merely warns of repeated keys. Unfortunately, there is not a way to specify in SHACL that the constraint should only run ontypes:Dictionary
s that are not alsotypes:ProperDictionary
s (which runs its own SHACL constraint of higher severity), unless OWL entailment is set as an operational requirement. General entailment requirements on users is purposefully left out of scope of this proposal, save for one special-purpose detail onrepeatsKey
.It is fair to discuss whether UCO should always review all dictionaries for key uniqueness. The shape performing this review is given its own IRI, so it is possible for users to use
sh:deactivated
to deactivate the shape when their operations are otherwise prepared to address key repetitions (such as through some process that always assigns the proper or improper dictionary type).repeatsKey
induced Requirement 5, enabling "empty" dictionaries for partial data-sharing scenarios. It also induced a data safety review mechanism, adding anrdfs:domain
declaration with an accompanying test that the domain is satisfied, whether through explicit typing (i.e. hard-coded assignment oftypes:ImproperDictionary
), or through entailment (whether RDFS entailment or OWL entailment).This is a deviation from UCO generally avoiding usage of
rdfs:domain
.repeatsKey
is offered as a property with sufficient gravity that its presence should ensure review mechanisms for handling improper dictionaries are triggered.Coordination
develop
for the next releasedevelop
state with backwards-compatible implementation merged intodevelop-2.0.0
develop-2.0.0
(N/A)develop
branch updated to track UCO's updateddevelop
branchdevelop-2.0.0
branch updated to track UCO's updateddevelop-2.0.0
branchFootnotes
There does not appear to be a mechanism to do this with SHACL property shapes or constraint components.
sh:disjoint
comes close, but instead separates predicates in triples, not objects. ↩The text was updated successfully, but these errors were encountered: