Validator should allow extensions to data_types which only define data_type_inc #585
I think additions (i.e., new attributes, datasets, etc.) and restrictions (i.e., defining a more specific dtype or shape for a dataset that is compliant with the parent spec) should be allowed. Changes to existing fields that violate the parent schema should not be allowed, however. For the validator, I think it could be useful if this check were a configurable behavior.
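The "configurable behavior" idea can be sketched as a policy flag that controls how the validator classifies attributes added on top of an included type. This is a minimal illustrative sketch; `ExtensionPolicy` and `check_extension` are hypothetical names, not part of the hdmf API.

```python
# Hypothetical sketch: a validator option controlling whether extensions
# to an included type are permitted, warned about, or treated as errors.
from enum import Enum

class ExtensionPolicy(Enum):
    ALLOW = "allow"    # additions/restrictions to an included type are fine
    WARN = "warn"      # report them, but as warnings rather than errors
    FORBID = "forbid"  # treat any extension without a data_type_def as an error

def check_extension(extra_attrs, policy):
    """Classify attributes added on top of an included type under a policy."""
    if not extra_attrs or policy is ExtensionPolicy.ALLOW:
        return []
    level = "warning" if policy is ExtensionPolicy.WARN else "error"
    return [(level, name) for name in sorted(extra_attrs)]
```

For example, `check_extension({"new_attr"}, ExtensionPolicy.WARN)` returns `[("warning", "new_attr")]`, while the same call with `ExtensionPolicy.ALLOW` returns an empty list.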
Thinking further about this, it seems like this would essentially allow for diamond inheritance. Example: imagine groups A and B where B inherits from A. Then you have another group Foo which has a child group of type A where it adds an attribute. What validation should be done if a Foo builder has a child group of type B?

Example spec:

```yaml
- data_type_def: A
  doc: group of type A
- data_type_def: B
  data_type_inc: A
  doc: group of type B
  attributes:
  - name: attr_b
    dtype: text
    doc: attribute of type B
- data_type_def: Foo
  doc: an example group
  groups:
  - data_type_inc: A
    doc: an extended A group
    attributes:
    - name: attr_a_ext
      dtype: text
      doc: attribute of type A
```

Now we have a GroupBuilder of type Foo which has a child GroupBuilder of type B. What attributes should be expected on B? It seems to me that the most consistent expectation is that it should have both `attr_b` and `attr_a_ext`. What should happen now if B and the extended A define an attribute with the same name?
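The combined-requirements expectation above can be sketched with toy specs. Plain dicts stand in for hdmf spec objects here; this is illustrative only and not the hdmf API.

```python
# Toy specs: each type records its parent (via "inc") and its required attributes.
TYPES = {
    "A": {"inc": None, "attrs": set()},
    "B": {"inc": "A", "attrs": {"attr_b"}},
}

# The inline extension of A inside Foo adds an attribute without a new type.
FOO_INNER_A = {"inc": "A", "attrs": {"attr_a_ext"}}

def required_attrs(data_type, inner_spec=None):
    """Collect required attributes for a builder of `data_type` placed where
    `inner_spec` (an inline extension of an ancestor type) applies."""
    attrs = set()
    t = data_type
    while t is not None:        # walk up the inheritance chain
        attrs |= TYPES[t]["attrs"]
        t = TYPES[t]["inc"]
    if inner_spec is not None:  # the inline extension also applies
        attrs |= inner_spec["attrs"]
    return attrs
```

Under this model, `required_attrs("B", FOO_INNER_A)` yields `{"attr_b", "attr_a_ext"}`, which is the "it should have both" expectation described above.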
Makes sense. Is this not the case when …?
Ok, I'll keep that in mind as I look for an approach.
I believe the same restrictions apply with …
I'm not sure that this example shows diamond inheritance, but I agree that this is a tricky case in that you modify a type but still want to allow other subtypes with the same modifications. I don't think this case occurs in any current schema, so I think it should be fine to not allow subtypes of A to be used if A is being modified. This behavior, I believe, is identical to what would happen if you declared a new data_type_def C in Foo (instead of just modifying A), because once you create a subtype C of A, you could no longer use B inside Foo, since B does not inherit from the new type. Ideally, we would not allow this kind of modification without declaring a new type, but unfortunately that would break a number of existing parts of NWB.
@oruebel thanks for the discussion. I'm diving into this deeper and am uncovering even more questions. I'll lay them out below.

**Schema vs. builder validator**

Do you agree that the builder validator should take whatever spec it is given to be the source of truth, that any problems with the spec should be caught by a schema validator, and that if there is a problem with the spec then the builder validator behavior is undefined? E.g., if a dataset of type A has a dtype of int, and a dataset of type B which inherits from A has a dtype of text, this should be a schema validation error, and we don't need to specify behavior for the builder validator.

Does a schema validator already exist?

**Validation of inherited requirements**

It looks like the current validator doesn't validate against inherited requirements. E.g., I would expect that the following should return a validation error because attribute `foo` is missing:

```python
from hdmf.build import GroupBuilder
from hdmf.spec import GroupSpec, SpecCatalog, SpecNamespace, AttributeSpec
from hdmf.validate import ValidatorMap

spec_catalog = SpecCatalog()
g1_spec = GroupSpec(doc='g1', data_type_def='G1',
                    attributes=[AttributeSpec(name='foo', doc='an attribute', dtype='text')])
g2_spec = GroupSpec(doc='g2', data_type_def='G2', data_type_inc='G1')
spec_catalog.register_spec(g1_spec, 'test.yaml')
spec_catalog.register_spec(g2_spec, 'test.yaml')
namespace = SpecNamespace('a test namespace', 'test_core', [{'source': 'test.yaml'}],
                          version='0.1.0', catalog=spec_catalog)
vmap = ValidatorMap(namespace)
builder = GroupBuilder('bar', attributes={'data_type': 'G2'})
result = vmap.validate(builder)
print(result)
```

But no error is returned. Is this expected behavior?

**Two approaches**

I've been thinking about two approaches to resolving this issue:
At first, I was thinking of (1) (and that's where I think the diamond inheritance question arises), but now I think (2) would be less complex. (2) is nice because it:
(2) could also be extended to resolve the issue with validating inherited requirements by just validating against the entire inheritance tree. So in that case the builder would be validated against both G2 and G1. Furthermore, this could also handle the above discussion of situations where you provide a child data type in a location where the spec extends the parent (e.g., in the example above, the data type would be validated against the specs for A and B and the extension of A; it would be impossible to satisfy the spec with a data type of type B, but we wouldn't have to create a special rule about providing a child data type in place of a parent with extended rules). Downsides with (2) are:
(1) would require defining and programming rules for merging specs, but would then allow each data type to be validated against just one spec. Do you have any additional thoughts on the approaches and tradeoffs?
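Approach (2), validating a builder against every spec in its inheritance chain and merging the results, can be sketched with toy data structures. The dicts below stand in for hdmf specs, and the names are illustrative rather than the actual hdmf API.

```python
# Toy sketch of approach (2): validate a builder against each spec in its
# inheritance chain and merge the results.
SPECS = {
    "G1": {"inc": None, "required_attrs": {"foo"}},
    "G2": {"inc": "G1", "required_attrs": set()},
}

def hierarchy(data_type):
    """Yield data_type and all of its ancestors, most derived first."""
    while data_type is not None:
        yield data_type
        data_type = SPECS[data_type]["inc"]

def validate(builder_attrs, data_type):
    """Return sorted, de-duplicated MissingError-style tuples from validating
    against every spec in the chain."""
    errors = set()  # a set removes duplicates from overlapping specs
    for t in hierarchy(data_type):
        for name in SPECS[t]["required_attrs"]:
            if name not in builder_attrs:
                errors.add(("MissingError", name))
    return sorted(errors)
```

With the G1/G2 example from earlier in the thread, `validate({"data_type": "G2"}, "G2")` reports the inherited `foo` attribute as missing, which is the behavior the current validator does not yet provide.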
Yeah, no worries, I understand why we need to make it work here. :-) Is there any possibility of explicitly forbidding extension of data types without defining a new data type in the next major version of the hdmf-schema-language? I understand that would take extra work to update the existing specs to be in compliance (and new major release versions for those), but it also seems like untyped extensions create a larger surface area for bugs and for users to do unexpected things. As a recent new spec creator, I can say that even after doing my best to read the documentation, it wasn't always clear what the right way was to go about defining a spec, and I often used the existing specs from hdmf-common-schema and nwb-schema as examples. So even if we need to support old spec versions in parts of hdmf, it might be good in the long term to prevent future specs from being able to extend without defining a type.
Only in the sense that there is a JSON schema that you can use to validate against, and that the schema can be read by the spec reader. That, however, only means compliance with the schema language itself; it does not validate inheritance rules. Having a true spec validator that also checks compliance with a number of additional rules would be very useful, e.g., to check:
This sounds like a bug. @rly thoughts?
I have to admit, this is a non-trivial issue. I'm not sure what the best solution for this is right now.
I agree, that would be useful. One way to get at least partially there is to create a SpecValidator that we can set in the constructor of the spec classes (e.g., GroupSpec) to optionally validate the spec on init and each time it is modified.
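The validate-on-init idea can be sketched as follows. `ValidatedSpec`, `SpecError`, and `require_def_or_inc` are hypothetical names used for illustration; this is not the hdmf GroupSpec API, just a sketch of the pattern of passing a validator callable into a spec class's constructor.

```python
# Hypothetical sketch: a spec class that validates itself on construction
# and re-validates after each modification.
class SpecError(ValueError):
    """Raised when a spec violates a validation rule."""

class ValidatedSpec:
    def __init__(self, data_type_def=None, data_type_inc=None, validator=None):
        self._validator = validator
        self.data_type_def = data_type_def
        self.data_type_inc = data_type_inc
        self._check()          # validate on init

    def _check(self):
        if self._validator is not None:
            self._validator(self)

    def set_inc(self, data_type_inc):
        self.data_type_inc = data_type_inc
        self._check()          # re-validate on each modification

def require_def_or_inc(spec):
    """Example rule: a spec must carry a data_type_def or a data_type_inc."""
    if spec.data_type_def is None and spec.data_type_inc is None:
        raise SpecError("spec needs data_type_def or data_type_inc")
```

Constructing `ValidatedSpec(validator=require_def_or_inc)` with neither field set raises `SpecError` immediately, rather than deferring the problem to file-validation time.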
The problem is less with the next major version of the language than with breaking NWB. I'm not against the idea, but I'm not sure right now how to do this without causing chaos. There are a number of complicated issues here, and I think it may be best to plan to discuss options via videochat at some point.
* Fix hdmf-dev#585 * Ref hdmf-dev#542 * Builder can now be validated against more than one spec in order to validate against additional fields added to inner data types * Also make validation Errors comparable as a way to remove duplicates that can sometimes be generated
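The "make validation Errors comparable to remove duplicates" note above can be sketched with a frozen dataclass. The class and field names below are illustrative, not hdmf's actual Error classes.

```python
# Sketch: comparable/hashable validation errors so duplicates produced by
# validating against overlapping specs can be dropped via a set.
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen dataclass auto-generates __eq__ and __hash__
class MissingError:
    location: str
    name: str

errors = [
    MissingError("root/bar", "new_attr"),
    MissingError("root/bar", "new_attr"),  # duplicate from a second spec
    MissingError("root/baz", "foo"),
]
# Equal errors collapse in the set; sort for a stable report order.
unique = sorted(set(errors), key=lambda e: (e.location, e.name))
```

Because equality is value-based, two errors generated independently from different specs but describing the same problem compare equal and survive only once.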
* Ref hdmf-dev#585 * This is just a workaround for checking the data_type of BuilderH5ReferenceDataset and BuilderH5TableDataset objects * Plan to add unit tests after some discussion to validate the approach
* These unit tests will begin to fail after the update to the hdmf validator that increases validation coverage * Ref hdmf-dev/hdmf#585 * Ref hdmf-dev/hdmf#609 * See discussion: hdmf-dev/hdmf#609 (comment)
* use ReferenceResolver instead of referencing BuilderH5ReferenceDataset or BuilderH5TableDataset * Fix hdmf-dev#585
…ecs (#609) * Validate builders against both top level data type specs and inner specs * Fix #585 * Ref #542 * Builder can now be validated against more than one spec in order to validate against additional fields added to inner data types * Also make validation Errors comparable as a way to remove duplicates that can sometimes be generated * Update changelog * Ref #585 * Fix pynwb validation errors related to reference and compound data types * Ref #585 * This is just a workaround for checking the data_type of BuilderH5ReferenceDataset and BuilderH5TableDataset objects * Plan to add unit tests after some discussion to validate the approach * Remove validator reference to H5-specific classes and add unit tests * use ReferenceResolver instead of referencing BuilderH5ReferenceDataset or BuilderH5TableDataset * Fix #585 * Update tests/unit/validator_tests/test_validate.py * Update tests/unit/validator_tests/test_validate.py Co-authored-by: Ryan Ly <rly@lbl.gov>
Closing this via #609 (I think it wasn't automatically closed because the merge was to rc/3.0.0).
Description
This comes out of the schema discussion in hdmf-dev/hdmf-schema-language#13, in the context of how to handle extra fields (#542).

If a spec group/dataset is defined with only a `data_type_inc`, it should be allowed to extend the data type without defining a new data type via `data_type_def`. For example, a group can contain a dataset which both includes/inherits from `VectorData` and extends it to add a new attribute, without defining a `data_type_def`. The validator should allow for these circumstances and validate against all changes/extensions to the original data type.

Ref: `data_type_inc` hdmf-schema-language#13

Implementation notes:

The current validator creates a map of specs/validators (ValidatorMap) using the defined data type (via `data_type_def`) as the key. When validating a builder against the above spec, it validates `new_column` against the original spec for `VectorData` rather than the modified version above, and does not return a `MissingError` if the attribute is missing. Furthermore, as of #542, an `ExtraFieldWarning` will be returned saying that `new_attr` is not part of the spec.

The validator will need to keep track of where it is in the spec tree rather than relying on specific data types. Perhaps the ValidatorMap will only be used for base-level data types.
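The "keep track of where it is in the spec tree" idea can be sketched with a walk over nested toy specs, where an inner spec that carries only a `data_type_inc` contributes its inline additions on top of the base type's requirements. Dicts stand in for hdmf spec and builder objects; `validate_group` and `BASE_SPECS` are illustrative names, not the hdmf API.

```python
# Toy sketch: validate nested typed specs in place, so an inline extension
# (data_type_inc only) is combined with the base type's requirements instead
# of being looked up by type in a ValidatorMap.
BASE_SPECS = {
    "VectorData": {"required_attrs": {"description"}},
}

def validate_group(builder, group_spec):
    """Validate a group builder; inner typed specs carry their own extras."""
    errors = []
    for inner in group_spec.get("datasets", []):
        child = builder["datasets"].get(inner["name"], {})
        # combine base-type requirements with the inline extension's additions
        required = set(BASE_SPECS[inner["data_type_inc"]]["required_attrs"])
        required |= inner.get("extra_attrs", set())
        for attr in sorted(required):
            if attr not in child.get("attributes", {}):
                errors.append(("MissingError", inner["name"], attr))
    return errors
```

For a spec whose inner dataset includes `VectorData` and adds `new_attr`, a builder supplying only `description` would produce a `MissingError` for `new_attr`, which is the behavior requested in this issue.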
Steps to Reproduce

The following should return a `MissingError` because the dataset builder does not have the `new_attr` attribute. Furthermore, after the re-merging of #542, if the `new_attr` attribute is present, it should not return an `ExtraFieldWarning`.
.Environment
Checklist