Files should be validated on write #1288

rly · 2020-08-26T23:53:25Z

Despite our best efforts, it is possible to use PyNWB to write a file that does not validate against the schema. When that happens, users should be warned that the file they just wrote does not comply with the schema and therefore may not work with certain tools.

See also this old discussion: #306

Thoughts? @oruebel @ajtritt @bendichter

t-b · 2020-08-27T09:17:57Z

@rly I'm all for it! And not just a warning: maybe a flag like `writeInvalidFiles` that must be set to true before an invalid file can be written.
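A minimal sketch of how such an opt-in flag might behave, assuming some validator has already produced a list of error messages. The names `report_validation`, `write_invalid_files`, and `InvalidFileError` are hypothetical and are not part of PyNWB.

```python
import warnings


class InvalidFileError(Exception):
    """Hypothetical exception for a file that fails schema validation."""


def report_validation(errors, write_invalid_files=False):
    """Raise on validation errors unless the caller opted in to invalid files.

    `errors` is assumed to be a list of human-readable messages produced by
    some validator; an empty list means the file is valid.
    """
    if not errors:
        return
    message = "File does not validate against the schema: " + "; ".join(errors)
    if write_invalid_files:
        # The caller explicitly accepted an invalid file, so only warn.
        warnings.warn(message, UserWarning)
    else:
        raise InvalidFileError(message)
```

With the default `write_invalid_files=False`, a non-compliant file becomes a hard error, so the user has to opt in deliberately to keep it.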

oruebel · 2020-08-27T17:15:00Z

Is it possible to run the validator on the builders before write? If so, that would let us avoid the cost of writing the file and then reading it again, and would also let us stop before any I/O happens so that no bad file is created.
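As a rough illustration of the idea, the sketch below validates an in-memory builder first and only calls the write step if no errors are found. Builders are modeled as plain dicts here, and `write_if_valid` and `require_unit` are hypothetical helpers, not PyNWB or HDMF functions.

```python
def write_if_valid(builder, validate_builder, write_builder):
    """Validate an in-memory builder and write only if it is valid.

    Returns the list of validation errors. The write callable is never
    invoked when that list is non-empty, so no file is created.
    """
    errors = list(validate_builder(builder))
    if not errors:
        write_builder(builder)
    return errors


def require_unit(builder):
    """Toy validator: a dataset builder must carry a 'unit' attribute."""
    if "unit" not in builder.get("attributes", {}):
        yield "missing required attribute 'unit'"


# Usage: `written.append` stands in for the real write step.
written = []
errors = write_if_valid({"name": "data", "attributes": {}}, require_unit, written.append)
```

In this usage the builder lacks the attribute, so `errors` is non-empty and the stand-in write step is never reached.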

rly · 2020-08-27T17:33:43Z

@oruebel That should be possible but I'm not sure how that would work with DataChunkIterators which will not have been written yet. Also before writing, the data would be in lists/tuples/numpy arrays instead of in H5Datasets. That should not matter much, but I think it would be better to have the validator work exactly as it would if the validator were called on the data file. Point taken though that there is extra overhead involved in validating after write.

We could also strongly encourage users to validate themselves after writing, but I think many users would assume that PyNWB cannot write a non-compliant file and not bother validating after writing.

oruebel · 2020-08-27T17:53:20Z

@rly it may make sense to do both, i.e., validate the builders before write to catch possible errors before any I/O happens, and then validate again after write to make sure the file is actually correct. I think validation before write should be able to catch the vast majority of problems and should also be fairly cheap (at least compared to validation after write). Even when DataChunkIterators are used, the only thing we may not know is the final total shape of the dataset. We should still know the data type and the initial shape, which should be sufficient for most validation needs, as the dimension you iterate over is rarely a dimension that has a fixed required length. In general, whether to validate before and/or after write should be configurable, as users may want to skip these steps for performance reasons.
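A small sketch of both points, under stated assumptions: a shape check that tolerates a dimension whose length is not yet known, and a write wrapper in which each validation stage is optional. Using `None` for an unknown or unconstrained dimension is an assumption of this sketch, and `shape_matches` and `checked_write` are hypothetical names rather than existing PyNWB or HDMF API.

```python
def shape_matches(shape, spec_shape):
    """Check a possibly partial shape against a spec shape.

    `None` in `spec_shape` means the dimension may have any length.
    `None` in `shape` means the length is not known yet, e.g. the dimension
    an iterator is still appending along; it is accepted here and left for
    validation after write.
    """
    if len(shape) != len(spec_shape):
        return False
    return all(
        required is None or actual is None or actual == required
        for actual, required in zip(shape, spec_shape)
    )


def checked_write(data, write, validate_before=None, validate_after=None):
    """Write `data`, running whichever validation stages were supplied.

    Each validator returns an iterable of error messages. Passing `None`
    skips that stage, e.g. for performance reasons. If validation before
    write reports errors, the write step is skipped entirely.
    """
    if validate_before is not None:
        errors = list(validate_before(data))
        if errors:
            return errors
    written = write(data)
    if validate_after is not None:
        return list(validate_after(written))
    return []
```

The pre-write stage can only check what is known at that point (rank, data type, fixed dimensions), which is why a separate post-write stage remains useful for the final shape.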

rly added the "category: enhancement" label (improvements of code or code behavior) on Aug 26, 2020

rly mentioned this issue Sep 5, 2024

[Bug]: Shape should be validated on write hdmf-dev/hdmf#1190

Open
