Files should be validated on write #1288
Comments
@rly I'm all for it! And not just a warning, maybe a flag like
Is it possible to run the validator on the builders before write? If so, we could avoid the cost of writing the file and then reading it back, and we could stop before any I/O happens so that no bad file is created.
@oruebel That should be possible, but I'm not sure how that would work with DataChunkIterators, whose data will not have been written yet. Also, before writing, the data would be in lists/tuples/numpy arrays instead of in H5Datasets. That should not matter much, but I think it would be better to have the validator work exactly as it would if it were called on the data file. Point taken, though, that there is extra overhead involved in validating after write. We could also strongly encourage users to validate their files themselves after writing, but I think many users would assume that PyNWB cannot write a non-compliant file and would not bother validating after writing.
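To make the DataChunkIterator concern concrete, here is a minimal sketch of a shape check that tolerates dimensions whose length is not yet known. The helper `shape_matches` is hypothetical and is not part of PyNWB or HDMF. It assumes that an unknown length is represented as `None`, both in the shape known before write and in the shapes the schema allows.

```python
from typing import Optional, Sequence


def shape_matches(
    known: Sequence[Optional[int]],
    allowed: Sequence[Sequence[Optional[int]]],
) -> bool:
    """Return True if `known` is compatible with any shape in `allowed`.

    Hypothetical helper for illustration only.
    None in `known` means "not known until the iterator is exhausted";
    None in an allowed shape means "any length is acceptable".
    """
    for spec in allowed:
        # The number of dimensions must agree even if lengths are unknown.
        if len(spec) != len(known):
            continue
        # A dimension passes if either side is unknown or the lengths agree.
        if all(k is None or s is None or k == s for k, s in zip(known, spec)):
            return True
    return False
```

Under these assumptions, a dataset iterated along its first axis with three columns, `(None, 3)`, passes against an allowed shape of `(None, 3)` and fails against `(None, 4)`. An unknown dimension checked against a fixed required length cannot be ruled out before write, so this sketch lets it pass and leaves the final decision to validation after write.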
@rly it may make sense to do both, i.e., validate builders before write to catch possible errors before the write even happens, and then validate again after write to make sure the file is actually correct. I think validation before write should catch the vast majority of problems and should also be fairly cheap (at least compared to validation after write). Even when DataChunkIterators are used, the only thing we may not know is the final total shape of the dataset. We should still know the data type and the initial shape, which should be sufficient for most validation needs, as the dimension you iterate over is rarely a dimension that has a fixed required length. In general, whether to validate before and/or after write should be configurable, as users may want to skip these steps for performance reasons.
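The "validate before and/or after write, configurably" idea could be sketched roughly as follows. This is not PyNWB's API: `write_with_validation`, `ValidationFailed`, and the three callables are hypothetical placeholders standing in for the real write step and the two validation passes, and errors are assumed to be plain strings.

```python
import warnings
from typing import Callable, List


class ValidationFailed(Exception):
    """Raised in strict mode when validation reports errors (hypothetical)."""


def write_with_validation(
    write: Callable[[], None],
    validate_builders: Callable[[], List[str]],
    validate_file: Callable[[], List[str]],
    validate_before: bool = True,
    validate_after: bool = True,
    strict: bool = False,
) -> List[str]:
    """Run optional validation around a write and return all errors found."""

    def report(stage: str, errors: List[str]) -> None:
        if not errors:
            return
        message = (
            f"{len(errors)} validation error(s) {stage} write: "
            + "; ".join(errors)
        )
        if strict:
            raise ValidationFailed(message)
        warnings.warn(message)

    errors_found: List[str] = []
    if validate_before:
        errors = validate_builders()
        errors_found.extend(errors)
        # In strict mode this raises before any I/O happens.
        report("before", errors)
    write()
    if validate_after:
        errors = validate_file()
        errors_found.extend(errors)
        report("after", errors)
    return errors_found
```

With `strict=False` the caller gets a warning and the file is still written; with `strict=True` a failed check on the builders prevents the write entirely. Setting both `validate_before` and `validate_after` to `False` skips validation for users who care most about write performance.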
Despite our best efforts, it is possible to use PyNWB to write a file that does not validate against the schema. Users should be made aware when the file that they just wrote does not comply with the schema and therefore may not work with certain tools.
See also this old discussion: #306
Thoughts? @oruebel @ajtritt @bendichter