Add validation feature #17
Comments
I agree, this is very useful, but I disagree with adding another handful of CLI options. I suggest verifying that the output is readable through verify:
  open_kwargs:
    engine: "zarr"
    decode_cf: true
  assertions:
    attrs:
      ...
    data_vars:
      ...
    coord_vars:
      ...
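A minimal sketch of how such a verify entry might be evaluated, assuming the assertions are read as a subset of the output's metadata that has to match. The helper names `collect_mismatches` and `verify_output` are hypothetical and not part of nc2zarr, and because the contents of `attrs`, `data_vars` and `coord_vars` are elided above, the `dims`/`dtype` layout used here is only an illustration.

```python
def collect_mismatches(expected: dict, actual: dict, path: str = "") -> list[str]:
    """Return a message for every expected key that is missing or different.

    Nested dicts are compared recursively. Keys present only in `actual`
    are ignored, so `expected` acts as a subset specification.
    """
    problems = []
    for key, want in expected.items():
        where = f"{path}/{key}" if path else str(key)
        if key not in actual:
            problems.append(f"{where}: missing")
        elif isinstance(want, dict) and isinstance(actual[key], dict):
            problems.extend(collect_mismatches(want, actual[key], where))
        elif actual[key] != want:
            problems.append(f"{where}: expected {want!r}, got {actual[key]!r}")
    return problems


def verify_output(output_path: str, verify_config: dict) -> list[str]:
    """Open the output with the configured kwargs and check the assertions."""
    import xarray as xr  # imported lazily; only needed for real datasets

    open_kwargs = verify_config.get("open_kwargs", {})
    assertions = verify_config.get("assertions", {})

    def describe(variables):
        # Assumed layout: each variable is described by its dims and dtype.
        return {name: {"dims": list(var.dims), "dtype": str(var.dtype)}
                for name, var in variables.items()}

    # If the output cannot be opened at all, open_dataset raises here,
    # which already covers the "is the output readable" part.
    with xr.open_dataset(output_path, **open_kwargs) as ds:
        actual = {
            # Convert NumPy scalars and arrays to plain Python values.
            "attrs": {k: v.tolist() if hasattr(v, "tolist") else v
                      for k, v in ds.attrs.items()},
            "data_vars": describe(ds.data_vars),
            "coord_vars": describe(ds.coords),
        }
    return collect_mismatches(assertions, actual)
```

An empty result would mean that every assertion held; any other result lists the paths that were missing or different.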
Note, I'd call the feature "verify" rather than "validate", because validate would also analyse the dataset's values, which I feel is out of scope for nc2zarr.
Another note:
This is what I had in mind, at least initially: verification just using the existing configuration for input, by checking that every specified input value (or some representative sample) is also present in the output. Of course that doesn't rule out adding a more elaborate, configurable verification facility later. On reflection, I agree about calling it "verify". "Validate" probably implies something a bit deeper than just checking
Hi @pont-us, I've started an implementation that you may already have a look at.
Branch containing implementation: https://github.com/bcdev/nc2zarr/tree/forman-17-verify_dataset
Probably not relevant to the main verification / validation implementation, but just for reference: I've committed a small standalone validation script, which I'm using to validate the converted Zarrs against selected source NetCDFs for the next deliverable.
It would be useful to have an output validation option for nc2zarr, along the lines of
nc2zarr --config myconfig.yaml --validate
which would go through the input files specified in the configuration and make sure, for each data value corresponding to a variable specified in the configuration, that a matching data value exists in the configured output Zarr.
This would of course be potentially very time-consuming for large datasets; --structure-only (just check that the structure's as expected) and --sparse (validate some small but representative subset of the data) may also be useful.
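The structure-only and sparse checks proposed here could be sketched roughly as follows. This assumes that "sparse" means a reproducible random sample of positions and that "structure" means the array shape; both are interpretations rather than anything specified in the issue. The function names are hypothetical, and `src` and `dst` stand for any array-like objects with a `.shape` attribute and tuple indexing.

```python
import math
import random


def sample_indices(shape, count, seed=0):
    """Pick up to `count` distinct index tuples from an array of `shape`."""
    total = math.prod(shape)
    rng = random.Random(seed)  # fixed seed keeps the check reproducible
    flat_positions = rng.sample(range(total), min(count, total))
    indices = []
    for flat in flat_positions:
        index = []
        for size in reversed(shape):  # unravel the flat position
            flat, i = divmod(flat, size)
            index.append(i)
        indices.append(tuple(reversed(index)))
    return indices


def same_structure(src, dst):
    """Structure-only check: compare shapes without reading any values."""
    return tuple(src.shape) == tuple(dst.shape)


def sparse_mismatches(src, dst, count=100, seed=0):
    """Compare a random subset of values; return the indices that differ.

    NaN values compare unequal to themselves, so real data with fill
    values would need explicit NaN handling here.
    """
    if not same_structure(src, dst):
        raise ValueError(f"shape mismatch: {src.shape} vs {dst.shape}")
    return [ix for ix in sample_indices(src.shape, count, seed)
            if src[ix] != dst[ix]]
```

A random sample keeps the cost independent of dataset size, but it can miss localized corruption, for example a single bad chunk, so a full comparison would still be needed when completeness matters.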