Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add flag to skip validation on read #792

Conversation

eslavich
Copy link
Contributor

@eslavich eslavich commented May 6, 2020

This PR adds a flag to asdf.open and fits_embed.AsdfInFits.open that disables validation on read, for situations when the file is already known to be valid and the user wishes to avoid the performance penalty of an unnecessary validation pass.

Resolves #565

@eslavich eslavich added this to the 2.7.0 milestone May 6, 2020
@codecov
Copy link

codecov bot commented May 6, 2020

Codecov Report

Merging #792 into master will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #792   +/-   ##
=======================================
  Coverage   93.91%   93.91%           
=======================================
  Files          43       43           
  Lines        5043     5044    +1     
=======================================
+ Hits         4736     4737    +1     
  Misses        307      307           
Impacted Files Coverage Δ
asdf/fits_embed.py 92.70% <ø> (ø)
asdf/asdf.py 92.83% <100.00%> (+0.01%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5f2b399...05a4eaa. Read the comment docs.

@@ -82,6 +84,27 @@ def test_open_readonly(tmpdir):
pass


def test_open_validate_on_read(tmpdir):
tmpfile = str(tmpdir.join('invalid.asdf'))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there ever anything written to this file path? I don't see it in the code below, but maybe there is some test magic going on that isn't apparent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uhhhhh... waves hand vaguely this is not the tempfile you're looking for

@jdavies-st
Copy link
Contributor

It looks like this validate_on_read=True flag doesn't change the anything by default, but can be switched to False and it will then not go through any schema validation? Is the use of the schema to deserialize the ASDF content into a python object separate from validation of the ASDF against the various schemas?

Copy link
Contributor

@perrygreenfield perrygreenfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks straightforward and clear to me.

@eslavich
Copy link
Contributor Author

eslavich commented May 6, 2020

It looks like this validate_on_read=True flag doesn't change the anything by default, but can be switched to False and it will then not go through any schema validation?

Yup! Well, it'll skip the main validation pass, anyway. The validator will still be used to fill in default values, but that pass doesn't check data types or anything.

Is the use of the schema to deserialize the ASDF content into a python object separate from validation of the ASDF against the various schemas?

AFAIK, if validation and filling defaults are both disabled, the schema isn't used at all.

@jdavies-st
Copy link
Contributor

Ah, I see. So then only the CustomType subclass is used to turn the ASDF YAML into the custom python object?

@eslavich
Copy link
Contributor Author

eslavich commented May 6, 2020

Right, and that matches to the YAML by tag, which is already known in both the file and the subclass, so the schema isn't consulted.

@perrygreenfield
Copy link
Contributor

But tag code is still executed?

@jdavies-st
Copy link
Contributor

Wow, I didn't realize the schemas were so superfluous. =)

@perrygreenfield perrygreenfield merged commit 63eb0b5 into asdf-format:master May 6, 2020
@eslavich eslavich deleted the AL-123-add-flag-to-skip-validation branch May 6, 2020 14:51
@eslavich
Copy link
Contributor Author

eslavich commented May 6, 2020

But tag code is still executed?

I made an interesting test: I deleted a bunch of core schemas and opened a file with validate_on_read=False and do_not_fill_defaults=True, and it opens and deserializes correctly without so much as a warning.

@perrygreenfield
Copy link
Contributor

So the only subtlety is regarding defaults. In one case (using schemas), one has the attribute with a default value, but if validate_on_read=False, then the attribute is missing unless the tag code also handles defaults implicitly.

@perrygreenfield
Copy link
Contributor

The more I see the implications, the less I like schemas handling defaults. But I may be having a brain short circuit...

@jdavies-st
Copy link
Contributor

I agree that handling defaults is bad, as it makes data values depend on the vagaries of schemas. Just to save a few bytes.

And there's the whole issue of not being able to serialize None because that's the value that asdf uses to know whether or not to fill defaults.

@eslavich
Copy link
Contributor Author

eslavich commented May 6, 2020

So the only subtlety is regarding defaults. In one case (using schemas), one has the attribute with a default value, but if validate_on_read=False, then the attribute is missing unless the tag code also handles defaults implicitly.

Actually, the defaults are still filled in. That happens on a separate pass through the tree that this PR doesn't impact. My next PR does combine those two passes when both validation and defaults are enabled, but even there the defaults are still filled in with validation disabled.

That said, I would love it if we could drop the defaults feature in 3.0. The more time I spend trying to figure out a sane procedure for choosing a default from a complex schema the more I suspect there isn't any such procedure. These schemas just aren't designed to provide a definitive answer to the question "what is the value of an annotation for this node?". IMO the default field should exist only as advice to people writing deserializers, and not be acted upon by asdf code. The type class seems like a more natural place to set defaults.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add option to skip validation
3 participants