Validate on instantiation #546

jakevdp · 2018-03-01T21:42:44Z

This PR adds code that validates Schema classes when they're instantiated, rather than only when they're converted to a dictionary. This significantly improves the user experience (see examples below)

The downsides of this change are:

increased code complexity: much more is happening in SchemaBase to accomplish this, and to make certain this kind of validation is disabled in cases that it causes issues (for example, when you do something like encode(x='Origin'), the encoding is not valid until you can add type information, which requires data that may not be available until the final serialization).
chart creation is a bit slower, because lower-level parts of the chart are being repeatedly validated as the chart is built up. In practice, that's only really noticeable when building several dozen examples back-to-back.
some approaches that may have previously worked (such as creating an empty object and then adding attributes sequentially) will now raise an error. e.g. you could not do
```
sel = alt.selection()
sel.type = 'interval'
```
because type is required by the schema at the time it's instantiated. I've disabled this validation check on all top-level objects ('Chart', 'LayeredChart', 'X', 'Color', etc.), and I think that covers 99% of the cases where people use this pattern.

I think those downsides are worth swallowing for the added usability of informative error messages. In case performance is critical, I added a DEBUG_MODE flag that can be set to False; when it is, the chart will only be validated once, after the final serialization is constructed.

Previous behavior

Before this PR, if you make a mistake somewhere deep in the chart, this is the kind of traceback you get:

import altair as alt
from vega_datasets import data

driving = data.driving.url

base = alt.Chart(driving).encode(
    x = alt.X('miles:Q', scale = alt.Scale(zero=False)), 
    y = alt.Y('gas:Q', scale = alt.Scale(zero='false')),
    order = 'year:Q'
)

chart = base.mark_line() + base.mark_point().interactive()

chart.to_dict()

---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
<ipython-input-1-071b9cebc769> in <module>()
     13 chart = base.mark_line() + base.mark_point().interactive()
     14 
---> 15 chart.to_dict()

~/github/altair-viz/altair/altair/vegalite/v2/api.py in to_dict(self, *args, **kwargs)
    176         kwargs['context'] = context
    177 
--> 178         dct = super(TopLevelMixin, copy).to_dict(*args, **kwargs)
    179 
    180         if is_top_level:

~/github/altair-viz/altair/altair/utils/schemapi.py in to_dict(self, validate, ignore, context)
    153                              "cannot serialize to dict".format(self.__class__))
    154         if validate:
--> 155             self.validate(result)
    156         return result
    157 

~/github/altair-viz/altair/altair/utils/schemapi.py in validate(cls, instance, schema)
    192             schema = cls._schema
    193         resolver = jsonschema.RefResolver.from_schema(cls._rootschema or cls._schema)
--> 194         return jsonschema.validate(instance, schema, resolver=resolver)
    195 
    196     @classmethod

~/anaconda/envs/vega3/lib/python3.6/site-packages/jsonschema/validators.py in validate(instance, schema, cls, *args, **kwargs)
    539         cls = validator_for(schema)
    540     cls.check_schema(schema)
--> 541     cls(schema, *args, **kwargs).validate(instance)

~/anaconda/envs/vega3/lib/python3.6/site-packages/jsonschema/validators.py in validate(self, *args, **kwargs)
    128         def validate(self, *args, **kwargs):
    129             for error in self.iter_errors(*args, **kwargs):
--> 130                 raise error
    131 
    132         def is_type(self, instance, type):

ValidationError: {'mark': 'line', 'data': {'url': 'https://vega.github.io/vega-datasets/data/driving.json'}, 'encoding': {'order': {'type': 'quantitative', 'field': 'year'}, 'x': {'type': 'quantitative', 'field': 'miles', 'scale': {'zero': False}}, 'y': {'type': 'quantitative', 'field': 'gas', 'scale': {'zero': 'false'}}}} is not valid under any of the given schemas

Failed validating 'anyOf' in schema['properties']['layer']['items']:
    {'anyOf': [{'$ref': '#/definitions/LayerSpec'},
               {'$ref': '#/definitions/CompositeUnitSpec'}]}

On instance['layer'][0]:
    {'data': {'url': 'https://vega.github.io/vega-datasets/data/driving.json'},
     'encoding': {'order': {'field': 'year', 'type': 'quantitative'},
                  'x': {'field': 'miles',
                        'scale': {'zero': False},
                        'type': 'quantitative'},
                  'y': {'field': 'gas',
                        'scale': {'zero': 'false'},
                        'type': 'quantitative'}},
     'mark': 'line'}

Not much specific information about where the code went wrong... it basically spits out the entire chart serialization and tells you that it's invalid, and it's up to you to decipher why.

After this PR, the behavior looks like this:

import altair as alt
from vega_datasets import data

driving = data.driving.url

base = alt.Chart(driving).encode(
    x = alt.X('miles:Q', scale = alt.Scale(zero=False)), 
    y = alt.Y('gas:Q', scale = alt.Scale(zero='false')),
    order = 'year:Q'
)

chart = base.mark_line() + base.mark_point().interactive()

chart.to_dict()

---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
<ipython-input-1-3f4dc1ac6e49> in <module>()
      6 base = alt.Chart(driving).encode(
      7     x = alt.X('miles:Q', scale = alt.Scale(zero=False)),
----> 8     y = alt.Y('gas:Q', scale = alt.Scale(zero='false')),
      9     order = 'year:Q'
     10 )

~/github/altair-viz/altair/altair/vegalite/v2/schema/core.py in __init__(self, base, clamp, domain, exponent, interpolate, nice, padding, paddingInner, paddingOuter, range, rangeStep, round, scheme, type, zero, **kwds)
   4557                                     paddingOuter=paddingOuter, range=range,
   4558                                     rangeStep=rangeStep, round=round, scheme=scheme,
-> 4559                                     type=type, zero=zero, **kwds)
   4560 
   4561 

~/github/altair-viz/altair/altair/utils/schemapi.py in __init__(self, *args, **kwds)
     76 
     77         if DEBUG_MODE and self._class_is_valid_at_instantiation:
---> 78             dct = self.to_dict(validate=True)
     79 
     80     def copy(self, deep=True, ignore=()):

~/github/altair-viz/altair/altair/utils/schemapi.py in to_dict(self, validate, ignore, context)
    189                              "cannot serialize to dict".format(self.__class__))
    190         if validate:
--> 191             self.validate(result)
    192         return result
    193 

~/github/altair-viz/altair/altair/utils/schemapi.py in validate(cls, instance, schema)
    228             schema = cls._schema
    229         resolver = jsonschema.RefResolver.from_schema(cls._rootschema or cls._schema)
--> 230         return jsonschema.validate(instance, schema, resolver=resolver)
    231 
    232     @classmethod

~/anaconda/envs/vega3/lib/python3.6/site-packages/jsonschema/validators.py in validate(instance, schema, cls, *args, **kwargs)
    539         cls = validator_for(schema)
    540     cls.check_schema(schema)
--> 541     cls(schema, *args, **kwargs).validate(instance)

~/anaconda/envs/vega3/lib/python3.6/site-packages/jsonschema/validators.py in validate(self, *args, **kwargs)
    128         def validate(self, *args, **kwargs):
    129             for error in self.iter_errors(*args, **kwargs):
--> 130                 raise error
    131 
    132         def is_type(self, instance, type):

ValidationError: 'false' is not of type 'boolean'

Failed validating 'type' in schema['properties']['zero']:
    {'description': 'If `true`, ensures that a zero baseline value is '
                    'included in the scale domain.\n'
                    '\n'
                    '__Default value:__ `true` for x and y channels if the '
                    'quantitative field is not binned and no custom '
                    '`domain` is provided; `false` otherwise.\n'
                    '\n'
                    '__Note:__ Log, time, and utc scales do not support '
                    '`zero`.',
     'type': 'boolean'}

On instance['zero']:
    'false'

The error is raised when the class is instantiated, which means you have the Python code context to help you figure out where things went wrong.

jakevdp · 2018-03-01T22:07:10Z

@eitanlees @afonit as our resident example building experts, I'd appreciate your thoughts on this. How often has the traceback behavior caused you issues?

eitanlees · 2018-03-01T23:04:27Z

I would love more informative error messages. For the example building it's a lot of trial and error but an informative error message should speed up the process.

afonit · 2018-03-02T00:26:16Z

@jakevdp I would readily embrace any effort that would allow me to iterate faster on building a visual. The traceback caused me issues quite often for one reason - when I was working on personal examples - I sometimes had a lot of data in the sample and when I would get an error it would often spit out the whole dataset in the error which led to lots of scrolling just to get to the bottom that had some useful bits.

What you have proposed above looks useful but I would honestly hate to see development slowed/complexity increased.

I am developing an introduction to data analysis in python for my company and will be using altair as the preferred visualization tool. I think the current state of errors would really be frustrating for many of the people I will be teaching.

jakevdp · 2018-03-02T00:30:15Z

Thanks for the input. I think the consensus is that the increased complexity is worth it to improve the user experience.

I'm going to merge this – as you experiment more with Altair, please feel free to open issues if you find examples of tracebacks that are particularly unhelpful!

ellisonbg · 2018-03-02T05:41:32Z

I am +1 on this - it shows the errors when the user is doing the thing that causes the error. Showing the errors on display is more removed from the user's mental state.

…

On Thu, Mar 1, 2018 at 4:30 PM, Jake Vanderplas ***@***.***> wrote: Merged #546 <#546>. — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#546 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AABr0HMpW1MxO8JKwrV08QAy3sKdC9CFks5taJKcgaJpZM4SZCfL> .

-- Brian E. Granger Associate Professor of Physics and Data Science Cal Poly State University, San Luis Obispo @ellisonbg on Twitter and GitHub bgranger@calpoly.edu and ellisonbg@gmail.com

jakevdp added 2 commits March 1, 2018 12:55

EHN: make schema objects validate at instantiation

c0aaf04

Add comment about additional props

d032d34

jakevdp merged commit 39c194d into vega:master Mar 2, 2018

jakevdp deleted the validate-on-instantiation branch March 2, 2018 00:31

binste mentioned this pull request Jan 23, 2023

Schemapi debug mode always on? #2850

Closed

This was referenced Mar 15, 2023

Include all options in enum error messages #2957

Merged

Rename DEBUG_MODE #2965

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Validate on instantiation #546

Validate on instantiation #546

jakevdp commented Mar 1, 2018 •

edited

Loading

jakevdp commented Mar 1, 2018

eitanlees commented Mar 1, 2018

afonit commented Mar 2, 2018

jakevdp commented Mar 2, 2018

ellisonbg commented Mar 2, 2018 via email

Validate on instantiation #546

Validate on instantiation #546

Conversation

jakevdp commented Mar 1, 2018 • edited Loading

Previous behavior

jakevdp commented Mar 1, 2018

eitanlees commented Mar 1, 2018

afonit commented Mar 2, 2018

jakevdp commented Mar 2, 2018

ellisonbg commented Mar 2, 2018 via email

jakevdp commented Mar 1, 2018 •

edited

Loading