Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate on instantiation #546

Merged
merged 2 commits into from
Mar 2, 2018
Merged

Conversation

jakevdp
Copy link
Collaborator

@jakevdp jakevdp commented Mar 1, 2018

This PR adds code that validates Schema classes when they're instantiated, rather than only when they're converted to a dictionary. This significantly improves the user experience (see examples below)

The downsides of this change are:

  • increased code complexity: much more is happening in SchemaBase to accomplish this, and to make certain this kind of validation is disabled in cases that it causes issues (for example, when you do something like encode(x='Origin'), the encoding is not valid until you can add type information, which requires data that may not be available until the final serialization).
  • chart creation is a bit slower, because lower-level parts of the chart are being repeatedly validated as the chart is built up. In practice, that's only really noticeable when building several dozen examples back-to-back.
  • some approaches that may have previously worked (such as creating an empty object and then adding attributes sequentially) will now raise an error. e.g. you could not do
    sel = alt.selection()
    sel.type = 'interval'
    because type is required by the schema at the time it's instantiated. I've disabled this validation check on all top-level objects ('Chart', 'LayeredChart', 'X', 'Color', etc.), and I think that covers 99% of the cases where people use this pattern.

I think those downsides are worth swallowing for the added usability of informative error messages. In case performance is critical, I added a DEBUG_MODE flag that can be set to False; when it is, the chart will only be validated once, after the final serialization is constructed.

Previous behavior

Before this PR, if you make a mistake somewhere deep in the chart, this is the kind of traceback you get:

import altair as alt
from vega_datasets import data

driving = data.driving.url

base = alt.Chart(driving).encode(
    x = alt.X('miles:Q', scale = alt.Scale(zero=False)), 
    y = alt.Y('gas:Q', scale = alt.Scale(zero='false')),
    order = 'year:Q'
)

chart = base.mark_line() + base.mark_point().interactive()

chart.to_dict()
---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
<ipython-input-1-071b9cebc769> in <module>()
     13 chart = base.mark_line() + base.mark_point().interactive()
     14 
---> 15 chart.to_dict()

~/github/altair-viz/altair/altair/vegalite/v2/api.py in to_dict(self, *args, **kwargs)
    176         kwargs['context'] = context
    177 
--> 178         dct = super(TopLevelMixin, copy).to_dict(*args, **kwargs)
    179 
    180         if is_top_level:

~/github/altair-viz/altair/altair/utils/schemapi.py in to_dict(self, validate, ignore, context)
    153                              "cannot serialize to dict".format(self.__class__))
    154         if validate:
--> 155             self.validate(result)
    156         return result
    157 

~/github/altair-viz/altair/altair/utils/schemapi.py in validate(cls, instance, schema)
    192             schema = cls._schema
    193         resolver = jsonschema.RefResolver.from_schema(cls._rootschema or cls._schema)
--> 194         return jsonschema.validate(instance, schema, resolver=resolver)
    195 
    196     @classmethod

~/anaconda/envs/vega3/lib/python3.6/site-packages/jsonschema/validators.py in validate(instance, schema, cls, *args, **kwargs)
    539         cls = validator_for(schema)
    540     cls.check_schema(schema)
--> 541     cls(schema, *args, **kwargs).validate(instance)

~/anaconda/envs/vega3/lib/python3.6/site-packages/jsonschema/validators.py in validate(self, *args, **kwargs)
    128         def validate(self, *args, **kwargs):
    129             for error in self.iter_errors(*args, **kwargs):
--> 130                 raise error
    131 
    132         def is_type(self, instance, type):

ValidationError: {'mark': 'line', 'data': {'url': 'https://vega.github.io/vega-datasets/data/driving.json'}, 'encoding': {'order': {'type': 'quantitative', 'field': 'year'}, 'x': {'type': 'quantitative', 'field': 'miles', 'scale': {'zero': False}}, 'y': {'type': 'quantitative', 'field': 'gas', 'scale': {'zero': 'false'}}}} is not valid under any of the given schemas

Failed validating 'anyOf' in schema['properties']['layer']['items']:
    {'anyOf': [{'$ref': '#/definitions/LayerSpec'},
               {'$ref': '#/definitions/CompositeUnitSpec'}]}

On instance['layer'][0]:
    {'data': {'url': 'https://vega.github.io/vega-datasets/data/driving.json'},
     'encoding': {'order': {'field': 'year', 'type': 'quantitative'},
                  'x': {'field': 'miles',
                        'scale': {'zero': False},
                        'type': 'quantitative'},
                  'y': {'field': 'gas',
                        'scale': {'zero': 'false'},
                        'type': 'quantitative'}},
     'mark': 'line'}

Not much specific information about where the code went wrong... it basically spits out the entire chart serialization and tells you that it's invalid, and it's up to you to decipher why.

After this PR, the behavior looks like this:

import altair as alt
from vega_datasets import data

driving = data.driving.url

base = alt.Chart(driving).encode(
    x = alt.X('miles:Q', scale = alt.Scale(zero=False)), 
    y = alt.Y('gas:Q', scale = alt.Scale(zero='false')),
    order = 'year:Q'
)

chart = base.mark_line() + base.mark_point().interactive()

chart.to_dict()
---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
<ipython-input-1-3f4dc1ac6e49> in <module>()
      6 base = alt.Chart(driving).encode(
      7     x = alt.X('miles:Q', scale = alt.Scale(zero=False)),
----> 8     y = alt.Y('gas:Q', scale = alt.Scale(zero='false')),
      9     order = 'year:Q'
     10 )

~/github/altair-viz/altair/altair/vegalite/v2/schema/core.py in __init__(self, base, clamp, domain, exponent, interpolate, nice, padding, paddingInner, paddingOuter, range, rangeStep, round, scheme, type, zero, **kwds)
   4557                                     paddingOuter=paddingOuter, range=range,
   4558                                     rangeStep=rangeStep, round=round, scheme=scheme,
-> 4559                                     type=type, zero=zero, **kwds)
   4560 
   4561 

~/github/altair-viz/altair/altair/utils/schemapi.py in __init__(self, *args, **kwds)
     76 
     77         if DEBUG_MODE and self._class_is_valid_at_instantiation:
---> 78             dct = self.to_dict(validate=True)
     79 
     80     def copy(self, deep=True, ignore=()):

~/github/altair-viz/altair/altair/utils/schemapi.py in to_dict(self, validate, ignore, context)
    189                              "cannot serialize to dict".format(self.__class__))
    190         if validate:
--> 191             self.validate(result)
    192         return result
    193 

~/github/altair-viz/altair/altair/utils/schemapi.py in validate(cls, instance, schema)
    228             schema = cls._schema
    229         resolver = jsonschema.RefResolver.from_schema(cls._rootschema or cls._schema)
--> 230         return jsonschema.validate(instance, schema, resolver=resolver)
    231 
    232     @classmethod

~/anaconda/envs/vega3/lib/python3.6/site-packages/jsonschema/validators.py in validate(instance, schema, cls, *args, **kwargs)
    539         cls = validator_for(schema)
    540     cls.check_schema(schema)
--> 541     cls(schema, *args, **kwargs).validate(instance)

~/anaconda/envs/vega3/lib/python3.6/site-packages/jsonschema/validators.py in validate(self, *args, **kwargs)
    128         def validate(self, *args, **kwargs):
    129             for error in self.iter_errors(*args, **kwargs):
--> 130                 raise error
    131 
    132         def is_type(self, instance, type):

ValidationError: 'false' is not of type 'boolean'

Failed validating 'type' in schema['properties']['zero']:
    {'description': 'If `true`, ensures that a zero baseline value is '
                    'included in the scale domain.\n'
                    '\n'
                    '__Default value:__ `true` for x and y channels if the '
                    'quantitative field is not binned and no custom '
                    '`domain` is provided; `false` otherwise.\n'
                    '\n'
                    '__Note:__ Log, time, and utc scales do not support '
                    '`zero`.',
     'type': 'boolean'}

On instance['zero']:
    'false'

The error is raised when the class is instantiated, which means you have the Python code context to help you figure out where things went wrong.

@jakevdp
Copy link
Collaborator Author

jakevdp commented Mar 1, 2018

@eitanlees @afonit as our resident example building experts, I'd appreciate your thoughts on this. How often has the traceback behavior caused you issues?

@eitanlees
Copy link
Contributor

I would love more informative error messages. For the example building it's a lot of trial and error but an informative error message should speed up the process.

@afonit
Copy link
Contributor

afonit commented Mar 2, 2018

@jakevdp I would readily embrace any effort that would allow me to iterate faster on building a visual. The traceback caused me issues quite often for one reason - when I was working on personal examples - I sometimes had a lot of data in the sample and when I would get an error it would often spit out the whole dataset in the error which led to lots of scrolling just to get to the bottom that had some useful bits.

What you have proposed above looks useful but I would honestly hate to see development slowed/complexity increased.

I am developing an introduction to data analysis in python for my company and will be using altair as the preferred visualization tool. I think the current state of errors would really be frustrating for many of the people I will be teaching.

@jakevdp
Copy link
Collaborator Author

jakevdp commented Mar 2, 2018

Thanks for the input. I think the consensus is that the increased complexity is worth it to improve the user experience.

I'm going to merge this – as you experiment more with Altair, please feel free to open issues if you find examples of tracebacks that are particularly unhelpful!

@jakevdp jakevdp merged commit 39c194d into vega:master Mar 2, 2018
@jakevdp jakevdp deleted the validate-on-instantiation branch March 2, 2018 00:31
@ellisonbg
Copy link
Collaborator

ellisonbg commented Mar 2, 2018 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants