Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Reduce SchemaValidationError traceback length #3530

Merged
merged 31 commits into from
Aug 31, 2024

Conversation

dangotbanned
Copy link
Member

@dangotbanned dangotbanned commented Aug 9, 2024

This PR was prompted by difficulty in reading the traceback of failures in #3501.

Description

There has clearly been a great deal of care put into using schemapi.SchemaValidationError to produce a more useful traceback.

An element that I believe was overlooked, was that the surrounding code also contributes to the traceback length.

Before/After

The exact traceback seems to vary by the python interpreter used.
ipython could be considered the best-case, with pytest the worst-case.

Anything else I imagine would fall somewhere between the length of each.

ipython

Reduction of roughly 10 lines
Importantly, long comments that are not relevant to the user are no longer displayed.

Before

image

After

image

pytest

Reduction of roughly 60 lines

Before
(doc) PS C:\Users\danie\Documents\GitHub\altair> pytest  -k test_to_dict_huge_traceback --numprocesses=1
===================================================================================== test session starts =====================================================================================
platform win32 -- Python 3.12.3, pytest-8.3.2, pluggy-1.5.0
rootdir: C:\Users\danie\Documents\GitHub\altair
configfile: pyproject.toml
plugins: anyio-4.4.0, cov-5.0.0, xdist-3.6.1
1 worker [1 item]      
F                                                                                                                                                                                        [100%]
========================================================================================== FAILURES ===========================================================================================
_________________________________________________________________________________ test_to_dict_huge_traceback _________________________________________________________________________________ 
[gw0] win32 -- Python 3.12.3 C:\Users\danie\AppData\Local\hatch\env\virtual\altair\CXM7NV9I\doc\Scripts\python.exe

    def test_to_dict_huge_traceback():
        # Error: Invalid value for type
>       return alt.Chart().encode(alt.X(type="unknown")).to_dict()

tests\utils\test_schemapi.py:501:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
altair\vegalite\v5\api.py:3771: in to_dict
    return super(Chart, copy).to_dict(**kwds)
altair\vegalite\v5\api.py:1807: in to_dict
    vegalite_spec: Any = super(TopLevelMixin, copy).to_dict(  # type: ignore[misc]
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = alt.Chart(...), validate = True

    def to_dict(
        self,
        validate: bool = True,
        *,
        ignore: list[str] | None = None,
        context: dict[str, Any] | None = None,
    ) -> dict[str, Any]:
        """
        Return a dictionary representation of the object.

        Parameters
        ----------
        validate : bool, optional
            If True (default), then validate the output dictionary
            against the schema.
        ignore : list[str], optional
            A list of keys to ignore. It is usually not needed
            to specify this argument as a user.
        context : dict[str, Any], optional
            A context dictionary. It is usually not needed
            to specify this argument as a user.

        Notes
        -----
        Technical: The ignore parameter will *not* be passed to child to_dict
        function calls.

        Returns
        -------
        dict
            The dictionary representation of this object

        Raises
        ------
        SchemaValidationError :
            if validate=True and the dict does not conform to the schema
        """
        if context is None:
            context = {}
        if ignore is None:
            ignore = []
        # The following return the package only if it has already been
        # imported - otherwise they return None. This is useful for
        # isinstance checks - for example, if pandas has not been imported,
        # then an object is definitely not a `pandas.Timestamp`.
        pd_opt = sys.modules.get("pandas")
        np_opt = sys.modules.get("numpy")

        if self._args and not self._kwds:
            result = _todict(
                self._args[0], context=context, np_opt=np_opt, pd_opt=pd_opt
            )
        elif not self._args:
            kwds = self._kwds.copy()
            # parsed_shorthand is added by FieldChannelMixin.
            # It's used below to replace shorthand with its long form equivalent
            # parsed_shorthand is removed from context if it exists so that it is
            # not passed to child to_dict function calls
            parsed_shorthand = context.pop("parsed_shorthand", {})
            # Prevent that pandas categorical data is automatically sorted
            # when a non-ordinal data type is specifed manually
            # or if the encoding channel does not support sorting
            if "sort" in parsed_shorthand and (
                "sort" not in kwds or kwds["type"] not in {"ordinal", Undefined}
            ):
                parsed_shorthand.pop("sort")

            kwds.update(
                {
                    k: v
                    for k, v in parsed_shorthand.items()
                    if kwds.get(k, Undefined) is Undefined
                }
            )
            kwds = {
                k: v for k, v in kwds.items() if k not in {*list(ignore), "shorthand"}
            }
            if "mark" in kwds and isinstance(kwds["mark"], str):
                kwds["mark"] = {"type": kwds["mark"]}
            result = _todict(kwds, context=context, np_opt=np_opt, pd_opt=pd_opt)
        else:
            msg = (
                f"{self.__class__} instance has both a value and properties : "
                "cannot serialize to dict"
            )
            raise ValueError(msg)
        if validate:
            try:
                self.validate(result)
            except jsonschema.ValidationError as err:
                # We do not raise `from err` as else the resulting
                # traceback is very long as it contains part
                # of the Vega-Lite schema. It would also first
                # show the less helpful ValidationError instead of
                # the more user friendly SchemaValidationError
>               raise SchemaValidationError(self, err) from None
E               altair.utils.schemapi.SchemaValidationError: 'unknown' is an invalid value for `type`. Valid values are one of ['quantitative', 'ordinal', 'temporal', 'nominal', 'geojson'].   

altair\utils\schemapi.py:1065: SchemaValidationError
=================================================================================== short test summary info =================================================================================== 
FAILED tests/utils/test_schemapi.py::test_to_dict_huge_traceback - altair.utils.schemapi.SchemaValidationError: 'unknown' is an invalid value for `type`. Valid values are one of ['quantitative', 'ordinal', 'temporal', 'nominal', 'geojson'].
====================================================================================== 1 failed in 9.41s ====================================================================================== 
After
(doc) PS C:\Users\danie\Documents\GitHub\altair> pytest  -k test_to_dict_huge_traceback --numprocesses=1
===================================================================================== test session starts ======================================================================================
platform win32 -- Python 3.12.3, pytest-8.3.2, pluggy-1.5.0
rootdir: C:\Users\danie\Documents\GitHub\altair
configfile: pyproject.toml
plugins: anyio-4.4.0, cov-5.0.0, xdist-3.6.1
1 worker [1 item]      
F                                                                                                                                                                                         [100%]
=========================================================================================== FAILURES ===========================================================================================
_________________________________________________________________________________ test_to_dict_huge_traceback __________________________________________________________________________________
[gw0] win32 -- Python 3.12.3 C:\Users\danie\AppData\Local\hatch\env\virtual\altair\CXM7NV9I\doc\Scripts\python.exe

    def test_to_dict_huge_traceback():
        # Error: Invalid value for type
>       return alt.Chart().encode(alt.X(type="unknown")).to_dict()

tests\utils\test_schemapi.py:501:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _  
altair\vegalite\v5\api.py:3778: in to_dict
    return super(Chart, copy).to_dict(**kwds)
altair\vegalite\v5\api.py:1818: in to_dict
    vegalite_spec: Any = _top_schema_base(super(TopLevelMixin, copy)).to_dict(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _  

self = alt.Chart(...), validate = True

    def to_dict(
        self,
        validate: bool = True,
        *,
        ignore: list[str] | None = None,
        context: dict[str, Any] | None = None,
    ) -> dict[str, Any]:
        """
        Return a dictionary representation of the object.

        Parameters
        ----------
        validate : bool, optional
            If True (default), then validate the result against the schema.
        ignore : list[str], optional
            A list of keys to ignore.
        context : dict[str, Any], optional
            A context dictionary.

        Raises
        ------
        SchemaValidationError :
            If ``validate`` and the result does not conform to the schema.

        Notes
        -----
        - ``ignore``, ``context`` are usually not needed to be specified as a user.
        - *Technical*: ``ignore`` will **not** be passed to child :meth:`.to_dict()`.
        """
        context = context or {}
        ignore = ignore or []
        opts = _get_optional_modules(np_opt="numpy", pd_opt="pandas")

        if self._args and not self._kwds:
            kwds = self._args[0]
        elif not self._args:
            kwds = self._kwds.copy()
            exclude = {*ignore, "shorthand"}
            if parsed := context.pop("parsed_shorthand", None):
                kwds = _replace_parsed_shorthand(parsed, kwds)
            kwds = {k: v for k, v in kwds.items() if k not in exclude}
            if (mark := kwds.get("mark")) and isinstance(mark, str):
                kwds["mark"] = {"type": mark}
        else:
            msg = f"{type(self)} instance has both a value and properties : cannot serialize to dict"
            raise ValueError(msg)
        result = _todict(kwds, context=context, **opts)
        if validate:
            try:
                self.validate(result)
            except jsonschema.ValidationError as err:
>               raise SchemaValidationError(self, err) from None
E               altair.utils.schemapi.SchemaValidationError: 'unknown' is an invalid value for `type`. Valid values are one of ['quantitative', 'ordinal', 'temporal', 'nominal', 'geojson'].    

altair\utils\schemapi.py:1036: SchemaValidationError
=================================================================================== short test summary info ==================================================================================== 
FAILED tests/utils/test_schemapi.py::test_to_dict_huge_traceback - altair.utils.schemapi.SchemaValidationError: 'unknown' is an invalid value for `type`. Valid values are one of ['quantitative', 'ordinal', 'temporal', 'nominal', 'geojson'].
====================================================================================== 1 failed in 9.84s =======================================================================================

Notes for reviewers

I was very careful in slowly introducing these changes & ensuring each commit passed the test suite.
Therefore, the commit messages contain a lot of information which explain each decision.

A general theme was to move long comments into docstrings, and reduce lines & characters where possible - to reduce noise.

No information is lost, still available on hover.
The comment itself is not helpful to users and would be presented on every validation error
Moved repeat info to *Notes*, which should come last https://numpydoc.readthedocs.io/en/latest/format.html#notes
Removed redundant *Returns*
Minor changes to improve readability
Saves 2 lines, identical result, maybe faster due to not allocating an intermediate dict
All `...ChannelMixin` already use this form. Saves 2 lines
Saves only 1 line, but collapses the dict comp.
The `:=` expr doesn't reduce lines, but does reduce characters
Saves 2 lines, avoids repeating the 3 kwargs - but allows modified `context`
As with the change to `SchemaValidationError`, the information is available in a docstring if needed.
But no longer is displayed to the user on a validation error
Removes 5 lines.
Has the additional benefit of replacing an identical block in `SchemaBase.validate_property`.
Therefore reduces the maintenance burden of keeping comments in sync
…copy)`

The call to `.to_dict`, and 3 line comment also appeared in the traceback for validation errors on `ChartType`s.
This removes the need for the comment and type ignore that it is describing.
See vega#3480 for more info on the introduction of `_top_schema_base`
Discovered during debugging that the rhs always evaluated as `type`, which was not intentional
Should achieve greater consistency across the public API
@dangotbanned dangotbanned requested a review from binste August 15, 2024 20:09
@dangotbanned
Copy link
Member Author

Apologies for all the review requests.

I'm hoping to merge this ahead of #3547, which will have a lot of conflicts in schemapi.py.

Ideally we could merge this, I can resolve conflicts then tidy #3547 up and move that out of draft

@dangotbanned dangotbanned enabled auto-merge (squash) August 29, 2024 13:56
@joelostblom
Copy link
Contributor

I started reviewing this yesterday and should be able to finish today. It would be good if one more person has at least a quick glance at it.

@dangotbanned
Copy link
Member Author

I started reviewing this yesterday and should be able to finish today. It would be good if one more person has at least a quick glance at it.

Thank you @joelostblom

Copy link
Contributor

@joelostblom joelostblom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the thought and work and you put into this @dangotbanned ! A special shout out to your commits messages, which are always so detailed and explanatory; I have found this really helpful when reviewing. In fact, you have inspired me to write more elaborate commit messages myself! (although I wish github's PR review UI had an easy way to see the messages on a line per line bases when in the files view, and not only when going commit by commit...)

Overall, I think the shorter error messages you introduce here add utility, and I believe you have largely avoided that this comes at the expense of code readability, which is important and something we want to be careful with. I have a few comments and I would like someone else to have at least a quick look before merging, in case there is something I'm missing here.

altair/utils/schemapi.py Show resolved Hide resolved
altair/utils/schemapi.py Outdated Show resolved Hide resolved
altair/vegalite/v5/api.py Outdated Show resolved Hide resolved
@dangotbanned
Copy link
Member Author

Thanks for the thought and work and you put into this @dangotbanned ! A special shout out to your commits messages, which are always so detailed and explanatory; I have found this really helpful when reviewing. In fact, you have inspired me to write more elaborate commit messages myself! (although I wish github's PR review UI had an easy way to see the messages on a line per line bases when in the files view, and not only when going commit by commit...)

Overall, I think the shorter error messages you introduce here add utility, and I believe you have largely avoided that this comes at the expense of code readability, which is important and something we want to be careful with. I have a few comments

Really appreciate the compliment and review @joelostblom!

Yeah it is a shame the messages aren't always easy to find.
I used them a lot here for keeping notes on all the small changes, but I'm starting to think using a conversation thread might make for an easier review.

Hopefully #3530 (commits) addresses your comments.

and I would like someone else to have at least a quick look before merging, in case there is something I'm missing here.

I agree on having another set of eyes.
#3547 includes a lot of similar changes - so consensus here would make for fewer surprises over there

Copy link
Contributor

@joelostblom joelostblom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now, I'm approving but let's get another approve before merging.

but I'm starting to think using a conversation thread might make for an easier review.

The little I have tried, I have found it more convenient to write down my thoughts in a commit message right after I make each change to the code, rather than storing it somewhere and pasting it as a comment once the PR is made. Ultimately, I think this is a Github UI issue, it should be possible to view the commit message line by line and quote it in a reply when reviewing in the files tab.

altair/utils/schemapi.py Show resolved Hide resolved
altair/vegalite/v5/api.py Outdated Show resolved Hide resolved
@dangotbanned dangotbanned disabled auto-merge August 30, 2024 17:52
@dangotbanned
Copy link
Member Author

LGTM now, I'm approving but let's get another approve before merging.

Thanks again @joelostblom

Just disabled auto-merge to support this #3530 (comment)

Copy link
Contributor

@binste binste left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Completed a semi-thorough review and LGTM as well. Thanks @dangotbanned! Before seeing the "Before" and "After" in your PR description, I never realized how long these tracebacks are but now it bugs me that we didn't fix this sooner ... :D

@dangotbanned dangotbanned enabled auto-merge (squash) August 31, 2024 13:19
@dangotbanned dangotbanned merged commit 98d9fb8 into vega:main Aug 31, 2024
13 checks passed
@dangotbanned dangotbanned deleted the minimise-schemabase-traceback branch September 5, 2024 10:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants