Commit

Expand and document support for column names with special characters (#2905)

* Remove escape sequences when inferring types

* Add docs for escaping special characters

* Clarify about long form syntax

* Update doc/user_guide/encodings/index.rst

Co-authored-by: Mattijn van Hoek <mattijn@gmail.com>

---------

Co-authored-by: Mattijn van Hoek <mattijn@gmail.com>
joelostblom and mattijn authored Feb 26, 2023
1 parent d680c9a commit 8a3c6b9
Showing 3 changed files with 55 additions and 2 deletions.
5 changes: 3 additions & 2 deletions altair/utils/core.py
@@ -535,8 +535,9 @@ def parse_shorthand(

     # if data is specified and type is not, infer type from data
     if isinstance(data, pd.DataFrame) and "type" not in attrs:
-        if "field" in attrs and attrs["field"] in data.columns:
-            attrs["type"] = infer_vegalite_type(data[attrs["field"]])
+        # Remove escape sequences so that types can be inferred for columns with special characters
+        if "field" in attrs and attrs["field"].replace("\\", "") in data.columns:
+            attrs["type"] = infer_vegalite_type(data[attrs["field"].replace("\\", "")])
             # ordered categorical dataframe columns return the type and sort order as a tuple
             if isinstance(attrs["type"], tuple):
                 attrs["sort"] = attrs["type"][1]
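
Reviewer's note: the changed condition can be illustrated in isolation with the rough sketch below. `resolve_column` is a hypothetical helper written for this note, not part of Altair; it only mirrors the `.replace("\\", "")` lookup from the hunk above, under the assumption that the field string has already been split from any `:type` suffix.

```python
import pandas as pd


def resolve_column(field, data):
    """Hypothetical helper (not Altair API): mimic the lookup in the hunk above.

    Every backslash is stripped from the field string, and the result is
    returned only if it matches a column name in the dataframe.
    """
    name = field.replace("\\", "")
    return name if name in data.columns else None


df = pd.DataFrame({"col:colon": [1, 2, 3], "a\\b": [4, 5, 6]})

# The escaped field now resolves to the real column name
print(resolve_column("col\\:colon", df))

# Caveat: all backslashes are removed, so a column whose name itself
# contains a backslash is not matched by this lookup
print(resolve_column("a\\b", df))
```

The second call shows a limitation that follows directly from stripping every backslash: type inference would still be skipped for a column whose name contains a literal backslash.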
1 change: 1 addition & 0 deletions doc/releases/changes.rst
@@ -23,6 +23,7 @@ Enhancements
- More informative autocompletion by removing deprecated methods (#2814) and for editors that rely on type hints (e.g. VS Code) we added support for completion in method chains (#2846) and extended keyword completion to cover additional methods (#2920).
- Substantially improved error handling. Both in terms of finding the more relevant error (#2842), and in terms of improving the formatting and clarity of the error messages (#2824, #2568).
- Include experimental support for the DataFrame Interchange Protocol (through `__dataframe__` attribute). This requires `pyarrow>=11.0.0` (#2888).
- Support data type inference for columns with special characters (#2905).

Grammar Changes
~~~~~~~~~~~~~~~
51 changes: 51 additions & 0 deletions doc/user_guide/encodings/index.rst
@@ -19,6 +19,8 @@ For example, here we will visualize the cars dataset using four of the available

import altair as alt
from vega_datasets import data


cars = data.cars()

alt.Chart(cars).mark_point().encode(
@@ -224,6 +226,55 @@ Shorthand Equivalent long-form
``x='count():Q'`` ``alt.X(aggregate='count', type='quantitative')``
=================== =======================================================

Escaping special characters in column names
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Seeing that Altair uses ``:`` as a special character
to indicate the encoding data type,
you might wonder what happens
when the column name in your data includes a colon.
When this is the case
you will need to either rename the column or escape the colon.
This is also true for other special characters
such as ``.`` and ``[]`` which are used to access nested attributes
in some data structures.

The recommended thing to do when you have special characters in a column name
is to rename your columns.
For example, in Pandas you could replace ``:`` with ``_``
via ``df.rename(columns = lambda x: x.replace(':', '_'))``.
If you don't want to rename your columns
you will need to escape the special characters using a backslash:

.. altair-plot::

    import pandas as pd

    source = pd.DataFrame({
        'col:colon': [1, 2, 3],
        'col.period': ['A', 'B', 'C'],
        'col[brackets]': range(3),
    })

    alt.Chart(source).mark_bar().encode(
        x='col\:colon',
        # Remove the backslash in the title
        y=alt.Y('col\.period', title='col.period'),
        # Specify the data type
        color='col\[brackets\]:N',
    )

As can be seen above,
indicating the data type is optional
just as for columns without escaped characters.
Note that the axes titles include the backslashes by default
and you will need to manually set the title strings to remove them.
If you are using the long form syntax for encodings,
you do not need to escape colons as the type is explicit,
e.g. ``alt.X(field='col:colon', type='quantitative')``
(but periods and brackets still need to be escaped
in the long form syntax unless they are used to index nested data structures).
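
Reviewer's note: the escaped shorthand strings can also be built programmatically, as in the minimal sketch below. `escape_field` is a hypothetical helper, not an Altair function, and it assumes that `:`, `.`, `[` and `]` are the only characters that need escaping, as listed in the text above. Raw strings (`r'...'`) are used because a sequence such as `'\:'` in an ordinary Python string literal is an unrecognized escape, which recent Python versions warn about.

```python
import re


def escape_field(name):
    """Hypothetical helper (not Altair API): backslash-escape ':', '.', '[' and ']'."""
    return re.sub(r"([:.\[\]])", r"\\\1", name)


# Raw strings keep the backslash literally, without an invalid-escape warning
print(escape_field("col:colon") == r"col\:colon")
print(escape_field("col.period") == r"col\.period")
print(escape_field("col[brackets]") == r"col\[brackets\]")

# A type suffix would be appended after escaping, e.g. for a nominal column
shorthand = escape_field("col[brackets]") + ":N"
print(shorthand)
```

Escaping before appending the `:N` suffix matters here, since the helper would otherwise escape the type separator as well.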


.. _encoding-aggregates:
