Improve HTML reprs #10816

shoyer · 2025-10-05T20:30:06Z

This PR adds a number of improvements and revisions to Xarray's HTML reprs, especially for DataTree:

No line breaks in long headers like "Data variables" and "Inherited Coordinates"
Add ~4px of extra padding at the end of HTML reprs, to make pages like Xarray's docs look a little better
Remove 2px shift on headers when actively clicked on. (I think this was intentional, but it seems to result in weird layout glitches because the :active selector doesn't always go away when focus is moved elsewhere)
Remove the collapsible "Groups" header from DataTree. Instead, each group is separately collapsible, and shows the total number of contained elements.
Truncation for too many HTML elements is revised. I've added the options display_max_items and display_max_html_elements for controlling at what point the DataTree HTML repr collapses and truncates nodes, instead of doing this all based on display_max_children.

This needs a few more tests and release notes, but is ready for feedback! @jsignell @TomNicholas @benbovy

Tests added
User visible changes (including notable bug fixes) are documented in whats-new.rst

Code to generate HTML previews:

import xarray as xr
import numpy as np

# Set up coordinates
time = xr.DataArray(data=["2022-01", "2023-01"], dims="time")
stations = xr.DataArray(data=list("abcdef"), dims="station")
lon = [-100, -80, -60]
lat = [10, 20, 30]

# Set up fake data
wind_speed = xr.DataArray(np.ones((2, 6)) * 2, dims=("time", "station"))
pressure = xr.DataArray(np.ones((2, 6)) * 3, dims=("time", "station"))
air_temperature = xr.DataArray(np.ones((2, 6)) * 4, dims=("time", "station"))
dewpoint = xr.DataArray(np.ones((2, 6)) * 5, dims=("time", "station"))
infrared = xr.DataArray(np.ones((2, 3, 3)) * 6, dims=("time", "lon", "lat"))
true_color = xr.DataArray(np.ones((2, 3, 3)) * 7, dims=("time", "lon", "lat"))

dt2 = xr.DataTree.from_dict(
    {
        "/": xr.Dataset(
            coords={"time": time},
        ),
        "/weather": xr.Dataset(
            coords={"station": stations},
            data_vars={
                "wind_speed": wind_speed,
                "pressure": pressure,
            },
        ),
        "/weather/temperature": xr.Dataset(
            data_vars={
                "air_temperature": air_temperature,
                "dewpoint": dewpoint,
            },
        ),
        "/satellite": xr.Dataset(
            coords={"lat": lat, "lon": lon},
            data_vars={
                "infrared": infrared,
                "true_color": true_color,
            },
        ),
    },
)
dt2['/other'] = xr.Dataset({f'x{i}': 0 for i in range(500)})

number_of_files = 20
number_of_groups = 50
tree_dict = {}
for f in range(number_of_files):
    for g in range(number_of_groups):
        tree_dict[f"file_{f}/group_{g}"] = xr.Dataset({"g": f * g})
tree_too_many = xr.DataTree.from_dict(tree_dict)


print("<h1>DataTree root</h1>")
print(dt2._repr_html_())

print("<hr />")
print("<h1>Dataset</h1>")

print(dt2.weather.to_dataset()._repr_html_())

print("<hr />")

print("<h1>DataTree inherited</h1>")
print(dt2.weather._repr_html_())

print("<hr />")
print("<h1>DataTree too many nodes</h1>")
print(tree_too_many._repr_html_())

Revised (this PR)

Interactive preview

Baseline

Interactive preview

jsignell · 2025-10-09T21:30:43Z

Ok I took a look at this with this kind of evil DataTree from the truncation work:

import numpy as np
import xarray as xr

number_of_files = 700
number_of_groups = 5
number_of_variables = 10

datasets = {}
for f in range(number_of_files):
    for g in range(number_of_groups):
        # Create synthetic (deterministic) data
        time = np.linspace(0, 50 + f, 1 + 1000 * g)
        y = f * time + g

        # Create dataset:
        ds = xr.Dataset(
            data_vars={
                f"temperature_{g}{i}": ("time", y)
                for i in range(number_of_variables // number_of_groups)
            },
            coords={"time": ("time", time)},
        ).chunk()

        # Prepare for xr.DataTree:
        name = f"file_{f}/group_{g}"
        datasets[name] = ds

dt = xr.DataTree.from_dict(datasets)

I really like the space changes and removing the collapsible "Groups" header and having each group be collapsible on its own.

I wasn't quite sure how to interpret the collapsed count for a group that just has one dataset in it. It seems like it is the n coords + n data_vars. Which seems odd. I think there shouldn't be a count on a group that just contains a single dataset.

The group level count when there are child groups should just be the number of groups.

I like the idea of having a display_max_html_elements and would be happy for it to be a lot lower than 300 by default, but truncation is still necessary for the case where there just are more than display_max_html_elements at the top level.

For instance you still get 700 top-level nodes in the repr when you do:

with xr.set_options(display_max_html_elements=5):
    display(dt)

I think in general it would be nice to be able to drill down into a particular node within the repr even if there are a bunch of items at a particular level.
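One way to picture the top-level truncation being asked for here is a helper that keeps the first and last few children of a node and replaces the middle with a marker. This is only a minimal sketch in plain Python: `truncate_children` and its `max_children` parameter are hypothetical names chosen for illustration, not Xarray's implementation.

```python
from typing import List, Sequence


def truncate_children(names: Sequence[str], max_children: int) -> List[str]:
    """Keep the first and last children, eliding the middle with a marker.

    Hypothetical helper for illustration only; not part of Xarray.
    """
    if max_children < 0:
        raise ValueError("max_children must be non-negative")
    if len(names) <= max_children:
        return list(names)
    first = (max_children + 1) // 2  # slightly favour the leading children
    last = max_children - first
    hidden = len(names) - max_children
    tail = list(names[len(names) - last:]) if last else []
    return [*names[:first], f"... ({hidden} more)", *tail]


top_level = [f"file_{i}" for i in range(700)]
print(truncate_children(top_level, 5))
```

With 700 top-level nodes and a limit of 5, only five names plus the marker remain, so both ends of the tree stay reachable.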

shoyer · 2025-10-09T22:15:58Z

> I wasn't quite sure how to interpret the collapsed count for a group that just has one dataset in it. It seems like it is the n coords + n data_vars. Which seems odd. I think there shouldn't be a count on a group that just contains a single dataset.
>
> The group level count when there are child groups should just be the number of groups.

The strategy I was using is counting the number of hidden items (at any level), with the idea being that it should be obvious if a large amount of data is hidden. Otherwise you could have a collapsed group marked as "(1)" that hides hundreds of data variables, which felt wrong to me.
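The difference between the two counting strategies can be sketched in plain Python. This is an illustration under stated assumptions, not the PR's code: `Node`, `count_direct_groups`, and `count_hidden_items` are hypothetical names, and counting each child group as one item in addition to its contents is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Node:
    """Hypothetical stand-in for a DataTree node (illustration only)."""

    n_coords: int = 0
    n_data_vars: int = 0
    children: Dict[str, "Node"] = field(default_factory=dict)


def count_direct_groups(node: Node) -> int:
    """Count only the immediate child groups."""
    return len(node.children)


def count_hidden_items(node: Node) -> int:
    """Count everything a collapsed node hides, at any depth.

    Assumption: each child group counts as one item, plus its own contents.
    """
    total = node.n_coords + node.n_data_vars
    for child in node.children.values():
        total += 1 + count_hidden_items(child)
    return total


# A group whose single child holds 500 data variables.
group = Node(children={"other": Node(n_data_vars=500)})
print(count_direct_groups(group))
print(count_hidden_items(group))
```

In this example the direct-group count is 1 while the recursive count is 501, which is the "(1)" case described above.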

> I like the idea of having a display_max_html_elements and would be happy for it to be a lot lower than 300 by default, but truncation is still necessary for the case where there just are more than display_max_html_elements at the top level.

Do you think this is common? I don't think we do this for the other Xarray HTML reprs. They get collapsed but nodes are not truncated at the top level.

> I think in general it would be nice to be able to drill down into a particular node within the repr even if there are a bunch of items at a particular level.

I am currently displaying DataTree elements in priority order, based on showing the top-most levels as completely as possible (breadth-first). We could start by going deep (depth-first), but this would mean that some high-level nodes could be truncated.
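The trade-off between the two orderings can be shown with a small sketch over a nested dict. This is not the PR's implementation: `visible_paths`, `max_nodes`, and the dict-based tree are assumptions made for illustration.

```python
from collections import deque
from typing import List


def visible_paths(tree: dict, max_nodes: int, breadth_first: bool = True) -> List[str]:
    """Return the node paths that fit in a display budget.

    `tree` maps child names to nested dicts. Hypothetical helper; illustration only.
    """
    pending = deque([("", tree)])  # (path, subtree) pairs still to visit
    shown: List[str] = []
    while pending and len(shown) < max_nodes:
        path, subtree = pending.popleft()
        shown.append(path or "/")
        children = [(f"{path}/{name}", child) for name, child in subtree.items()]
        if breadth_first:
            pending.extend(children)  # visit siblings before descendants
        else:
            pending.extendleft(reversed(children))  # descend before siblings
    return shown


tree = {"a": {"a1": {}, "a2": {}}, "b": {"b1": {}}, "c": {}}
print(visible_paths(tree, 4, breadth_first=True))
print(visible_paths(tree, 4, breadth_first=False))
```

With a budget of four nodes, breadth-first keeps all three top-level groups, while depth-first spends the budget inside `/a` and drops `/b` and `/c`.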

Maybe there's some compromise algorithm that could work better?

shoyer added 8 commits September 28, 2025 13:15

Less vertical whitespace in HTML reprs

c2e1384

ensure consistent line-height in google colab

b3bb386

Refactor DataTree HTML repr

ca16214

Merge branch 'main' into html-tree

bade174

Merge branch 'main' into html-tree

2459e64

Collapsable DataTree nodes

1842a39

more formatting

73e46ec

Tweaks

1958bc9

github-actions bot added the topic-html-repr label Oct 5, 2025
