Introducing `ScalarChart` in place of `BarChart`? #5622

teh-cmc · 2024-03-21T14:20:17Z

Context

We have the Scalar archetype that makes it possible to log and visualize scalar timeseries (in the most literal sense: the X axis is literally the recording's clock):

import math
import rerun as rr
rr.init("rerun_example_scalar", spawn=True)
for step in range(0, 64):
    rr.set_time_sequence("step", step)
    rr.log("scalar", rr.Scalar(math.sin(step / 10.0)))

This is good in that this allows users to just log their scalar data as it comes without having to manually keep track of any kind of state.

This is bad because it means each scalar has to be its own DataRow (1 row == 1 timepoint), which leads to performance issues if you just want to log a huge timeseries for which you already have all the data needed in one place.
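The row-count difference can be shown with a toy model. This is a minimal sketch, not Rerun's actual data structures: `Row`, `log_per_timepoint` and `log_as_batch` are hypothetical names, and a `Row` only loosely stands in for a DataRow.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Row:
    """Hypothetical stand-in for a DataRow: one timepoint plus a batch of values."""
    entity_path: str
    timepoint: dict
    values: list = field(default_factory=list)


def log_per_timepoint(path, values):
    # One row per scalar: the timeline value doubles as the X axis.
    return [Row(path, {"step": step}, [v]) for step, v in enumerate(values)]


def log_as_batch(path, values):
    # One row for the whole series, with no timepoint (i.e. logged as static).
    return [Row(path, {}, list(values))]


series = [math.sin(step / 10.0) for step in range(64)]
per_timepoint = log_per_timepoint("scalar", series)
batched = log_as_batch("scalar_chart", series)
print(len(per_timepoint), len(batched))  # 64 1
```

Every one of those 64 rows has to be crafted and serialized on the client and indexed in the viewer, which is where both forms of overhead described below come from.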

These performance issues come in two forms:

On the client-side, logging will be very slow due to the cost of crafting and serializing DataRows for every scalar.
We do have a long-term plan that would allow users to log "temporal batches", i.e. multiple timestamps' worth of data in a single log call.
But A) it will be a while before this is implemented and B) it doesn't solve the second form of performance issue, discussed below.
On the viewer-side, ingestion will be very slow due to the cost of having to index many DataRows.
Indexing a row is a costly operation: it not only has to run all the datastore logic (indexing all the individual cells etc), but it also triggers a chain of events that need to propagate and update all downstream subscribers (datastore views, time panel, heuristics, clear cascades...).
AFAICT, batching row ingestion is a much harder problem than batching on the logging side.
And even if we make it past ingestion, rendering a time panel with a few million entries is probably still no cheap task (?).
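The per-row fan-out on the viewer side can be illustrated with another toy model. `ToyStore` and its subscribers are hypothetical and much simpler than Rerun's datastore and its downstream consumers; the sketch only counts how many change events reach subscribers under per-row versus batched insertion.

```python
from typing import Callable, List


class ToyStore:
    """Hypothetical store that notifies every subscriber on each insertion event."""

    def __init__(self) -> None:
        self.rows: List[tuple] = []
        self.subscribers: List[Callable[[list], None]] = []
        self.notifications = 0

    def subscribe(self, callback: Callable[[list], None]) -> None:
        self.subscribers.append(callback)

    def insert_rows(self, rows: list) -> None:
        # Index the rows, then emit a single event covering all of them.
        self.rows.extend(rows)
        for callback in self.subscribers:
            callback(rows)
            self.notifications += 1


# Two downstream consumers, e.g. a time panel and a heuristics pass (hypothetical).
time_panel_rows = []
store = ToyStore()
store.subscribe(time_panel_rows.extend)
store.subscribe(lambda rows: None)

rows = [(step, float(step)) for step in range(1000)]

# Per-row ingestion: one event per row, fanned out to every subscriber.
for row in rows:
    store.insert_rows([row])
per_row_notifications = store.notifications

# Batched ingestion into a fresh store: one event for all rows.
batched_store = ToyStore()
batched_store.subscribe(lambda rows: None)
batched_store.subscribe(lambda rows: None)
batched_store.insert_rows(rows)

print(per_row_notifications, batched_store.notifications)  # 2000 2
```

In this model batching only pays off if every subscriber can consume a whole batch at once, which hints at why ingestion-side batching is harder than batching on the logging side.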

Interestingly, we also have the BarChart archetype, which already has the nice property of accepting a batch of scalars all at once:

rr.log("bar_chart", rr.BarChart([8, 4, 0, 9, 1, 4, 1, 6, 9, 0]))

The one downside is that this doesn't integrate at all with the time cursor, since the bar chart as a whole is its own entity.

Still, in many real-world scenarios this can be a very useful tool, especially when combined with timeless/static logging.

Proposal

Retire the BarChart archetype in favor of a new generic ScalarChart archetype:

table ScalarChart {
  /// The values. Should always be a rank-1 tensor.
  values: rerun.components.TensorData ("attr.rerun.component_required", order: 1000);

  /// The optional indices. Should always be a rank-1 tensor with the same length as `values`.
  ///
  /// Defaults to `range(0, len(values))` if unspecified.
  indices: rerun.components.TensorData ("attr.rerun.component_optional", nullable, order: 3000);
}

rr.log("scalar_chart", rr.ScalarChart([1, 2, 3], indices=[30, 20, 10]))
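The semantics of the two fields can be sketched in plain Python. `resolve_scalar_chart` is a hypothetical helper, not part of any Rerun API; it uses flat Python lists, so the rank-1 check on the tensors is left out and only the defaulting and length rules from the proposal are modeled.

```python
from typing import List, Optional, Sequence, Tuple


def resolve_scalar_chart(
    values: Sequence[float], indices: Optional[Sequence[float]] = None
) -> List[Tuple[float, float]]:
    """Pair each value with its index, following the proposed ScalarChart rules."""
    if indices is None:
        # Default from the proposal: range(0, len(values)).
        indices = range(len(values))
    if len(indices) != len(values):
        raise ValueError(f"indices has length {len(indices)}, expected {len(values)}")
    return list(zip(indices, values))


print(resolve_scalar_chart([1, 2, 3]))  # [(0, 1), (1, 2), (2, 3)]
print(resolve_scalar_chart([1, 2, 3], indices=[30, 20, 10]))  # [(30, 1), (20, 2), (10, 3)]
```

Since the example above uses descending indices, a line renderer would presumably have to sort the pairs by index before drawing; the proposal does not say whether that is the viewer's job or the user's.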

As for styling, ScalarChart would re-use the same styling archetypes as Scalar: SeriesLine to visualize the data as a line, and SeriesPoint to visualize it as a scatter plot.

We would also introduce a new SeriesBar style:

/// Define the style properties for a bar series in a chart.
///
/// This archetype only provides styling information and should be logged as static
/// when possible. The underlying data needs to be logged to the same entity-path using
/// the `ScalarChart` archetype.
table SeriesBar {
    /// Color for the corresponding series.
    color: rerun.components.Color ("attr.rerun.component_optional", nullable, order: 1000);

    /// Bar width for the corresponding series.
    width: rerun.components.StrokeWidth ("attr.rerun.component_optional", nullable, order: 2000);

    /// Display name of the series.
    ///
    /// Used in the legend.
    name: rerun.components.Name ("attr.rerun.component_optional", nullable, order: 3000);
}

That new style would also retroactively work with the Scalar archetype.
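A SeriesBar renderer would need to turn each (index, value) pair into a rectangle; for Scalar the index would be the timeline value. The proposal does not say whether `width` is in data units or UI points, so this sketch assumes data units and bars centered on their index. `BarRect` and `bars_from_series` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class BarRect:
    x_min: float
    x_max: float
    y_min: float
    y_max: float


def bars_from_series(
    indices: Sequence[float], values: Sequence[float], width: float = 0.8
) -> List[BarRect]:
    """Turn (index, value) pairs into axis-aligned bar rectangles.

    Assumption: `width` is in data units along the X axis and each bar is
    centered on its index; negative values hang below zero.
    """
    half = width / 2.0
    return [
        BarRect(x - half, x + half, min(0.0, y), max(0.0, y))
        for x, y in zip(indices, values)
    ]


for rect in bars_from_series([0, 1, 2], [8.0, -4.0, 0.0], width=0.5):
    print(rect)
```

If `width` turned out to be in UI points instead, the conversion to data units would depend on the current zoom level, which is one argument for deciding the unit explicitly.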

All in all, this would improve the existing Scalar type by making it possible to visualize the data as a bar chart, and would allow users to work with very large series using the new ScalarChart.

Of course this has the same downside as the original BarChart archetype: it doesn't integrate with the time cursor.

As part of this work, we would also use this opportunity to share a lot more code between Scalar and ScalarChart, so that ScalarChart can benefit from all the recent improvements to the plot view (range caching, subpixel aggregation, etc).
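Subpixel aggregation is what would make very large ScalarCharts affordable to draw. One common way to do it, not necessarily the way Rerun's plot view does, is min/max decimation: keep only the minimum and maximum value that falls into each pixel column. `minmax_per_column` is a hypothetical helper.

```python
import math
from typing import List, Sequence, Tuple


def minmax_per_column(
    xs: Sequence[float], ys: Sequence[float], x_min: float, x_max: float, columns: int
) -> List[Tuple[int, float, float]]:
    """Reduce a series to at most one (column, min, max) triple per pixel column.

    Points outside [x_min, x_max) are dropped; empty columns are skipped.
    """
    span = x_max - x_min
    lo = [math.inf] * columns
    hi = [-math.inf] * columns
    for x, y in zip(xs, ys):
        if not (x_min <= x < x_max):
            continue
        col = int((x - x_min) / span * columns)
        col = min(col, columns - 1)  # guard against float rounding at the right edge
        lo[col] = min(lo[col], y)
        hi[col] = max(hi[col], y)
    return [(c, lo[c], hi[c]) for c in range(columns) if lo[c] != math.inf]


n = 100_000
xs = [i / n for i in range(n)]
ys = [math.sin(2 * math.pi * 5 * x) for x in xs]
segments = minmax_per_column(xs, ys, 0.0, 1.0, columns=800)
print(len(segments))  # 800
```

Drawing one vertical segment per column keeps the rendering cost proportional to the plot width rather than to the number of samples, while still showing spikes that plain subsampling would miss.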

Random thoughts

Unrelated to any of the above: maybe we should still allow batches of vanilla Scalars, if only so that people can at least batch their vanilla scalar data when they know they have more than a single value for a given timestamp? Sounds niche, but it is something that happens in, e.g., the VRS example 🤷
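The client-side half of this is simple grouping: collect all values that share a timestamp so that each timestamp becomes one log call carrying a batch, instead of one call per value. `batch_by_timestamp` and the sample data are hypothetical; how the viewer would plot several scalars at the same timepoint is the open question.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def batch_by_timestamp(samples: Iterable[Tuple[int, float]]) -> Dict[int, List[float]]:
    """Group (timestamp, value) samples so each timestamp can be logged once as a batch."""
    batches: Dict[int, List[float]] = defaultdict(list)
    for timestamp, value in samples:
        batches[timestamp].append(value)
    return dict(batches)


# Hypothetical sensor output where several readings share a timestamp.
samples = [(100, 0.1), (100, 0.2), (100, 0.3), (250, 0.4), (400, 0.5), (400, 0.6)]
batches = batch_by_timestamp(samples)
print(batches)  # {100: [0.1, 0.2, 0.3], 250: [0.4], 400: [0.5, 0.6]}
```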


jleibs · 2024-03-21T14:37:43Z

Related to the last random thought: a generic ScalarBatch is a pretty common way of thinking about and representing large state-space control systems. However, in that case the user would still want to be able to plot a specific sub-selection of indices in a given plot.

Better yet, providing a mapping of those scalar-indices into the entity tree would provide the performance of single-timestamp batch signal-logging with convenient entity-path names.

For example:

# Signals is a scalar array with signals.shape = (18,)
rr.log("/signals/raw", ScalarBatch(signals))

# Somehow provide an API to remap:
/signals/imu/accel_x := /signals/raw[7]
/signals/imu/accel_y := /signals/raw[8]
/signals/imu/accel_z := /signals/raw[9]
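The remap syntax above is pseudocode. What it would have to resolve to can be sketched in plain Python: given the signal vector logged at each timepoint and a mapping from virtual entity paths to indices, extract one named series per path. `remap_signals` is a hypothetical helper and the data is dummy.

```python
from typing import Dict, List, Sequence


def remap_signals(
    batches: Sequence[Sequence[float]], mapping: Dict[str, int]
) -> Dict[str, List[float]]:
    """Extract one named series per mapped index from a sequence of signal vectors.

    `batches[t]` is the full signal vector logged at timepoint t (e.g. shape (18,)),
    and `mapping` sends a virtual entity path to an index into that vector.
    """
    return {path: [vector[index] for vector in batches] for path, index in mapping.items()}


mapping = {
    "/signals/imu/accel_x": 7,
    "/signals/imu/accel_y": 8,
    "/signals/imu/accel_z": 9,
}
# Three timepoints of an 18-element signal vector (dummy data: value = t * 100 + index).
batches = [[t * 100 + i for i in range(18)] for t in range(3)]
series = remap_signals(batches, mapping)
print(series["/signals/imu/accel_x"])  # [7, 107, 207]
```

The appeal of this design is that storage stays one row per timepoint for the whole vector, while the per-signal names only exist as a lightweight lookup at query time.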

Famok · 2024-04-18T12:14:02Z

Thanks for pointing me here @teh-cmc .
It would be great to be able to add a time and value vector at the same time.
I have unevenly sampled data (dt is not constant) and it seems to me that the bar chart always uses even steps (like 0, 1, 2, 3).
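This is the case the optional `indices` field of the proposed ScalarChart would cover: with explicit indices the X positions follow the real sample times, instead of the implicit 0, 1, 2, ... that the bar chart uses. A minimal sketch with hypothetical helpers `implicit_positions` and `is_uniformly_sampled`:

```python
import math
from typing import List, Sequence


def implicit_positions(values: Sequence[float]) -> List[int]:
    # Implicit X positions, as with BarChart today: simply 0, 1, 2, ...
    return list(range(len(values)))


def is_uniformly_sampled(times: Sequence[float], rel_tol: float = 1e-6) -> bool:
    """Return True if consecutive time deltas are all (nearly) equal."""
    deltas = [b - a for a, b in zip(times, times[1:])]
    if not deltas:
        return True
    return all(math.isclose(d, deltas[0], rel_tol=rel_tol) for d in deltas)


times = [0.0, 0.1, 0.15, 0.4, 0.41]  # unevenly sampled: dt varies
values = [1.0, 2.0, 1.5, 3.0, 2.5]

print(is_uniformly_sampled(times))  # False
print(implicit_positions(values))  # [0, 1, 2, 3, 4]
print(list(zip(times, values)))  # explicit indices keep the real spacing
```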

teh-cmc added the 💬 discussion, 🚀 performance (optimization, memory use, etc.), user-request (a pressing issue for one of our users), and 🔩 data model labels on Mar 21, 2024

This was referenced Apr 11, 2024:

Perf and memory issues for >kHz time series data #5904 (closed)

The 1M index entries problem #5967 (closed)

teh-cmc mentioned this issue Apr 18, 2024:

Function for adding multiple points to time series at once #6031 (closed)

Wumpf mentioned this issue Sep 9, 2024:

Enhance Support for Custom Axes and 2D Plotting #7346 (open)

abey79 mentioned this issue Dec 3, 2024:

Support arbitrary X axis in the plot view using a ScalarAbscissa #8286 (open)
