support for M4 thinning of timeseries #350
Comments
Thanks for bringing this up @mattijn, I've thought a little bit about this, and I think there's a path to supporting it. There are two subtleties (not blockers) that come up that I've thought of so far:
In terms of implementation, I think what we could do is implement a new Vega transform named But actually, come to think of it, the width wouldn't need to be fixed in the case of the widget renderer, since the M4 process could re-run on resize.
I think there is another use-case. Next to the It would be great if it can be applied to this use-case as well. For example this spec:

import numpy as np
import pandas as pd
import altair as alt
from vega_datasets import data
source = data.sp500()
# full range of available dates: year 2000 to 2010
print('full range:',source.date.min(), source.date.max())
# start chart with x-axis set to last period
# `value` range as datetime in milliseconds since unix epoch
x_init = pd.to_datetime(['2008-12-01', '2010-04-01']).astype(np.int64) / 1E6
interval = alt.selection_interval(encodings=['x'], bind='scales', value={"date":list(x_init)})
title=alt.Title(text=alt.expr(f'''
"FROM " + timeFormat({interval.name}["date"][0], "%B %d, %Y") +
" TO " + timeFormat({interval.name}["date"][1], "%B %d, %Y")
'''), subtitle='range in view')
# zoom and pan
alt.Chart(source, title=title).mark_line().encode(
    x='date:T',
    y='price:Q'
).properties(
    width=600,
    height=200
).add_params(interval)

The full range of available datetime is: For this dataset the time interval is 1 month, so 'thinning' or
Yeah, I think interactive M4 when using the widget renderer should be possible. Agreed that this would be a really great optimization for timeseries line/area visualizations!
I've been playing a bit with this in combination with DuckDB and JupyterChart, no VegaFusion included yet. See video: Screen.Recording.2023-10-22.at.23.50.30.mp4

So far I could only get it to work with two charts side by side, where the chart data on the right is updated based on the interactivity in the chart on the left. The source data frame contains 50M rows, promising results! For reference, here is the notebook showing how it was done: https://gist.github.com/mattijn/ac749df17bd5ed9c6bdec621f90096b3#file-altair-2023-10-22-am4-thinning-ipynb
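To make the DuckDB side of this concrete, here is a minimal sketch of a helper that builds an M4-style query string. It is not the code from the notebook above; the function name `m4_sql` and the table/column names are hypothetical, and it assumes the x column is numeric (for example epoch milliseconds). The query relies on DuckDB's `arg_min`/`arg_max` aggregates and is only constructed here, not executed.

```python
def m4_sql(table, x, y, x_min, x_max, width):
    """Build a SQL string keeping first/last/min/max points per pixel column.

    Hypothetical helper: identifiers are interpolated without escaping,
    so this is a sketch, not something to feed untrusted input.
    """
    # Pixel column that each row falls into for a chart `width` pixels wide.
    pixel = f"floor({width} * ({x} - {x_min}) / ({x_max} - {x_min}))"
    where = f"{x} BETWEEN {x_min} AND {x_max}"
    parts = [
        f"min({x}) AS {x}, arg_min({y}, {x}) AS {y}",  # first point in column
        f"max({x}) AS {x}, arg_max({y}, {x}) AS {y}",  # last point in column
        f"arg_min({x}, {y}) AS {x}, min({y}) AS {y}",  # lowest point in column
        f"arg_max({x}, {y}) AS {x}, max({y}) AS {y}",  # highest point in column
    ]
    selects = [
        f"SELECT {p} FROM {table} WHERE {where} GROUP BY {pixel}" for p in parts
    ]
    # UNION (not UNION ALL) drops points that play more than one role.
    return "\nUNION\n".join(selects) + f"\nORDER BY {x}"


query = m4_sql("sp500", "date_ms", "price", 0, 1000, 600)
print(query)
```

Re-running the query with a new `x_min`, `x_max`, or `width` whenever the view changes would give the interactive behaviour discussed above; how that is wired to chart params is left open here.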
This is really cool @mattijn! I hadn't thought of the idea of binding a width param, and it's great that this works with JupyterChart. I'd love to have VegaFusion do this automatically some day, but in the end it would pretty much be the logic you've implemented by hand here. I'm thinking that this approach could also be used with datashader for other visualization types. In that case the updated mark would be a base64-encoded image.
I saw this page: https://observablehq.com/@uwdata/m4-scalable-time-series-visualization and noticed that it mentions a SQL-friendly version of M4 (I'm more familiar with the term 'thinning' of timeseries).
In Mosaic it is implemented in JavaScript around here: https://github.com/uwdata/mosaic/blob/7eb1ddaae512068fd6bb6cb42e594ef0ec2b1a1c/packages/vgplot/src/marks/ConnectedMark.js#L41-L63 with a reference to this paper https://arxiv.org/pdf/2306.03714.pdf, which also mentions this query.
Is this something that is of interest to VegaFusion, e.g. in relation to the DuckDB engine? It applies to charts (at least line/area) without an aggregation applied.
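For readers unfamiliar with M4, the idea can be sketched in plain Python: for each pixel column of the chart, keep only the points with the first x, last x, minimum y, and maximum y. The function name `m4_thin` and the sample data below are made up for illustration; this is a minimal sketch, not the Mosaic or VegaFusion implementation.

```python
from math import floor


def m4_thin(points, x_min, x_max, width):
    """Reduce (x, y) points to at most 4 per pixel column."""
    if width <= 0 or x_max <= x_min:
        raise ValueError("need width > 0 and x_max > x_min")
    bins = {}  # pixel column -> [first, last, lowest, highest]
    for x, y in points:
        if not (x_min <= x <= x_max):
            continue  # outside the visible domain
        # Map x to a pixel column; clamp x == x_max into the last column.
        col = min(floor(width * (x - x_min) / (x_max - x_min)), width - 1)
        p = (x, y)
        if col not in bins:
            bins[col] = [p, p, p, p]
            continue
        b = bins[col]
        if x < b[0][0]:
            b[0] = p
        if x > b[1][0]:
            b[1] = p
        if y < b[2][1]:
            b[2] = p
        if y > b[3][1]:
            b[3] = p
    # One point can play several roles, so deduplicate, then sort by x.
    return sorted({p for b in bins.values() for p in b})


# Synthetic series: 10,000 points drawn on a chart 50 pixels wide.
series = [(i, (i * 37) % 101) for i in range(10_000)]
thinned = m4_thin(series, x_min=0, x_max=9_999, width=50)
print(len(series), "->", len(thinned))
```

The output size is bounded by 4 × width regardless of how many rows the source has, which is why the approach is attractive for line and area charts without aggregation.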