Add user guide for working with large time-series datasets #1302
Conversation
I've pushed a few changes; please let me know if you disagree with any of them, they can always be reverted.
My main comment, which applies to pretty much every user guide that attempts to demonstrate Datashader, is that the file size ends up being quite large, close to 30 MB in this case. Without a very fast connection you can see the page loading slowly (this one, https://holoviews.org/user_guide/Large_Data.html, takes a long time for me!), and there's also time spent rendering the page and its plots, which contain many data points. At the same time, most of these plots really deserve to be inspected with a live Python kernel to see the full benefit of the applied approach (LTTB, Datashader).
Instead of displaying the real Bokeh plots, wouldn't it be better to display pretty images or GIFs? (I don't want to block this PR; if we think that's what we should do, it could be done in a second iteration.)
@@ -5,6 +5,8 @@
"id": "artificial-english",
"metadata": {},
"source": [
"# Large Timeseries Data\n",
With the new setup, the notebooks need to have a title.
@@ -166,22 +173,6 @@
"This makes LTTB an ideal default method for exploring timeseries datasets, particularly when the dataset size is unknown or too large for standard WebGL rendering."
]
},
{
I'm not really in favor of documenting unreleased features. But something we could do is document this as if HoloViews 1.19.0 were already released. In that case, we should also update the code so it is already able to accept and pass the new values. What do you think?
Ok, I included a note that these options are available starting in HoloViews 1.19.0. @hoxbro please be sure to include the tsdownsample PR in the next minor release. I don't think anything further needs to be updated in hvPlot code.
### Enhanced Downsampling Options
Starting in HoloViews version 1.19.0, integration with the [tsdownsample](https://github.com/predict-idlab/tsdownsample) library introduces enhanced downsampling functionality with the following methods, which will be accepted as inputs to `downsample` in hvPlot (see the usage sketch after this list):
- **lttb**: Implements the Largest Triangle Three Buckets ([LTTB](https://github.com/predict-idlab/tsdownsample?tab=readme-ov-file#:~:text=performs%20the-,Largest%20Triangle%20Three%20Buckets,-algorithm)) algorithm, optimizing the selection of points to retain the visual shape of the data.
- **minmax**: For each segment of the data, this method retains the minimum and maximum values, ensuring that peaks and troughs are preserved.
- **minmax-lttb**: A hybrid approach that combines the minmax strategy with LTTB.
- **m4**: A [multi-step process](https://www.vldb.org/pvldb/vol7/p797-jugel.pdf) that leverages the min, max, first, and last values for each time segment.
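A minimal usage sketch of these options follows. It assumes HoloViews >= 1.19.0 with tsdownsample installed, and that hvPlot forwards these string values of `downsample` to the underlying downsampling operation as described above; the synthetic data and column names are purely illustrative.

```python
# Sketch only: assumes HoloViews >= 1.19.0 and tsdownsample are installed,
# and that hvPlot accepts the algorithm names listed above for `downsample`.
import numpy as np
import pandas as pd
import hvplot.pandas  # noqa: registers the .hvplot accessor

# Synthetic one-million-point timeseries (illustrative data only)
n = 1_000_000
df = pd.DataFrame({
    "time": pd.date_range("2024-01-01", periods=n, freq="s"),
    "value": np.random.standard_normal(n).cumsum(),
})

# Request a specific downsampling algorithm by name:
# "lttb", "minmax", "minmax-lttb", or "m4"
df.hvplot.line(x="time", y="value", downsample="minmax-lttb")
```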
> I don't think anything further needs to be updated in hvPlot code.

Not quite right, I think: `downsample` can currently only be a boolean that turns on `downsample1d`; I don't think there is a way to customize how it's called from hvPlot at the moment (or if there is, it's not obvious and needs to be documented). I'll open an issue, but since multiple algorithms are documented here, this should be implemented before the release.
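To make the gap concrete, here is a hedged sketch of what hvPlot exposes today versus what currently requires dropping down to HoloViews. The `algorithm` parameter name on `downsample1d` is an assumption about the HoloViews 1.19 API rather than something stated in this thread.

```python
# Sketch under the assumptions above; illustrative data only.
import numpy as np
import pandas as pd
import hvplot.pandas  # noqa: registers the .hvplot accessor
from holoviews.operation.downsample import downsample1d

n = 1_000_000
df = pd.DataFrame({
    "time": pd.date_range("2024-01-01", periods=n, freq="s"),
    "value": np.random.standard_normal(n).cumsum(),
})

# What hvPlot exposes today: a boolean switch that applies downsample1d
# with its default algorithm.
plot_default = df.hvplot.line(x="time", y="value", downsample=True)

# Choosing a specific algorithm currently means applying the HoloViews
# operation yourself (assumed parameter name: `algorithm`).
curve = df.hvplot.line(x="time", y="value")
plot_custom = downsample1d(curve, algorithm="minmax-lttb")
```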
It's now showing up.
We decided in a meeting not to block this PR. It would be nice in the future to have some mechanism to optionally use images in the docs rather than big plots.
Ready for merge?
Yep, this sort of thing is falling into our ever-growing bucket of nice-to-have-but-will-likely-never-happen! By the way, the Large Data guide in HoloViews does have a couple of GIFs, so adding GIFs to show what Datashader is capable of isn't unprecedented. I still think that would be the best thing to do, to both make the page faster to load and better demonstrate hvPlot with Datashader, but as I don't really want to do it now, I'll just put that in the bucket too!
Yes!
Supersedes #1205.
This adds a notebook that explains the different ways of working with large time-series datasets with HoloViz.