🔥 delegate nan behavior to aggregators #294

Merged: 6 commits, Mar 12, 2024
43 changes: 43 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,46 @@
# `TODO`
## New Features

## What's Changed
- Removed the `check_nans` argument of the `FigureResampler` constructor and its `add_traces` method. This argument was used to check for NaNs in the input data; NaN handling is now delegated to the `nan_policy` argument of specific aggregators (see, for instance, the constructors of the `MinMax` and `MinMaxLTTB` aggregators).
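
For code that relied on `check_nans=True` to drop NaN points before the data was stored, a rough equivalent can be applied by the caller before adding the trace. The sketch below is only an illustration: `drop_nan_points` is a hypothetical helper, not part of plotly-resampler, and it works on plain Python sequences rather than NumPy arrays.

```python
import math


def drop_nan_points(hf_x, hf_y, *aligned):
    """Drop every position where hf_y is NaN from hf_x, hf_y and any aligned sequences.

    Hypothetical helper (not part of plotly-resampler); `aligned` stands for
    per-point data such as text, hovertext, marker size or marker color.
    """
    # True where the y-value should be kept (i.e. it is not a float NaN)
    keep = [not (isinstance(v, float) and math.isnan(v)) for v in hf_y]

    def _filter(seq):
        return [item for item, k in zip(seq, keep) if k]

    return (_filter(hf_x), _filter(hf_y), *[_filter(seq) for seq in aligned])


x, y, text = drop_nan_points(
    [0, 1, 2, 3],
    [1.0, float("nan"), 3.0, 4.0],
    ["a", "b", "c", "d"],
)
```

If NaNs should instead stay visible in the plot, the changelog entry above points to constructing the aggregator with `nan_policy="keep"`.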


# v0.9.2
### ⚡ `overview` / `rangeslider` support 🎉

* ➡️ [code example](https://github.com/predict-idlab/plotly-resampler/blob/main/examples/dash_apps/05_cache_overview_subplots.py):
* 🖍️ [high level docs](https://predict-idlab.github.io/plotly-resampler/v0.9.2/getting_started/#overview)
* 🔍 [API docs](https://predict-idlab.github.io/plotly-resampler/v0.9.2/api/figure_resampler/figure_resampler/#figure_resampler.figure_resampler.FigureResampler.__init__)
* Make sure to take a look at the docstrings of the `create_overview`, `overview_row_idxs`, and `overview_kwargs` arguments of the `FigureResampler` constructor.
![Peek 2023-10-25 01-51](https://github.com/predict-idlab/plotly-resampler/assets/38005924/5b3a40e0-f058-4d7e-8303-47e51896347a)



### 💨 remove [traceUpdater](https://github.com/predict-idlab/trace-updater) dash component as a dependency.
> **context**: see #281 #271
> `traceUpdater` was developed during a period when Dash did not yet contain the [Patch](https://dash.plotly.com/partial-properties) feature for partial property updates. As such, `traceUpdater` has become somewhat redundant and is now effectively replaced by Patch.

🚨 This is a breaking change with previous `Dash` apps!!!

## What's Changed
* Support nested admonitions by @jonasvdd in https://github.com/predict-idlab/plotly-resampler/pull/245
* 👷 build: create codeql.yml by @NielsPraet in https://github.com/predict-idlab/plotly-resampler/pull/248
* :sparkles: first draft of improved xaxis filtering by @jonasvdd in https://github.com/predict-idlab/plotly-resampler/pull/250
* :arrow_up: update dependencies by @jvdd in https://github.com/predict-idlab/plotly-resampler/pull/260
* :muscle: update dash-extensions by @jonasvdd in https://github.com/predict-idlab/plotly-resampler/pull/261
* fix for #263 by @jonasvdd in https://github.com/predict-idlab/plotly-resampler/pull/264
* Rangeslider support by @jonasvdd in https://github.com/predict-idlab/plotly-resampler/pull/254
* :pray: fix mkdocs by @jvdd in https://github.com/predict-idlab/plotly-resampler/pull/268
* ✈️ fix for #270 by @jonasvdd in https://github.com/predict-idlab/plotly-resampler/pull/272
* :mag: adding init kwargs to show dash - fix for #265 by @jonasvdd in https://github.com/predict-idlab/plotly-resampler/pull/269
* Refactor/remove trace updater by @jonasvdd in https://github.com/predict-idlab/plotly-resampler/pull/281
* Bug/pop rangeselector by @jonasvdd in https://github.com/predict-idlab/plotly-resampler/pull/279
* :sparkles: fix for #275 by @jonasvdd in https://github.com/predict-idlab/plotly-resampler/pull/286
* Bug/rangeselector by @jonasvdd in https://github.com/predict-idlab/plotly-resampler/pull/287


**Full Changelog**: https://github.com/predict-idlab/plotly-resampler/compare/v0.9.1...v0.9.2


# v0.9.1
## Major changes:
27 changes: 23 additions & 4 deletions plotly_resampler/aggregation/aggregators.py
@@ -17,6 +17,8 @@
LTTBDownsampler,
MinMaxDownsampler,
MinMaxLTTBDownsampler,
NaNMinMaxDownsampler,
NaNMinMaxLTTBDownsampler,
)

from ..aggregation.aggregation_interface import DataAggregator, DataPointSelector
@@ -171,18 +173,25 @@ class MinMaxAggregator(DataPointSelector):

"""

def __init__(self, **downsample_kwargs):
def __init__(self, nan_policy="omit", **downsample_kwargs):
"""
Parameters
----------
**downsample_kwargs
Keyword arguments passed to the :class:`MinMaxDownsampler`.
- The `parallel` argument is set to False by default.
nan_policy: str, optional
The policy to handle NaNs. Can be 'omit' or 'keep'. By default, 'omit'.

"""
# this downsampler supports all dtypes
super().__init__(**downsample_kwargs)
self.downsampler = MinMaxDownsampler()
if nan_policy not in ("omit", "keep"):
raise ValueError("nan_policy must be either 'omit' or 'keep'")
if nan_policy == "omit":
self.downsampler = MinMaxDownsampler()
else:
self.downsampler = NaNMinMaxDownsampler()

def _arg_downsample(
self,
@@ -208,21 +217,31 @@ class MinMaxLTTB(DataPointSelector):
Paper: [https://arxiv.org/pdf/2305.00332.pdf](https://arxiv.org/pdf/2305.00332.pdf)
"""

def __init__(self, minmax_ratio: int = 4, **downsample_kwargs):
def __init__(
self, minmax_ratio: int = 4, nan_policy: str = "omit", **downsample_kwargs
):
"""
Parameters
----------
minmax_ratio: int, optional
The ratio between the number of data points in the MinMax-prefetching and
the number of data points that will be outputted by LTTB. By default, 4.
nan_policy: str, optional
The policy to handle NaNs. Can be 'omit' or 'keep'. By default, 'omit'.
**downsample_kwargs
Keyword arguments passed to the `MinMaxLTTBDownsampler`.
- The `parallel` argument is set to False by default.
- The `minmax_ratio` argument is set to 4 by default, which was empirically
proven to be a good default.

"""
self.minmaxlttb = MinMaxLTTBDownsampler()
if nan_policy not in ("omit", "keep"):
raise ValueError("nan_policy must be either 'omit' or 'keep'")
if nan_policy == "omit":
self.minmaxlttb = MinMaxLTTBDownsampler()
else:
self.minmaxlttb = NaNMinMaxLTTBDownsampler()

self.minmax_ratio = minmax_ratio

super().__init__(
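
Note on the `nan_policy` switch in the two constructors above: the diff only selects which downsampler class is used, so the effect of `'omit'` versus `'keep'` is not visible here. The following is a minimal sketch of what the two policies might mean for a single min-max bin. It is written in plain Python, `minmax_bin_indices` is a hypothetical function, and the rule that a NaN-containing bin is represented by its first NaN under `'keep'` is an assumption, not something this diff confirms.

```python
import math


def minmax_bin_indices(values, start, stop, nan_policy="omit"):
    """Return the sorted indices selected from values[start:stop] for one bin.

    Hypothetical sketch: mirrors the validation in the diff, then applies an
    assumed interpretation of the two policies.
    """
    if nan_policy not in ("omit", "keep"):
        raise ValueError("nan_policy must be either 'omit' or 'keep'")

    idxs = range(start, stop)
    nan_idxs = [i for i in idxs if math.isnan(values[i])]
    if nan_policy == "keep" and nan_idxs:
        # Assumption: a bin that contains a NaN is represented by its first NaN,
        # so the gap stays visible after downsampling.
        return [nan_idxs[0]]

    # 'omit' (or a NaN-free bin): ignore NaNs and pick the min and max positions.
    finite = [i for i in idxs if not math.isnan(values[i])]
    if not finite:
        return []
    lo = min(finite, key=values.__getitem__)
    hi = max(finite, key=values.__getitem__)
    return sorted({lo, hi})


values = [1.0, float("nan"), 3.0, 0.5]
omit_result = minmax_bin_indices(values, 0, 4, nan_policy="omit")
keep_result = minmax_bin_indices(values, 0, 4, nan_policy="keep")
```

With this interpretation, `'omit'` returns the positions of the finite extremes, while `'keep'` surfaces the NaN so that a gap handler can still act on it.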
58 changes: 7 additions & 51 deletions plotly_resampler/figure_resampler/figure_resampler_interface.py
@@ -555,7 +555,6 @@ def _parse_get_trace_props(
hf_hovertext: Iterable = None,
hf_marker_size: Iterable = None,
hf_marker_color: Iterable = None,
check_nans: bool = True,
) -> _hf_data_container:
"""Parse and capture the possibly high-frequency trace-props in a datacontainer.

@@ -572,11 +571,6 @@ def _parse_get_trace_props(
hf_hovertext : Iterable, optional
High-frequency trace "hovertext" data, overrides the current trace its
hovertext data.
check_nans: bool, optional
Whether the `hf_y` should be checked for NaNs, by default True.
As checking for NaNs is expensive, this can be disabled when the `hf_y` is
already known to contain no NaNs (or when the downsampler can handle NaNs,
e.g., EveryNthPoint).

Returns
-------
@@ -654,7 +648,8 @@ def _parse_get_trace_props(
if hf_y.ndim != 0: # if hf_y is an array
hf_x = pd.RangeIndex(0, len(hf_y)) # np.arange(len(hf_y))
else: # if no data as y or hf_y is passed
hf_x = np.asarray(None)
hf_x = np.asarray([])
hf_y = np.asarray([])

assert hf_y.ndim == np.ndim(hf_x), (
"plotly-resampler requires scatter data "
@@ -677,22 +672,6 @@ def _parse_get_trace_props(
if isinstance(hf_marker_color, (tuple, list, np.ndarray, pd.Series)):
hf_marker_color = np.asarray(hf_marker_color)

# Remove NaNs for efficiency (storing less meaningless data)
# NaNs introduce gaps between enclosing non-NaN data points & might distort
# the resampling algorithms
if check_nans and pd.isna(hf_y).any():
not_nan_mask = ~pd.isna(hf_y)
hf_x = hf_x[not_nan_mask]
hf_y = hf_y[not_nan_mask]
if isinstance(hf_text, np.ndarray):
hf_text = hf_text[not_nan_mask]
if isinstance(hf_hovertext, np.ndarray):
hf_hovertext = hf_hovertext[not_nan_mask]
if isinstance(hf_marker_size, np.ndarray):
hf_marker_size = hf_marker_size[not_nan_mask]
if isinstance(hf_marker_color, np.ndarray):
hf_marker_color = hf_marker_color[not_nan_mask]

# Try to parse the hf_x data if it is of object type or
if len(hf_x) and (hf_x.dtype.type is np.str_ or hf_x.dtype == "object"):
try:
@@ -876,7 +855,6 @@ def add_trace(
hf_hovertext: Union[str, Iterable] = None,
hf_marker_size: Union[str, Iterable] = None,
hf_marker_color: Union[str, Iterable] = None,
check_nans: bool = True,
**trace_kwargs,
):
"""Add a trace to the figure.
@@ -932,13 +910,6 @@ def add_trace(
hf_marker_color: Iterable, optional
The original high frequency marker color. If set, this has priority over the
trace its ``marker.color`` argument.
check_nans: boolean, optional
If set to True, the trace's data will be checked for NaNs - which will be
removed. By default True.
As this is a costly operation, it is recommended to set this parameter to
False if you are sure that your data does not contain NaNs (or when the
downsampler can handle NaNs, e.g., EveryNthPoint). This should considerably
speed up the graph construction time.
**trace_kwargs: dict
Additional trace related keyword arguments.
e.g.: row=.., col=..., secondary_y=...
@@ -1019,7 +990,6 @@ def add_trace(
hf_hovertext,
hf_marker_size,
hf_marker_color,
check_nans,
)

# These traces will determine the autoscale its RANGE!
@@ -1078,7 +1048,6 @@ def add_traces(
downsamplers: None | List[AbstractAggregator] | AbstractAggregator = None,
gap_handlers: None | List[AbstractGapHandler] | AbstractGapHandler = None,
limit_to_views: List[bool] | bool = False,
check_nans: List[bool] | bool = True,
**traces_kwargs,
):
"""Add traces to the figure.
@@ -1124,14 +1093,6 @@ def add_traces(
by default False.\n
Remark that setting this parameter to True ensures that low frequency traces
are added to the ``hf_data`` property.
check_nans : None | List[bool] | bool, optional
List of check_nans booleans for the added traces. If set to True, the
trace's datapoints will be checked for NaNs. If a single boolean is passed,
all to be added traces will use this value, by default True.\n
As this is a costly operation, it is recommended to set this parameter to
False if the data is known to contain no NaNs (or when the downsampler can
handle NaNs, e.g., EveryNthPoint). This will considerably speed up the graph
construction time.
**traces_kwargs: dict
Additional trace related keyword arguments.
e.g.: rows=.., cols=..., secondary_ys=...
@@ -1174,16 +1135,11 @@ def add_traces(
gap_handlers = [gap_handlers] * len(data)
if isinstance(limit_to_views, bool):
limit_to_views = [limit_to_views] * len(data)
if isinstance(check_nans, bool):
check_nans = [check_nans] * len(data)

zipped = zip(
data, max_n_samples, downsamplers, gap_handlers, limit_to_views, check_nans
)
for (
i,
(trace, max_out, downsampler, gap_handler, limit_to_view, check_nan),
) in enumerate(zipped):
zipped = zip(data, max_n_samples, downsamplers, gap_handlers, limit_to_views)
for (i, (trace, max_out, downsampler, gap_handler, limit_to_view)) in enumerate(
zipped
):
if (
trace.type.lower() not in self._high_frequency_traces
or self._hf_data.get(trace.uid) is not None
@@ -1194,7 +1150,7 @@ def add_traces(
if not limit_to_view and (trace.y is None or len(trace.y) <= max_out_s):
continue

dc = self._parse_get_trace_props(trace, check_nans=check_nan)
dc = self._parse_get_trace_props(trace)
self._hf_data[trace.uid] = self._construct_hf_data_dict(
dc,
trace=trace,
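
Note on the per-trace options in `add_traces`: each option may be passed either as a single value or as a list with one entry per trace, and single values are repeated before everything is zipped together. A minimal sketch of that pattern follows. `broadcast_option` is a hypothetical helper, and its length check is an addition for illustration; the diff does not show one.

```python
def broadcast_option(value, n_traces):
    """Repeat a scalar option once per trace; pass lists through after a length check.

    Hypothetical helper, not part of plotly-resampler.
    """
    if isinstance(value, (list, tuple)):
        if len(value) != n_traces:
            raise ValueError(f"expected {n_traces} values, got {len(value)}")
        return list(value)
    return [value] * n_traces


data = ["trace_a", "trace_b", "trace_c"]  # stand-ins for plotly trace objects
limit_to_views = broadcast_option(False, len(data))
max_n_samples = broadcast_option([1000, 500, 250], len(data))

# One tuple of options per trace, in the same order as `data`
rows = list(enumerate(zip(data, max_n_samples, limit_to_views)))
```

The explicit length check matters because `zip` stops at the shortest input, so a list that is too short would otherwise silently skip the trailing traces.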