Calling construct on a rolling window with a stride parameter results in unexpected behavior. After reading through some of the documentation, I now understand that rolling.construct mostly ignores the min_periods argument, so it may be valid to say that this is expected output. That said, it is impossible to achieve the windowing result I want with the current implementation of xarray without reconstructing the whole array in memory, which wastes memory and, in some cases, expands the array beyond what my machine can hold.
Anything else we need to know?:
My use case is a strided FFT computation along timeseries data (a spectrogram), which is one of the proposed use cases of the rolling.construct method. I want a constant-size rolling window with strided overlap that sweeps across the data with no NaNs, yielding however many full windows fit (min_periods = None, i.e. the window size). I can get close to my desired result using the stride argument, but this gives me the (expected) result in which the first sample of the data is dropped.
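The dropped first sample can be traced with plain index arithmetic. The sketch below is illustrative only: it assumes that each window is labelled by its last sample, that the array is left-padded with NaNs, and that the stride picks every stride-th label starting from 0. The function names and the sizes (10 samples, window 4, stride 2) are hypothetical and not taken from the report.

```python
def padded_strided_starts(n, window, stride):
    """Start indices of the NaN-free windows under the assumed padded behavior.

    Assumption: the window labelled i covers x[i - window + 1 : i + 1],
    and labels are taken at i = 0, stride, 2 * stride, ...
    """
    return [i - window + 1 for i in range(0, n, stride) if i - window + 1 >= 0]


def desired_starts(n, window, stride):
    """Start indices of a sweep that begins at sample 0 and keeps only full windows."""
    return list(range(0, n - window + 1, stride))


got = padded_strided_starts(10, window=4, stride=2)
want = desired_starts(10, window=4, stride=2)
print(got)   # [1, 3, 5]
print(want)  # [0, 2, 4, 6]
```

Under these assumptions the first sample survives only when (window - 1) is a multiple of the stride, which is why the strided result cannot in general start at sample 0.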
My synthesized data has a particular phase property that requires the first window to consist of the first n samples of the data; basically, I need that first sample. I can get my desired result by not using the stride keyword argument.
But this approach inflates the in-memory representation of the data and is very slow, because every overlapping window is materialized. On my actual timeseries dataset, it causes a memory error after inflating a 200 MB dataset to several tens of gigabytes.
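For comparison, here is a minimal sketch of the same sweep done directly in NumPy rather than through xarray, so coordinates are not preserved. It relies on numpy.lib.stride_tricks.sliding_window_view, which needs NumPy 1.20 or later (newer than the 1.18.5 listed in this report). The helper name strided_spectrogram and the test signal are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20


def strided_spectrogram(x, window, stride):
    """Magnitude spectrum of each full, unpadded window (hypothetical helper)."""
    # Read-only view of shape (n_windows, window); no samples are copied here.
    frames = sliding_window_view(x, window)[::stride]
    # The FFT output has shape (n_windows, window // 2 + 1).
    spectrum = np.abs(np.fft.rfft(frames, axis=-1))
    return frames, spectrum


# Illustrative signal: a sine with exactly two cycles per 16-sample window.
x = np.sin(2 * np.pi * 0.125 * np.arange(64))
frames, spectrum = strided_spectrogram(x, window=16, stride=8)

print(frames.shape)                # (7, 16)
print(np.shares_memory(frames, x)) # True
```

Because frames is a view, the first window is exactly the first 16 samples of x, and the only sizeable allocation is the FFT output.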
There are obviously other ways to achieve this result, but none of them meet my needs. One possibility would be to use groupby_bins, but that requires me to specify the number of bins I want along the axis. I care less about the number of bins than about the bins all being the same size with a consistent stride, which is the functionality that pointed me to rolling. I could also use rolling.reduce followed by a dropna, but that would require rewriting my chunked analysis method to operate only on the raw data of the array, without access to the coordinates associated with each chunk. I find the rolling.reduce methodology quite counterintuitive and would prefer a rolling.apply method instead, but that's a separate feature request.
I think this is a good feature request. rolling pads with NaNs by default. We should allow users to turn that off or specify other padding options (see #2007, #3587).
@dcherian -- Very good call. Simply being able to turn off NaN padding would permit my use case. Would it make more sense to close this bug report and submit a feature request with that functionality? I can submit that if so :)
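To make the requested switch concrete, here is a rough NumPy sketch of what turning padding on and off could look like. It is not xarray's API: strided_windows and its pad flag are hypothetical, the padded branch assumes left-padding with window - 1 NaNs, and sliding_window_view requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20


def strided_windows(x, window, stride, pad=True):
    """Strided windows over x, optionally left-padded with NaNs (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    if pad:
        # Assumed default behavior: one window per sample, NaN-filled at the start.
        x = np.pad(x, (window - 1, 0), constant_values=np.nan)
    return sliding_window_view(x, window)[::stride]


x = np.arange(10.0)
padded = strided_windows(x, window=4, stride=2, pad=True)
unpadded = strided_windows(x, window=4, stride=2, pad=False)

print(padded[0])    # [nan nan nan  0.]
print(unpadded[0])  # [0. 1. 2. 3.]
```

With pad=False every row is a full window and the first row starts at the first sample, which is the behavior the spectrogram use case needs.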
Output of xr.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.6 | packaged by conda-forge | (default, Jun 1 2020, 18:57:50)
[GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 4.15.0-115-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.6
libnetcdf: None
xarray: 0.15.1
pandas: 1.0.5
numpy: 1.18.5
scipy: 1.4.1
netCDF4: None
pydap: None
h5netcdf: 0.8.1
h5py: 2.10.0
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.19.0
distributed: None
matplotlib: 3.2.2
cartopy: None
seaborn: 0.10.1
numbagg: None
setuptools: 47.3.1.post20200616
pip: 20.1.1
conda: 4.8.3
pytest: None
IPython: 7.15.0
sphinx: 3.1.1