
API/BUG/ENH: ewmvar/cov debiasing factors; add 'adjust' to ewmvar/std/vol/cov/corr; ewm*() min_periods #7926


Merged
merged 1 commit into from
Sep 10, 2014
130 changes: 94 additions & 36 deletions doc/source/computation.rst
@@ -413,6 +413,8 @@ columns using ``ix`` indexing:
@savefig rolling_corr_pairwise_ex.png
correls.ix[:, 'A', 'C'].plot()

.. _stats.moments.expanding:

Expanding window moment functions
---------------------------------
A common alternative to rolling statistics is to use an *expanding* window,
@@ -485,60 +487,79 @@ relative impact of an individual data point. As an example, here is the
@savefig expanding_mean_frame.png
expanding_mean(ts).plot(style='k')

.. _stats.moments.exponentially_weighted:

Exponentially weighted moment functions
---------------------------------------

A related set of functions are exponentially weighted versions of many of the
above statistics. A number of EW (exponentially weighted) functions are
provided using the blending method. For example, where :math:`y_t` is the
result and :math:`x_t` the input, we compute an exponentially weighted moving
average as
A related set of functions are exponentially weighted versions of several of
the above statistics. A number of expanding EW (exponentially weighted)
functions are provided:

.. csv-table::
:header: "Function", "Description"
:widths: 20, 80

``ewma``, EW moving average
``ewmvar``, EW moving variance
``ewmstd``, EW moving standard deviation
``ewmcorr``, EW moving correlation
``ewmcov``, EW moving covariance

In general, a weighted moving average is calculated as

.. math::

y_t = (1 - \alpha) y_{t-1} + \alpha x_t
y_t = \frac{\sum_{i=0}^t w_i x_{t-i}}{\sum_{i=0}^t w_i},

One must have :math:`0 < \alpha \leq 1`, but rather than pass :math:`\alpha`
directly, it's easier to think about either the **span**, **center of mass
(com)** or **halflife** of an EW moment:
where :math:`x_t` is the input and :math:`y_t` is the result.

The EW functions support two variants of exponential weights:
The default, ``adjust=True``, uses the weights :math:`w_i = (1 - \alpha)^i`.
When ``adjust=False`` is specified, moving averages are calculated as

.. math::

\alpha =
\begin{cases}
\frac{2}{s + 1}, s = \text{span}\\
\frac{1}{1 + c}, c = \text{center of mass}\\
1 - \exp^{\frac{\log 0.5}{h}}, h = \text{half life}
y_0 &= x_0 \\
y_t &= (1 - \alpha) y_{t-1} + \alpha x_t,

which is equivalent to using weights

.. math::

w_i = \begin{cases}
\alpha (1 - \alpha)^i & \text{if } i < t \\
(1 - \alpha)^i & \text{if } i = t.
\end{cases}
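The equivalence between the ``adjust=False`` recursion and the explicit weights can be checked numerically. The following is a minimal pure-Python sketch, not the pandas implementation; the helper names ``ewma_from_weights`` and ``ewma_recursive`` are hypothetical, and the input is assumed to contain no missing values.

```python
def ewma_from_weights(x, alpha, adjust=True):
    """Weighted moving average y_t = sum(w_i * x_{t-i}) / sum(w_i).

    Hypothetical helper for illustration; assumes no missing values.
    """
    out = []
    for t in range(len(x)):
        if adjust:
            # adjust=True: w_i = (1 - alpha)^i
            w = [(1.0 - alpha) ** i for i in range(t + 1)]
        else:
            # adjust=False: alpha * (1 - alpha)^i for i < t, (1 - alpha)^i for i = t
            w = [
                alpha * (1.0 - alpha) ** i if i < t else (1.0 - alpha) ** i
                for i in range(t + 1)
            ]
        numerator = sum(w[i] * x[t - i] for i in range(t + 1))
        out.append(numerator / sum(w))
    return out


def ewma_recursive(x, alpha):
    """Recursive form: y_0 = x_0, y_t = (1 - alpha) * y_{t-1} + alpha * x_t."""
    y = [x[0]]
    for x_t in x[1:]:
        y.append((1.0 - alpha) * y[-1] + alpha * x_t)
    return y


data = [1.0, 2.0, 0.0, 4.0]
by_weights = ewma_from_weights(data, alpha=0.5, adjust=False)
by_recursion = ewma_recursive(data, alpha=0.5)
# The two forms agree up to floating-point rounding.
print(all(abs(a - b) < 1e-12 for a, b in zip(by_weights, by_recursion)))
```

With ``adjust=False`` the weights sum to one at every step, which is why the recursion needs no explicit normalization.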

.. note::

the equation above is sometimes written in the form
These equations are sometimes written in terms of :math:`\alpha' = 1 - \alpha`, e.g.

.. math::

.. math::
y_t = \alpha' y_{t-1} + (1 - \alpha') x_t.

y_t = \alpha' y_{t-1} + (1 - \alpha') x_t
One must have :math:`0 < \alpha \leq 1`, but rather than pass :math:`\alpha`
directly, it's easier to think about either the **span**, **center of mass
(com)** or **halflife** of an EW moment:

where :math:`\alpha' = 1 - \alpha`.
.. math::

You can pass one of the three to these functions but not more. **Span**
\alpha =
\begin{cases}
\frac{2}{s + 1}, & s = \text{span}\\
\frac{1}{1 + c}, & c = \text{center of mass}\\
1 - \exp\left(\frac{\log 0.5}{h}\right), & h = \text{half life}
\end{cases}

One must specify precisely one of the three to the EW functions. **Span**
corresponds to what is commonly called a "20-day EW moving average" for
example. **Center of mass** has a more physical interpretation. For example,
**span** = 20 corresponds to **com** = 9.5. **Halflife** is the period of
time for the exponential weight to reduce to one half. Here is the list of
functions available:

.. csv-table::
:header: "Function", "Description"
:widths: 20, 80

``ewma``, EW moving average
``ewmvar``, EW moving variance
``ewmstd``, EW moving standard deviation
``ewmcorr``, EW moving correlation
``ewmcov``, EW moving covariance
time for the exponential weight to reduce to one half.
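The three parametrizations can be compared directly. This is a small sketch of the formulas above using only the standard library; the function names are hypothetical and are not part of pandas.

```python
import math


def alpha_from_span(span):
    """alpha = 2 / (s + 1)"""
    return 2.0 / (span + 1.0)


def alpha_from_com(com):
    """alpha = 1 / (1 + c)"""
    return 1.0 / (1.0 + com)


def alpha_from_halflife(halflife):
    """alpha = 1 - exp(log(0.5) / h)"""
    return 1.0 - math.exp(math.log(0.5) / halflife)


# span = 20 and com = 9.5 describe the same decay rate.
print(math.isclose(alpha_from_span(20), alpha_from_com(9.5)))

# After `halflife` steps the weight (1 - alpha)^h has dropped to one half.
h = 10
print(math.isclose((1.0 - alpha_from_halflife(h)) ** h, 0.5))
```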

Here are an example for a univariate time series:
Here is an example for a univariate time series:

.. ipython:: python

@@ -548,8 +569,45 @@ Here are an example for a univariate time series:
@savefig ewma_ex.png
ewma(ts, span=20).plot(style='k')

.. note::
All the EW functions have a ``min_periods`` argument, which has the same
meaning it does for all the ``expanding_`` and ``rolling_`` functions:
no output values will be set until at least ``min_periods`` non-null values
are encountered in the (expanding) window.
(This is a change from versions prior to 0.15.0, in which the ``min_periods``
argument affected only the ``min_periods`` consecutive entries starting at the
first non-null value.)
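The ``min_periods`` rule can be illustrated with a short pure-Python sketch. It is not the pandas implementation: the helper name ``ewma_min_periods`` is hypothetical, ``None`` stands in for a null value, and the weights assume ``adjust=True`` with absolute positions.

```python
import math


def ewma_min_periods(values, com, min_periods=0):
    """Expanding EW average that stays NaN until `min_periods`
    non-null values have been seen (hypothetical helper)."""
    alpha = 1.0 / (1.0 + com)
    numerator = 0.0
    denominator = 0.0
    n_obs = 0
    out = []
    for x in values:
        # Every step ages the existing weights by (1 - alpha),
        # whether or not the current value is null.
        numerator *= 1.0 - alpha
        denominator *= 1.0 - alpha
        if x is not None:
            numerator += x
            denominator += 1.0
            n_obs += 1
        if n_obs >= max(min_periods, 1):
            out.append(numerator / denominator)
        else:
            out.append(math.nan)
    return out


s = [1.0, None, None, None, 2.0, 3.0]
result = ewma_min_periods(s, com=3.0, min_periods=2)
# Entries 0-3 are NaN: the second non-null value only arrives at index 4.
print([math.isnan(v) for v in result])
```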

All the EW functions also have an ``ignore_na`` argument, which determines how
intermediate null values affect the calculation of the weights.
When ``ignore_na=False`` (the default), weights are calculated based on absolute
positions, so that intermediate null values affect the result.
When ``ignore_na=True`` (which reproduces the behavior in versions prior to 0.15.0),
weights are calculated by ignoring intermediate null values.
For example, assuming ``adjust=True``, if ``ignore_na=False``, the weighted
average of ``3, NaN, 5`` would be calculated as

.. math::

\frac{(1-\alpha)^2 \cdot 3 + 1 \cdot 5}{(1-\alpha)^2 + 1}

Whereas if ``ignore_na=True``, the weighted average would be calculated as

.. math::

\frac{(1-\alpha) \cdot 3 + 1 \cdot 5}{(1-\alpha) + 1}.
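The two formulas can be reproduced with a minimal sketch. It assumes ``adjust=True``, represents null values as ``None``, and uses a hypothetical helper name ``ew_average``; it is meant to illustrate the weighting, not to mirror the pandas code.

```python
import math


def ew_average(values, alpha, ignore_na=False):
    """EW average at the last position, assuming adjust=True
    (hypothetical helper; None marks a null value)."""
    numerator = 0.0
    denominator = 0.0
    for x in values:
        if x is None:
            if not ignore_na:
                # Absolute positions: a null still ages the earlier weights.
                numerator *= 1.0 - alpha
                denominator *= 1.0 - alpha
            continue
        numerator = numerator * (1.0 - alpha) + x
        denominator = denominator * (1.0 - alpha) + 1.0
    return numerator / denominator if denominator > 0 else math.nan


alpha = 0.5
# ignore_na=False: ((1 - alpha)^2 * 3 + 5) / ((1 - alpha)^2 + 1)
print(ew_average([3.0, None, 5.0], alpha, ignore_na=False))
# ignore_na=True: ((1 - alpha) * 3 + 5) / ((1 - alpha) + 1)
print(ew_average([3.0, None, 5.0], alpha, ignore_na=True))
```

With ``ignore_na=False`` the older observation receives a smaller weight, so the result sits closer to the most recent value.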

The ``ewmvar``, ``ewmstd``, and ``ewmcov`` functions have a ``bias`` argument,
specifying whether the result should contain biased or unbiased statistics.
For example, if ``bias=True``, ``ewmvar(x)`` is calculated as
``ewmvar(x) = ewma(x**2) - ewma(x)**2``;
whereas if ``bias=False`` (the default), the biased variance statistics
are scaled by debiasing factors

.. math::

\frac{\left(\sum_{i=0}^t w_i\right)^2}{\left(\sum_{i=0}^t w_i\right)^2 - \sum_{i=0}^t w_i^2}.

The EW functions perform a standard adjustment to the initial observations
whereby if there are fewer observations than called for in the span, those
observations are reweighted accordingly.
(For :math:`w_i = 1`, this reduces to the usual :math:`N / (N - 1)` factor,
with :math:`N = t + 1`.)
See http://en.wikipedia.org/wiki/Weighted_arithmetic_mean#Weighted_sample_variance
for further details.
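The debiasing factor can be evaluated directly from a list of weights. The sketch below is illustrative only: ``debias_factor`` is a hypothetical helper, and the exponential weights assume ``adjust=True`` with no missing values.

```python
import math


def debias_factor(weights):
    """(sum w)^2 / ((sum w)^2 - sum w^2), or NaN when the denominator is zero."""
    sum_w = sum(weights)
    sum_w2 = sum(w * w for w in weights)
    denominator = sum_w ** 2 - sum_w2
    return sum_w ** 2 / denominator if denominator > 0 else math.nan


# Equal weights recover the usual N / (N - 1) factor.
print(debias_factor([1.0] * 5))  # 5 / 4

# A single observation has no unbiased variance estimate.
print(debias_factor([1.0]))  # nan

# Exponential weights for com = 2, i.e. alpha = 1 / 3.
com = 2.0
alpha = 1.0 / (1.0 + com)
factors = [
    debias_factor([(1.0 - alpha) ** i for i in range(t + 1)])
    for t in range(200)
]
# The factors decrease towards (1 + 2 * com) / (2 * com) as the window grows.
print(factors[1], factors[2], factors[-1], (1.0 + 2.0 * com) / (2.0 * com))
```

The limiting value follows from summing the two geometric series in the formula; for ``com=2`` it equals 1.25.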
130 changes: 105 additions & 25 deletions doc/source/v0.15.0.txt
@@ -83,25 +83,8 @@ API changes

rolling_min(s, window=10, min_periods=5)

- :func:`ewma`, :func:`ewmstd`, :func:`ewmvol`, :func:`ewmvar`, :func:`ewmcorr`, and :func:`ewmcov`
now have an optional ``ignore_na`` argument.
When ``ignore_na=False`` (the default), missing values are taken into account in the weights calculation.
When ``ignore_na=True`` (which reproduces the pre-0.15.0 behavior), missing values are ignored in the weights calculation.
(:issue:`7543`)

.. ipython:: python

ewma(Series([None, 1., 100.]), com=2.5)
ewma(Series([1., None, 100.]), com=2.5, ignore_na=True) # pre-0.15.0 behavior
ewma(Series([1., None, 100.]), com=2.5, ignore_na=False) # default

- :func:`ewma`, :func:`ewmstd`, :func:`ewmvol`, :func:`ewmvar`, :func:`ewmcorr`, and :func:`ewmcov`
now set to ``NaN`` the first ``min_periods-1`` entries of the result (for ``min_periods>1``).
Previously the first ``min_periods`` entries of the result were set to ``NaN``.
The new behavior accords with the existing documentation. (:issue:`7884`)

- :func:`rolling_max`, :func:`rolling_min`, :func:`rolling_sum`, :func:`rolling_mean`, :func:`rolling_median`,
:func:`rolling_std`, :func:`rolling_var`, :func:`rolling_skew`, :func:`rolling_kurt`, and :func:`rolling_quantile`,
:func:`rolling_std`, :func:`rolling_var`, :func:`rolling_skew`, :func:`rolling_kurt`, :func:`rolling_quantile`,
:func:`rolling_cov`, :func:`rolling_corr`, :func:`rolling_corr_pairwise`,
:func:`rolling_window`, and :func:`rolling_apply` with ``center=True`` previously would return a result of the same
structure as the input ``arg`` with ``NaN`` in the final ``(window-1)/2`` entries.
@@ -112,27 +95,75 @@ API changes

.. code-block:: python

In [7]: rolling_sum(Series(range(5)), window=3, min_periods=0, center=True)
In [7]: rolling_sum(Series(range(4)), window=3, min_periods=0, center=True)
Out[7]:
0 1
1 3
2 6
3 9
4 NaN
3 NaN
dtype: float64

New behavior (note final value is ``7 = sum([3, 4, NaN])``):
New behavior (note final value is ``5 = sum([2, 3, NaN])``):

.. ipython:: python

rolling_sum(Series(range(5)), window=3, min_periods=0, center=True)
rolling_sum(Series(range(4)), window=3, min_periods=0, center=True)

- Removed ``center`` argument from :func:`expanding_max`, :func:`expanding_min`, :func:`expanding_sum`,
:func:`expanding_mean`, :func:`expanding_median`, :func:`expanding_std`, :func:`expanding_var`,
:func:`expanding_skew`, :func:`expanding_kurt`, :func:`expanding_quantile`, :func:`expanding_count`,
:func:`expanding_cov`, :func:`expanding_corr`, :func:`expanding_corr_pairwise`, and :func:`expanding_apply`,
as the results produced when ``center=True`` did not make much sense. (:issue:`7925`)

- :func:`ewma`, :func:`ewmstd`, :func:`ewmvol`, :func:`ewmvar`, :func:`ewmcov`, and :func:`ewmcorr`
now interpret ``min_periods`` in the same manner that the ``rolling_*`` and ``expanding_*`` functions do:
a given result entry will be ``NaN`` if the (expanding, in this case) window does not contain
at least ``min_periods`` values. The previous behavior was to set to ``NaN`` the ``min_periods`` entries
starting with the first non- ``NaN`` value. (:issue:`7977`)

Prior behavior (note values start at index ``2``, which is ``min_periods`` after index ``0``
(the index of the first non-empty value)):

.. ipython:: python

s = Series([1, None, None, None, 2, 3])

.. code-block:: python

In [51]: ewma(s, com=3., min_periods=2)
Out[51]:
0 NaN
1 NaN
2 1.000000
3 1.000000
4 1.571429
5 2.189189
dtype: float64

New behavior (note values start at index ``4``, the location of the 2nd (since ``min_periods=2``) non-empty value):

.. ipython:: python

ewma(s, com=3., min_periods=2)

- :func:`ewmstd`, :func:`ewmvol`, :func:`ewmvar`, :func:`ewmcov`, and :func:`ewmcorr`
now have an optional ``adjust`` argument, just like :func:`ewma` does,
affecting how the weights are calculated.
The default value of ``adjust`` is ``True``, which is backwards-compatible.
See :ref:`Exponentially weighted moment functions <stats.moments.exponentially_weighted>` for details. (:issue:`7911`)

- :func:`ewma`, :func:`ewmstd`, :func:`ewmvol`, :func:`ewmvar`, :func:`ewmcov`, and :func:`ewmcorr`
now have an optional ``ignore_na`` argument.
When ``ignore_na=False`` (the default), missing values are taken into account in the weights calculation.
When ``ignore_na=True`` (which reproduces the pre-0.15.0 behavior), missing values are ignored in the weights calculation.
(:issue:`7543`)

.. ipython:: python

ewma(Series([None, 1., 8.]), com=2.)
ewma(Series([1., None, 8.]), com=2., ignore_na=True) # pre-0.15.0 behavior
ewma(Series([1., None, 8.]), com=2., ignore_na=False) # new default

- Bug in passing a ``DatetimeIndex`` with a timezone that was not being retained in DataFrame construction from a dict (:issue:`7822`)

In prior versions this would drop the timezone.
@@ -580,12 +611,61 @@ Bug Fixes
- Bug in ``DataFrame.plot`` with ``subplots=True`` may draw unnecessary minor xticks and yticks (:issue:`7801`)
- Bug in ``StataReader`` which did not read variable labels in 117 files due to difference between Stata documentation and implementation (:issue:`7816`)
- Bug in ``StataReader`` where strings were always converted to 244 characters-fixed width irrespective of underlying string size (:issue:`7858`)
- Bug in ``expanding_cov``, ``expanding_corr``, ``rolling_cov``, ``rolling_cov``, ``ewmcov``, and ``ewmcorr``

- Bug in :func:`expanding_cov`, :func:`expanding_corr`, :func:`rolling_cov`, :func:`rolling_corr`, :func:`ewmcov`, and :func:`ewmcorr`
returning results with columns sorted by name and producing an error for non-unique columns;
now handles non-unique columns and returns columns in original order
(except for the case of two DataFrames with ``pairwise=False``, where behavior is unchanged) (:issue:`7542`)
- Bug in :func:`rolling_count` and ``expanding_*`` functions unnecessarily producing error message for zero-length data (:issue:`8056`)
- Bug in :func:`rolling_apply` and :func:`expanding_apply` interpreting ``min_periods=0`` as ``min_periods=1`` (:issue:`8080`)
- Bug in :func:`expanding_std` and :func:`expanding_var` for a single value producing a confusing error message (:issue:`7900`)
- Bug in :func:`rolling_std` and :func:`rolling_var` for a single value producing ``0`` rather than ``NaN`` (:issue:`7900`)

- Bug in :func:`ewmstd`, :func:`ewmvol`, :func:`ewmvar`, and :func:`ewmcov`
calculation of de-biasing factors when ``bias=False`` (the default).
Previously an incorrect constant factor was used, based on ``adjust=True``, ``ignore_na=True``,
and an infinite number of observations.
Now a different factor is used for each entry, based on the actual weights
(analogous to the usual ``N/(N-1)`` factor).
In particular, for a single point a value of ``NaN`` is returned when ``bias=False``,
whereas previously a value of (approximately) ``0`` was returned.

For example, consider the following pre-0.15.0 results for ``ewmvar(..., bias=False)``,
and the corresponding debiasing factors:

.. ipython:: python

s = Series([1., 2., 0., 4.])

.. code-block:: python

In [69]: ewmvar(s, com=2., bias=False)
Out[69]:
0 -2.775558e-16
1 3.000000e-01
2 9.556787e-01
3 3.585799e+00
dtype: float64

In [70]: ewmvar(s, com=2., bias=False) / ewmvar(s, com=2., bias=True)
Out[70]:
0 1.25
1 1.25
2 1.25
3 1.25
dtype: float64

Note that entry ``0`` is approximately 0, and the debiasing factors are a constant 1.25.
By comparison, the following 0.15.0 results have a ``NaN`` for entry ``0``,
and the debiasing factors are decreasing (towards 1.25):

.. ipython:: python

ewmvar(s, com=2., bias=False)
ewmvar(s, com=2., bias=False) / ewmvar(s, com=2., bias=True)

See :ref:`Exponentially weighted moment functions <stats.moments.exponentially_weighted>` for details. (:issue:`7912`)

- Bug in ``DataFrame.plot`` and ``Series.plot`` may ignore ``rot`` and ``fontsize`` keywords (:issue:`7844`)

