pandas-dev · jreback · Sep 10, 2014 · Sep 3, 2014
diff --git a/doc/source/computation.rst b/doc/source/computation.rst
@@ -413,6 +413,8 @@ columns using ``ix`` indexing:
    @savefig rolling_corr_pairwise_ex.png
    correls.ix[:, 'A', 'C'].plot()
 
+.. _stats.moments.expanding:
+
 Expanding window moment functions
 ---------------------------------
 A common alternative to rolling statistics is to use an *expanding* window,
@@ -485,60 +487,79 @@ relative impact of an individual data point. As an example, here is the
    @savefig expanding_mean_frame.png
    expanding_mean(ts).plot(style='k')
 
+.. _stats.moments.exponentially_weighted:
+
 Exponentially weighted moment functions
 ---------------------------------------
 
-A related set of functions are exponentially weighted versions of many of the
-above statistics. A number of EW (exponentially weighted) functions are
-provided using the blending method. For example, where :math:`y_t` is the
-result and :math:`x_t` the input, we compute an exponentially weighted moving
-average as
+A related set of functions are exponentially weighted versions of several of
+the above statistics. A number of expanding EW (exponentially weighted)
+functions are provided:
+
+.. csv-table::
+    :header: "Function", "Description"
+    :widths: 20, 80
+
+    ``ewma``, EW moving average
+    ``ewmvar``, EW moving variance
+    ``ewmstd``, EW moving standard deviation
+    ``ewmcorr``, EW moving correlation
+    ``ewmcov``, EW moving covariance
+
+In general, a weighted moving average is calculated as
 
 .. math::
 
-    y_t = (1 - \alpha) y_{t-1} + \alpha x_t
+    y_t = \frac{\sum_{i=0}^t w_i x_{t-i}}{\sum_{i=0}^t w_i},
 
-One must have :math:`0 < \alpha \leq 1`, but rather than pass :math:`\alpha`
-directly, it's easier to think about either the **span**, **center of mass
-(com)** or **halflife** of an EW moment:
+where :math:`x_t` is the input and :math:`y_t` is the result.
+
+The EW functions support two variants of exponential weights:
+The default, ``adjust=True``, uses the weights :math:`w_i = (1 - \alpha)^i`.
+When ``adjust=False`` is specified, moving averages are calculated as
 
 .. math::
 
-   \alpha =
-    \begin{cases}
-	\frac{2}{s + 1}, s = \text{span}\\
-	\frac{1}{1 + c}, c = \text{center of mass}\\
-	1 - \exp^{\frac{\log 0.5}{h}}, h = \text{half life}
+    y_0 &= x_0 \\
+    y_t &= (1 - \alpha) y_{t-1} + \alpha x_t,
+
+which is equivalent to using weights
+
+.. math::
+
+    w_i = \begin{cases}
+        \alpha (1 - \alpha)^i & \text{if } i < t \\
+        (1 - \alpha)^i        & \text{if } i = t.
     \end{cases}
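+
+As a quick check of this equivalence for :math:`t = 2`, unrolling the
+recursion gives
+
+.. math::
+
+    y_2 &= (1 - \alpha) y_1 + \alpha x_2 \\
+        &= (1 - \alpha) \left[ (1 - \alpha) x_0 + \alpha x_1 \right] + \alpha x_2 \\
+        &= \alpha x_2 + \alpha (1 - \alpha) x_1 + (1 - \alpha)^2 x_0,
+
+so that :math:`w_0 = \alpha`, :math:`w_1 = \alpha (1 - \alpha)` and
+:math:`w_2 = (1 - \alpha)^2`. These weights sum to 1, so the denominator
+:math:`\sum_{i=0}^t w_i` in the general formula drops out.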
 
 .. note::
 
-  the equation above is sometimes written in the form
+   These equations are sometimes written in terms of :math:`\alpha' = 1 - \alpha`, e.g.
+
+   .. math::
 
-  .. math::
+      y_t = \alpha' y_{t-1} + (1 - \alpha') x_t.
 
-    y_t = \alpha' y_{t-1} + (1 - \alpha') x_t
+One must have :math:`0 < \alpha \leq 1`, but rather than pass :math:`\alpha`
+directly, it's easier to think about either the **span**, **center of mass
+(com)** or **halflife** of an EW moment:
 
-  where :math:`\alpha' = 1 - \alpha`.
+.. math::
 
-You can pass one of the three to these functions but not more. **Span**
+   \alpha =
+    \begin{cases}
+        \frac{2}{s + 1},               & s = \text{span}\\
+        \frac{1}{1 + c},               & c = \text{center of mass}\\
+        1 - \exp\left(\frac{\log 0.5}{h}\right), & h = \text{half life}
+    \end{cases}
+
+One must specify precisely one of the three to the EW functions. **Span**
 corresponds to what is commonly called a "20-day EW moving average" for
 example. **Center of mass** has a more physical interpretation. For example,
 **span** = 20 corresponds to **com** = 9.5. **Halflife** is the period of
-time for the exponential weight to reduce to one half. Here is the list of
-functions available:
-
-.. csv-table::
-    :header: "Function", "Description"
-    :widths: 20, 80
-
-    ``ewma``, EW moving average
-    ``ewmvar``, EW moving variance
-    ``ewmstd``, EW moving standard deviation
-    ``ewmcorr``, EW moving correlation
-    ``ewmcov``, EW moving covariance
+time for the exponential weight to reduce to one half.
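+
+For instance, equating the **span** and **center of mass** expressions for
+:math:`\alpha` gives :math:`s = 2c + 1`, which is how **span** = 20 matches
+**com** = 9.5:
+
+.. math::
+
+    \alpha = \frac{2}{20 + 1} = \frac{1}{1 + 9.5} = \frac{2}{21} \approx 0.095.
+
+Solving :math:`1 - \alpha = 0.5^{1/h}` for the same :math:`\alpha` gives the
+matching **halflife** (a value worked out here for illustration, rounded):
+
+.. math::
+
+    h = \frac{\log 0.5}{\log (1 - \alpha)} = \frac{\log 0.5}{\log (19/21)} \approx 6.93.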
 
-Here are an example for a univariate time series:
+Here is an example for a univariate time series:
 
 .. ipython:: python
 
@@ -548,8 +569,45 @@ Here are an example for a univariate time series:
    @savefig ewma_ex.png
    ewma(ts, span=20).plot(style='k')
 
-.. note::
+All the EW functions have a ``min_periods`` argument, which has the same
+meaning it does for all the ``expanding_`` and ``rolling_`` functions:
+no output values will be set until at least ``min_periods`` non-null values
+are encountered in the (expanding) window.
+(This is a change from versions prior to 0.15.0, in which the ``min_periods``
+argument affected only the ``min_periods`` consecutive entries starting at the
+first non-null value.)
+
+All the EW functions also have an ``ignore_na`` argument, which determines how
+intermediate null values affect the calculation of the weights.
+When ``ignore_na=False`` (the default), weights are calculated based on absolute
+positions, so that intermediate null values affect the result.
+When ``ignore_na=True`` (which reproduces the behavior in versions prior to 0.15.0),
+weights are calculated by ignoring intermediate null values.
+For example, assuming ``adjust=True``, if ``ignore_na=False``, the weighted
+average of ``3, NaN, 5`` would be calculated as
+
+.. math::
+
+	\frac{(1-\alpha)^2 \cdot 3 + 1 \cdot 5}{(1-\alpha)^2 + 1}
+
+Whereas if ``ignore_na=True``, the weighted average would be calculated as
+
+.. math::
+
+	\frac{(1-\alpha) \cdot 3 + 1 \cdot 5}{(1-\alpha) + 1}.
+
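+With the illustrative value :math:`\alpha = 1/2` (chosen here only to keep
+the arithmetic simple), the two conventions give
+
+.. math::
+
+    \frac{\frac{1}{4} \cdot 3 + 1 \cdot 5}{\frac{1}{4} + 1} = 4.6
+    \qquad \text{and} \qquad
+    \frac{\frac{1}{2} \cdot 3 + 1 \cdot 5}{\frac{1}{2} + 1} \approx 4.33
+
+respectively, so the older observation receives less weight when the
+intermediate null value is counted as a position.
+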
+The ``ewmvar``, ``ewmstd``, and ``ewmcov`` functions have a ``bias`` argument,
+specifying whether the result should contain biased or unbiased statistics.
+For example, if ``bias=True``, ``ewmvar(x)`` is calculated as
+``ewmvar(x) = ewma(x**2) - ewma(x)**2``;
+whereas if ``bias=False`` (the default), the biased variance statistics
+are scaled by debiasing factors
+
+.. math::
+
+    \frac{\left(\sum_{i=0}^t w_i\right)^2}{\left(\sum_{i=0}^t w_i\right)^2 - \sum_{i=0}^t w_i^2}.
 
-   The EW functions perform a standard adjustment to the initial observations
-   whereby if there are fewer observations than called for in the span, those
-   observations are reweighted accordingly.
+(For :math:`w_i = 1`, this reduces to the usual :math:`N / (N - 1)` factor,
+with :math:`N = t + 1`.)
+See http://en.wikipedia.org/wiki/Weighted_arithmetic_mean#Weighted_sample_variance
+for further details.
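+
+As a worked illustration (assuming ``adjust=True`` and ``com=2``, i.e.
+:math:`\alpha = 1/3` and :math:`w_i = (2/3)^i`, with no null values), the
+debiasing factor at :math:`t = 1` is
+
+.. math::
+
+    \frac{\left(1 + \frac{2}{3}\right)^2}{\left(1 + \frac{2}{3}\right)^2 - \left(1 + \frac{4}{9}\right)}
+    = \frac{25/9}{12/9} = \frac{25}{12} \approx 2.08,
+
+while in the limit :math:`t \to \infty` the geometric sums give
+:math:`\sum w_i = 3` and :math:`\sum w_i^2 = 9/5`, so the factor tends to
+
+.. math::
+
+    \frac{3^2}{3^2 - 9/5} = \frac{5}{4} = 1.25.
+
+At :math:`t = 0` the denominator is zero, so no unbiased estimate exists for
+a single observation.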
diff --git a/doc/source/v0.15.0.txt b/doc/source/v0.15.0.txt
@@ -83,25 +83,8 @@ API changes
 
      rolling_min(s, window=10, min_periods=5)
 
-- :func:`ewma`, :func:`ewmstd`, :func:`ewmvol`, :func:`ewmvar`, :func:`ewmcorr`, and :func:`ewmcov`
-  now have an optional ``ignore_na`` argument.
-  When ``ignore_na=False`` (the default), missing values are taken into account in the weights calculation.
-  When ``ignore_na=True`` (which reproduces the pre-0.15.0 behavior), missing values are ignored in the weights calculation.
-  (:issue:`7543`)
-
-  .. ipython:: python
-
-     ewma(Series([None, 1., 100.]), com=2.5)
-     ewma(Series([1., None, 100.]), com=2.5, ignore_na=True) # pre-0.15.0 behavior
-     ewma(Series([1., None, 100.]), com=2.5, ignore_na=False) # default
-
-- :func:`ewma`, :func:`ewmstd`, :func:`ewmvol`, :func:`ewmvar`, :func:`ewmcorr`, and :func:`ewmcov`
-  now set to ``NaN`` the first ``min_periods-1`` entries of the result (for ``min_periods>1``).
-  Previously the first ``min_periods`` entries of the result were set to ``NaN``.
-  The new behavior accords with the existing documentation. (:issue:`7884`)
-
 - :func:`rolling_max`, :func:`rolling_min`, :func:`rolling_sum`, :func:`rolling_mean`, :func:`rolling_median`,
-  :func:`rolling_std`, :func:`rolling_var`, :func:`rolling_skew`, :func:`rolling_kurt`, and :func:`rolling_quantile`,
+  :func:`rolling_std`, :func:`rolling_var`, :func:`rolling_skew`, :func:`rolling_kurt`, :func:`rolling_quantile`,
   :func:`rolling_cov`, :func:`rolling_corr`, :func:`rolling_corr_pairwise`,
   :func:`rolling_window`, and :func:`rolling_apply` with ``center=True`` previously would return a result of the same
   structure as the input ``arg`` with ``NaN`` in the final ``(window-1)/2`` entries.
@@ -112,27 +95,75 @@ API changes
 
   .. code-block:: python
 
-    In [7]: rolling_sum(Series(range(5)), window=3, min_periods=0, center=True)
+    In [7]: rolling_sum(Series(range(4)), window=3, min_periods=0, center=True)
     Out[7]:
     0     1
     1     3
     2     6
-    3     9
-    4   NaN
+    3   NaN
     dtype: float64
-
-  New behavior (note final value is ``7 = sum([3, 4, NaN])``):
+  
+  New behavior (note final value is ``5 = sum([2, 3, NaN])``):
 
   .. ipython:: python
 
-    rolling_sum(Series(range(5)), window=3, min_periods=0, center=True)
+    rolling_sum(Series(range(4)), window=3, min_periods=0, center=True)
 
 - Removed ``center`` argument from :func:`expanding_max`, :func:`expanding_min`, :func:`expanding_sum`,
   :func:`expanding_mean`, :func:`expanding_median`, :func:`expanding_std`, :func:`expanding_var`,
   :func:`expanding_skew`, :func:`expanding_kurt`, :func:`expanding_quantile`, :func:`expanding_count`,
   :func:`expanding_cov`, :func:`expanding_corr`, :func:`expanding_corr_pairwise`, and :func:`expanding_apply`,
   as the results produced when ``center=True`` did not make much sense. (:issue:`7925`)
 
+- :func:`ewma`, :func:`ewmstd`, :func:`ewmvol`, :func:`ewmvar`, :func:`ewmcov`, and :func:`ewmcorr`
+  now interpret ``min_periods`` in the same manner that the ``rolling_*`` and ``expanding_*`` functions do:
+  a given result entry will be ``NaN`` if the (expanding, in this case) window does not contain
+  at least ``min_periods`` values. The previous behavior was to set to ``NaN`` the ``min_periods`` entries
+  starting with the first non- ``NaN`` value. (:issue:`7977`)
+
+  Prior behavior (note values start at index ``2``, which is ``min_periods`` after index ``0``
+  (the index of the first non-empty value)):
+
+  .. ipython:: python
+
+    s = Series([1, None, None, None, 2, 3])
+
+  .. code-block:: python
+
+	In [51]: ewma(s, com=3., min_periods=2)
+	Out[51]:
+	0         NaN
+	1         NaN
+	2    1.000000
+	3    1.000000
+	4    1.571429
+	5    2.189189
+	dtype: float64
+
+  New behavior (note values start at index ``4``, the location of the 2nd (since ``min_periods=2``) non-empty value):
+
+  .. ipython:: python
+
+    ewma(s, com=3., min_periods=2)
+
+- :func:`ewmstd`, :func:`ewmvol`, :func:`ewmvar`, :func:`ewmcov`, and :func:`ewmcorr`
+  now have an optional ``adjust`` argument, just like :func:`ewma` does,
+  affecting how the weights are calculated.
+  The default value of ``adjust`` is ``True``, which is backwards-compatible.
+  See :ref:`Exponentially weighted moment functions <stats.moments.exponentially_weighted>` for details. (:issue:`7911`)
+
+- :func:`ewma`, :func:`ewmstd`, :func:`ewmvol`, :func:`ewmvar`, :func:`ewmcov`, and :func:`ewmcorr`
+  now have an optional ``ignore_na`` argument.
+  When ``ignore_na=False`` (the default), missing values are taken into account in the weights calculation.
+  When ``ignore_na=True`` (which reproduces the pre-0.15.0 behavior), missing values are ignored in the weights calculation.
+  (:issue:`7543`)
+
+  .. ipython:: python
+
+     ewma(Series([None, 1., 8.]), com=2.)
+     ewma(Series([1., None, 8.]), com=2., ignore_na=True)  # pre-0.15.0 behavior
+     ewma(Series([1., None, 8.]), com=2., ignore_na=False)  # new default
+
 - Bug in passing a ``DatetimeIndex`` with a timezone that was not being retained in DataFrame construction from a dict (:issue:`7822`)
 
   In prior versions this would drop the timezone.
@@ -580,12 +611,61 @@ Bug Fixes
 - Bug in ``DataFrame.plot`` with ``subplots=True`` may draw unnecessary minor xticks and yticks (:issue:`7801`)
 - Bug in ``StataReader`` which did not read variable labels in 117 files due to difference between Stata documentation and implementation (:issue:`7816`)
 - Bug in ``StataReader`` where strings were always converted to 244 characters-fixed width irrespective of underlying string size (:issue:`7858`)
-- Bug in ``expanding_cov``, ``expanding_corr``, ``rolling_cov``, ``rolling_cov``, ``ewmcov``, and ``ewmcorr``
+
+- Bug in :func:`expanding_cov`, :func:`expanding_corr`, :func:`rolling_cov`, :func:`rolling_corr`, :func:`ewmcov`, and :func:`ewmcorr`
   returning results with columns sorted by name and producing an error for non-unique columns;
   now handles non-unique columns and returns columns in original order
   (except for the case of two DataFrames with ``pairwise=False``, where behavior is unchanged) (:issue:`7542`)
 - Bug in :func:`rolling_count` and ``expanding_*`` functions unnecessarily producing error message for zero-length data (:issue:`8056`)
 - Bug in :func:`rolling_apply` and :func:`expanding_apply` interpreting ``min_periods=0`` as ``min_periods=1`` (:issue:`8080`)
+- Bug in :func:`expanding_std` and :func:`expanding_var` for a single value producing a confusing error message (:issue:`7900`)
+- Bug in :func:`rolling_std` and :func:`rolling_var` for a single value producing ``0`` rather than ``NaN`` (:issue:`7900`)
+
+- Bug in :func:`ewmstd`, :func:`ewmvol`, :func:`ewmvar`, and :func:`ewmcov`
+  calculation of de-biasing factors when ``bias=False`` (the default).
+  Previously an incorrect constant factor was used, based on ``adjust=True``, ``ignore_na=True``,
+  and an infinite number of observations.
+  Now a different factor is used for each entry, based on the actual weights
+  (analogous to the usual ``N/(N-1)`` factor).
+  In particular, for a single point a value of ``NaN`` is returned when ``bias=False``,
+  whereas previously a value of (approximately) ``0`` was returned.
+
+  For example, consider the following pre-0.15.0 results for ``ewmvar(..., bias=False)``,
+  and the corresponding debiasing factors:
+
+  .. ipython:: python
+
+     s = Series([1., 2., 0., 4.])
+
+  .. code-block:: python
+
+	 In [69]: ewmvar(s, com=2., bias=False)
+	 Out[69]:
+	 0   -2.775558e-16
+	 1    3.000000e-01
+	 2    9.556787e-01
+	 3    3.585799e+00
+	 dtype: float64
+
+	 In [70]: ewmvar(s, com=2., bias=False) / ewmvar(s, com=2., bias=True)
+	 Out[70]:
+	 0    1.25
+	 1    1.25
+	 2    1.25
+	 3    1.25
+	 dtype: float64
+
+  Note that entry ``0`` is approximately 0, and the debiasing factors are a constant 1.25.
+  By comparison, the following 0.15.0 results have a ``NaN`` for entry ``0``,
+  and the debiasing factors are decreasing (towards 1.25):
+
+  .. ipython:: python
+
+     ewmvar(s, com=2., bias=False)
+     ewmvar(s, com=2., bias=False) / ewmvar(s, com=2., bias=True)
+
+  See :ref:`Exponentially weighted moment functions <stats.moments.exponentially_weighted>` for details. (:issue:`7912`)
+
 - Bug in ``DataFrame.plot`` and ``Series.plot`` may ignore ``rot`` and ``fontsize`` keywords (:issue:`7844`)