Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Moving average shifted #20667

Closed
l8liu opened this issue Sep 27, 2016 · 14 comments
Closed

Moving average shifted #20667

l8liu opened this issue Sep 27, 2016 · 14 comments

Comments

@l8liu
Copy link

l8liu commented Sep 27, 2016

I 'm using Elasticsearch 'Moving Average' in the pipeline aggregation to compute the moving average of time series data, but I found the moving average is shifted one day . For example, when I set window = 3, i.e., 3 days moving average, I expect the moving average equals the average of the current day and previous 2 days, but the results are the average of previous 3 days, and not including the current day. The first bucket doesn't include any moving average value because of this.

Also, could you please consider add a 'Moving Standard deviation' in the future? It will be helpful for the statistical process control.

Here are my codes and results.

GET /tweets-classified*/_search
{
  "size": 0,
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "query": "(syndromes.GI: 2) AND (place.country_code: CA)"
        }
      },
      "filter": {
        "range": {
          "@timestamp": {
            "gte": "2016-01-01"

          }
        }
      }
    }
  },
   "aggs": {
      "daily_count": {
         "date_histogram": {
            "field": "@timestamp",
            "interval": "day"
         },
         "aggs": {
            "the_sum": {
               "sum": {
                  "field": "syndromes.GI"
             }
          },
            "mvavg": {
                "moving_avg": {
                  "buckets_path": "the_sum",
                  "window": 3,
                  "model": "simple"
                }
            }
        }
    }
  }
}
{
  "took": 6470,
  "timed_out": false,
  "_shards": {
    "total": 96,
    "successful": 96,
    "failed": 0
  },
  "hits": {
    "total": 1940,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "daily_count": {
      "buckets": [
        {
          "key_as_string": "2016-01-01T00:00:00.000Z",
          "key": 1451606400000,
          "doc_count": 10,
          "the_sum": {
            "value": 20
          }
        },
        {
          "key_as_string": "2016-01-02T00:00:00.000Z",
          "key": 1451692800000,
          "doc_count": 9,
          "the_sum": {
            "value": 18
          },
          "mvavg": {
            "value": 20
          }
        },
        {
          "key_as_string": "2016-01-03T00:00:00.000Z",
          "key": 1451779200000,
          "doc_count": 5,
          "the_sum": {
            "value": 10
          },
          "mvavg": {
            "value": 19
          }
        },
        {
          "key_as_string": "2016-01-04T00:00:00.000Z",
          "key": 1451865600000,
          "doc_count": 9,
          "the_sum": {
            "value": 18
          },
          "mvavg": {
            "value": 16
          }
        },
        {
          "key_as_string": "2016-01-05T00:00:00.000Z",
          "key": 1451952000000,
          "doc_count": 12,
          "the_sum": {
            "value": 24
          },
          "mvavg": {
            "value": 15.333333333333334
          }
        },
        {
          "key_as_string": "2016-01-06T00:00:00.000Z",
          "key": 1452038400000,
          "doc_count": 10,
          "the_sum": {
            "value": 20
          },
          "mvavg": {
            "value": 17.333333333333332
          }
        },

.......

@clintongormley
Copy link
Contributor

@polyfractal could you take a look at this please?

@polyfractal
Copy link
Contributor

Hey @l8liu, sorry for the delay in responding. We've been chatting internally about this a bit. I'll start with the easy one:

Also, could you please consider add a 'Moving Standard deviation' in the future?

Definitely! We've talked about this a few times and I know I'd personally like one. We're currently trying to determine the best way to add it (dedicated agg, or rename moving_average to moving_function and implement std dev as a "function, etc). I think it'll end up as a dedicated moving_stddev agg since that's the simplest.

I expect the moving average equals the average of the current day and previous 2 days, but the results are the average of previous 3 days, and not including the current day.

So we've been chatting about why we did this, and I think the behavior is "correct" because we are using a "left" moving average commonly used in financial and other non-signal use-cases (e.g. the value of today is the result of previous values). For example, financial moving averages use closing prices, and today's value is intrinsically incalculable because today hasn't "closed" yet. Which means that you can only calculate moving averages on previous data points, not today's. This mirrors wiki's definition for a financial SMA:

In financial applications a simple moving average (SMA) is the unweighted mean of the previous n data
For a number of applications, it is advantageous to avoid the shifting induced by using only 'past' data. Hence a central moving average can be computed, using data equally spaced on either side of the point in the series where the mean is calculated

Also, if today's values were included, the value of today's movavg would change each time you execute the aggregation since the bucket continues "filling".

This is quite a bit different from more science/engineering/signal processing use-cases where you take "centered" moving averages (the movavg of a point is the average of n/2 on each side) in which case it makes sense to include the bucket in it's own movavg. But since we only support one orientation at the moment, I think the current behavior is "correct".

Lots of "scare quotes" above, because I agree it's more a matter of perspective and how you want to use the moving average. We still haven't come to a final decision, but I think so far we've at least agreed the current implementation isn't wrong per se. :)

@l8liu
Copy link
Author

l8liu commented Oct 18, 2016

@polyfractal Thank you very much for your detail explanation and I agree with your consideration of the current moving average implementation. In the future, are you possible to add a shift parameter so users can have options to choose where they want to output the movavg? We do want it align to the current day but not the previous day when we do the statistical control.

Also it's glad to know that you will have the moving standard deviation feature in future. Thank you very much for the great job and we are expecting your new work!

@polyfractal
Copy link
Contributor

I think a shift parameter could be useful, although I'm curious about it's scope. Would it only allow shifting one day forward to align the average, or should we allow arbitrary shifting (e.g. 10 buckets)? From a code perspective, movavg is already rather complicated so we'd like to not introduce too much additional complexity.

Perhaps we could add a dedicated shift pipeline agg which sits "before" the movavg? Not sure, haven't thought it through very carefully :)

@mpaduada
Copy link

+1 on adding a parameter to the Moving Average Aggregation to allow arbitrary shifting. A lagged moving average can be quite useful, such as when dealing with seasonalities.

For what it's worth, there's an ugly workaround using one or two Moving Average Aggregations and a Bucket Script Aggregation. For what @l8liu was trying to do, use a Bucket Script Aggregation to add the current bucket's value to the moving average result. And to calculate a moving average with a window of 5 and a lag of 3, use:

{
  "aggs": {
    ...
      },
      "aggs": {
        "my_metric" :{
          ...
        },
        "ma_window_plus_shift": {
          "moving_avg": {
            "buckets_path": "my_metric",
            "window": 8
          }
        },
        "ma_shift": {
          "moving_avg": {
            "buckets_path": "my_metric",
            "window": 3
          }
        },
        "lagged_ma" : {
          "bucket_script": {
            "buckets_path": {
              "var1" : "ma_window_plus_shift",
              "var2" : "ma_shift"
            },
            "script": "(params.var1 - params.var2*3/8)*8/5"
          }
        }
      }
    }
  }
}

I would rather have a shift parameter than use three pipeline aggregations to get one result.

In any case, I think it's worth adding a note in the documentation indicating the moving average assumes a lag of 1 bucket.

@markharwood
Copy link
Contributor

Revisiting - looks like added docs needed and/or a new shift param.

cc @elastic/es-search-aggs

@polyfractal
Copy link
Contributor

@l8liu Heya! Sorry for the long radio silence, but I have some news. We recently implemented a Moving Function pipeline agg, which is going to replace the MovAvg agg eventually.

I think the behavior you want should now be doable, since we're opening the calculation up to general scripting. The script is given the array of values from the window, so you should be able to pick/choose the values you want to pass into the MovingFunctions.unweightedAvg() function.

We also implemented a stdDev function, so that request should be doable now too.

Do you think this will work for you?

@l8liu
Copy link
Author

l8liu commented Jun 1, 2018 via email

@polyfractal
Copy link
Contributor

Great! I'm going to close this issue, since it looks like MovFn will work :)

@jcornez
Copy link

jcornez commented May 15, 2019

I'm not clear how the new moving_fn solves this. The values array that is passed to the function still only include the prior values, and not the current. So if I want a moving average of 3 to be the current and prior two, then how can this be accomplished?

@jcornez
Copy link

jcornez commented May 15, 2019

Same question was asked and not answered here
https://discuss.elastic.co/t/get-current-bucket-value-in-a-moving-function-agg/174034

@polyfractal
Copy link
Contributor

On the run so I've only thought about this briefly, but I think the idea was that you request a slightly larger window, and then pick out the values you want. So you ask for four, then use the prior three.

But thinking about it, that doesn't help because it actually shifts the values forward. We might need to tweak the mov_fn to allow left/right/center alignment as originally proposed.

@jcornez
Copy link

jcornez commented May 16, 2019

Thanks - so perhaps we can re-open this issue?

@polyfractal
Copy link
Contributor

I've opened #42181 to track, since we'll just want to modify the moving function agg since movavg is deprecated/removed.

Thanks for poking this issue and bringing it back up! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants