smoothish -- Smoothing out time-series data with boundary and missing data handling

The Smoothish JavaScript library provides variations of centered moving-average functions that are robust to missing data and don't drop points at the beginning and end boundaries.

Installation and import

Installation:

npm install smoothish

Import (classic):

const smoothish = require('smoothish')

or (modern):

import smoothish from 'smoothish'

Basic Usage

Consider the following time series of twelve values:

//                    Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
const daysPerMonth = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

Plotted, this makes for a rather jagged graph.

We can apply the smoothish function to smooth out the data:

const smoothed = smoothish(daysPerMonth)
// --> [ 30.1, 29.7, 30.1, 30.3, 30.4, 30.5, 30.6, 30.6, 30.5, 30.6, 30.5, 30.7 ]

Handling missing data

Consider an array in which some values are missing (undefined or null):

const incompleteDaysPerMonth = [31, 28, undefined, 30, 31, null, 31, 31, null, 31, 30, 31]

The smoothing function bridges over the missing points:

const smoothedIncomplete = smoothish(incompleteDaysPerMonth)
// --> [ 30.0, 29.4, 29.8, 30.1, 30.5, 30.6, 30.8, 30.8, 30.8, 30.7, 30.6, 30.7 ]

Here is another example: a linearly increasing series with some values missing.

const linear = [undefined, 200, 300, undefined, 500, undefined, 700, 800, 900]

The smoothish function fills in the interior missing data points, though note that it does not extrapolate missing values at the beginning and end.

const smoothedLinear = smoothish(linear)
// --> [ undefined, 200, 300, 400, 500, 600, 700, 800, 900 ]

Changing the radius

All of the above examples use the default radius of 2, which means the smoothing is similar to a 5-point moving average (using a neighborhood that includes the center point and two points on either side).

We can specify a different radius in an optional second argument to smoothish.

This is best seen with a step function that abruptly changes value:

const stepFunction = [100, 100, 100, 100, 100, 200, 200, 200, 200, 200]

Setting the radius to 1 produces some smoothing:

const radius1 = smoothish(stepFunction, { radius: 1 })
// --> [ 98.6, 99.9, 102, 109, 126, 174, 191, 198, 200, 201 ]

Increasing the radius to 2 (the default) increases the smoothing.

const radius2 = smoothish(stepFunction, { radius: 2 })
// --> [ 91.6, 98.1, 106, 118, 136, 164, 182, 194, 202, 208 ]

And increasing the radius to 3 increases the smoothing further.

const radius3 = smoothish(stepFunction, { radius: 3 })
// --> [ 87.5, 97.0, 108, 121, 138, 162, 179, 192, 203, 212 ]

The Least-Squares algorithm

By default, smoothish performs a least-squares linear fit at each point using the values of neighboring points, and replaces each point with the fitted value.

Exponential falloff

By default, "neighboring points" means all points, but closer ones carry more weight: the weight decays exponentially in both directions with a time constant of radius.
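To make the idea concrete, here is a minimal sketch of a weighted least-squares line fit at each index. It is not the smoothish source code: the names isMissing, expWeight, and leastSquaresSmooth are hypothetical, and the weight formula exp(-|distance| / radius) is an assumption about what "time constant of radius" means. It is not claimed to reproduce the library's exact numbers, only the behavior described above for linear data with gaps.

```javascript
// Hypothetical helpers for illustration; not part of the smoothish API.
const isMissing = (v) => v === undefined || v === null || Number.isNaN(v)

// Assumed weight: exponential decay with time constant `radius`.
const expWeight = (distance, radius) => Math.exp(-Math.abs(distance) / radius)

function leastSquaresSmooth (data, radius = 2) {
  // Indices that actually hold a value.
  const present = data.map((v, i) => (isMissing(v) ? -1 : i)).filter((i) => i >= 0)
  const first = present[0]
  const last = present[present.length - 1]

  return data.map((_, i) => {
    // No extrapolation before the first or after the last present value.
    if (present.length === 0 || i < first || i > last) return undefined
    // Weighted sums for fitting y = a + b * x, with x measured from index i.
    let sw = 0; let sx = 0; let sy = 0; let sxx = 0; let sxy = 0
    for (const j of present) {
      const x = j - i
      const w = expWeight(x, radius)
      sw += w
      sx += w * x
      sy += w * data[j]
      sxx += w * x * x
      sxy += w * x * data[j]
    }
    const denom = sw * sxx - sx * sx
    // With a single usable point the slope is undetermined; use the weighted mean.
    if (Math.abs(denom) < 1e-12) return sy / sw
    const slope = (sw * sxy - sx * sy) / denom
    // The intercept is the fitted value at x = 0, that is, at index i.
    return (sy - slope * sx) / sw
  })
}

const linear = [undefined, 200, 300, undefined, 500, undefined, 700, 800, 900]
const fitted = leastSquaresSmooth(linear).map((v) => (v === undefined ? v : Math.round(v)))
// fitted is [undefined, 200, 300, 400, 500, 600, 700, 800, 900]
```

Because a line fitted through points that already lie on a line passes through all of them, the interior gaps are filled with the values on that line, while the leading missing value is left alone.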

Using different algorithms and falloffs

As an alternative to the least-squares based smoothing, you can have smoothish do moving-average smoothing by adding an algorithm: 'movingAverage' property to the optional second parameter.

And as an alternative to the exponential falloff, you can set falloff: 'step' to include only the points within radius and to weight them equally.
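The difference between the two falloffs can be sketched as a pair of weight functions. The object name falloffs and the exact formulas are assumptions made for illustration, not the library's internals; in particular, exp(-|distance| / radius) is one plausible reading of "time constant of radius".

```javascript
// Hypothetical weight functions; names and formulas are assumptions.
const falloffs = {
  // Every point contributes; the weight shrinks by a factor of e per `radius` steps.
  exponential: (distance, radius) => Math.exp(-Math.abs(distance) / radius),
  // Only points within `radius` contribute, all with the same weight.
  step: (distance, radius) => (Math.abs(distance) <= radius ? 1 : 0)
}

const distances = [-3, -2, -1, 0, 1, 2, 3]
const stepWeights = distances.map((d) => falloffs.step(d, 2))
// stepWeights is [0, 1, 1, 1, 1, 1, 0]
const expWeights = distances.map((d) => falloffs.exponential(d, 2))
// expWeights peaks at 1 for distance 0 and is symmetric around it
```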

For example, to get a standard five-point moving average, you can use the following. (A radius of 2 means that the 2 previous points, the 2 following points, and the current point are included, giving a total of five points averaged for each point.)

const movingAverage = smoothish(daysPerMonth,
  { algorithm: 'movingAverage', falloff: 'step', radius: 2 })
// --> [ 30.0, 30.0, 30.2, 30.0, 30.6, 30.6, 30.6, 30.6, 30.6, 30.6, 30.5, 30.7 ]

Note that this is a centered (not lagging) moving average.

Also note that it produces as many output points as there are input points, which means the boundaries get special handling. In the above example, 5 points are averaged for each interior point, but only 3 points are averaged at each end point (and 4 at the points next to them).
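The boundary handling can be illustrated with a small stand-alone sketch of a centered, equally weighted moving average whose window is simply clipped at the ends of the array. The function name centeredMovingAverage is hypothetical and this is not the library's implementation; skipping missing values and leaving leading and trailing gaps undefined are assumptions inferred from the example outputs in this README.

```javascript
// Hypothetical helper for illustration; not part of the smoothish API.
const isMissing = (v) => v === undefined || v === null || Number.isNaN(v)

function centeredMovingAverage (data, radius = 2) {
  // Locate the first and last present values; nothing is extrapolated beyond them.
  const firstPresent = data.findIndex((v) => !isMissing(v))
  let lastPresent = data.length - 1
  while (lastPresent >= 0 && isMissing(data[lastPresent])) lastPresent--

  return data.map((_, i) => {
    if (firstPresent < 0 || i < firstPresent || i > lastPresent) return undefined
    // Clip the window at the array boundaries instead of dropping the point.
    const lo = Math.max(0, i - radius)
    const hi = Math.min(data.length - 1, i + radius)
    let sum = 0
    let count = 0
    for (let j = lo; j <= hi; j++) {
      if (isMissing(data[j])) continue // bridge over missing values
      sum += data[j]
      count++
    }
    // A gap wider than the window leaves nothing to average.
    return count > 0 ? sum / count : undefined
  })
}

const daysPerMonth = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
const averaged = centeredMovingAverage(daysPerMonth, 2).map((v) => Number(v.toFixed(1)))
// averaged is [30, 30, 30.2, 30, 30.6, 30.6, 30.6, 30.6, 30.6, 30.6, 30.5, 30.7]
```

The first value averages 3 points (31, 28, 31), the second averages 4, and from the third onward the full 5-point window is available.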

If you really want to match a standard moving average exactly, you would need to lop off radius points from each end of the result:

const strictMovingAverage = movingAverage.slice(2, -2)
// --> [ 30.2, 30.0, 30.6, 30.6, 30.6, 30.6, 30.6, 30.6 ]

Note that in many cases the default algorithm: 'leastSquares' gives better results than algorithm: 'movingAverage'. For example, see how the smoothing of the incomplete linear data below is worse than the straight line produced by the default algorithm above.

const movingAverageLinear = smoothish(linear,
  { algorithm: 'movingAverage', falloff: 'step' })
// --> [ undefined, 250, 333, 333, 500, 667, 725, 800, 800 ]

More Details

See also API docs.

The tests and their snapshots also have examples.

Legal

Copyright (c) 2020 Eamonn O'Brien-Strain All rights reserved. This program and the accompanying materials are made available under the terms of the Eclipse Public License v1.0 which accompanies this distribution, and is available at http://www.eclipse.org/legal/epl-v10.html

This is a purely personal project, not a project of my employer.