asilvino/carot-timeseries

Tree Regression with time series Examples

These examples use the "caret", "data.table", and "bst" libraries.

install.packages("caret")
install.packages("data.table")
install.packages("bst")

##Databases:

  • SP500.csv
  • elec.dat
  • stemp.dat
  • store1.csv
  • store2.csv

All the databases were retrieved from the following sources:

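S&P 500 daily series from January 2, 1990 to December 31, 1999, as shipped with the MASS package (whose documentation describes SP500 as daily returns of the index rather than closing values).
SP500 ->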

> library(MASS)
> data(SP500)
> plot(SP500, type = 'l')

elec.dat and stemp.dat come from Paul Cowpertwait's datasets for the book Introductory Time Series with R:
http://staff.elena.aut.ac.nz/Paul-Cowpertwait/ts/#Data

The monthly supply of electricity (millions of kWh), beer (Ml), and chocolate-based production (tonnes) in Australia over the period January 1958 to December 1990 are available from the Australian Bureau of Statistics (ABS).
elec ->
http://staff.elena.aut.ac.nz/Paul-Cowpertwait/ts/cbe.dat

The monthly temperature data (1850–2007; see Brohan et al. 2006) for the southern hemisphere were extracted from the database maintained by the University of East Anglia Climatic Research Unit
stemp ->
http://staff.elena.aut.ac.nz/Paul-Cowpertwait/ts/stemp.dat
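
A minimal sketch of loading one of these whitespace-delimited files into a monthly `ts` object. It assumes the file has a header row and a column named `elec`, as the cbe.dat description suggests; `load_monthly` is a hypothetical helper, not part of this repository, and the demo numbers are made up. Files without a header row would need `header = FALSE` or `scan()` instead.

```r
# Hypothetical helper: read a whitespace-delimited file with a header row
# and return one of its columns as a monthly time series.
load_monthly <- function(path, column, start) {
  raw <- read.table(path, header = TRUE)
  stopifnot(column %in% names(raw))
  ts(raw[[column]], start = start, frequency = 12)
}

# Self-contained demonstration: made-up numbers written to a temporary file
demo_file <- tempfile(fileext = ".dat")
writeLines(c("choc beer elec",
             "10 1.5 100",
             "12 1.7 110",
             "11 1.6 105"), demo_file)
demo <- load_monthly(demo_file, "elec", start = c(1958, 1))
print(demo)

# With network access, the real file could be read the same way, e.g.
# elec <- load_monthly("http://staff.elena.aut.ac.nz/Paul-Cowpertwait/ts/cbe.dat",
#                      "elec", start = c(1958, 1))
```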

The other databases, store1 and store2, come from https://www.kaggle.com/c/rossmann-store-sales/data
The Kaggle competition provides historical sales data for 1,115 Rossmann stores, from 01/01/2013 to 31/07/2015. The task is to forecast the "Sales" column for the test set. Note that some stores in the dataset were temporarily closed for refurbishment.
The date is stored in a single column.
store1 -> store1.csv
store2 -> store2.csv
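
Because the date sits in a single column, it is often convenient to expand it into numeric calendar attributes before fitting a tree. The sketch below assumes columns named `Date` and `Sales`, as in the Kaggle files; the actual layout of store1.csv and store2.csv may differ, and the rows shown are made up.

```r
library(data.table)

# Made-up rows in the assumed layout (Date as text, Sales as numbers)
store <- data.table(
  Date  = c("2013-01-03", "2013-01-01", "2013-01-02"),
  Sales = c(4327, 0, 5530)
)

# Parse the text column, then derive calendar attributes from it
store[, Date := as.Date(Date)]
store[, `:=`(
  year  = as.integer(format(Date, "%Y")),
  month = as.integer(format(Date, "%m")),
  wday  = as.integer(format(Date, "%u"))  # ISO weekday: 1 = Monday ... 7 = Sunday
)]

# Keep the rows in chronological order before building any lags
setorder(store, Date)
print(store)
```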

#Basic approach

##Preparing the samples
We will use caret's createTimeSlices function to split the series into consecutive training and test samples, where each test sample starts right after its training sample ends.

This lets you run cross-validation, in either holdout or K-fold style, while respecting the chronology of the series.
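
To see what these samples look like, the sketch below calls `createTimeSlices` on a toy index of 10 observations. The window sizes are illustrative choices, not the ones used in the code sample.

```r
library(caret)

# Toy index standing in for the rows of a series
idx <- 1:10

# Expanding window: start with 5 observations, forecast 2 steps ahead
slices <- createTimeSlices(idx, initialWindow = 5, horizon = 2,
                           fixedWindow = FALSE)

# Each training slice ends right before its test slice begins
for (i in seq_along(slices$train)) {
  cat("train:", slices$train[[i]], "| test:", slices$test[[i]], "\n")
}
```

With `fixedWindow = FALSE` the training window grows from rows 1 to 5 up to rows 1 to 8, while the test window moves from rows 6 and 7 to rows 9 and 10. Setting `fixedWindow = TRUE` would keep the training window at a constant length and slide it forward instead.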
##Code Sample

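library(caret)
library(data.table)

# `base` must already hold the data (e.g. loaded with read.table) and
# contain a column with the name stored in `variable`
variable <- 'elec' # the variable to predict
nLag <- 12
khorizon <- 1
# add the lags (historical data) as attributes in the base
base <- setDT(base)[, paste0(variable, 1:nLag) := shift(get(variable), 1:nLag)][]
# drop the first nLag rows, which have no complete history
base <- base[(nLag+1):nrow(base),]
timeSlices <- createTimeSlices(1:nrow(base),
                   initialWindow = floor(nrow(base)*2/3), horizon = khorizon, fixedWindow = FALSE)
str(timeSlices, max.level = 1)
trainSlices <- timeSlices[[1]]
testSlices <- timeSlices[[2]]
# keep the results of every slice instead of overwriting them
pred <- vector("list", length(trainSlices))
true <- vector("list", length(trainSlices))
for(i in seq_along(trainSlices)){
  # build the formula from the variable name, e.g. elec ~ .
  plsFitTime <- train(as.formula(paste(variable, "~ .")),
                      data = base[trainSlices[[i]],],
                      method = "treebag"
                      )
  pred[[i]] <- predict(plsFitTime, base[testSlices[[i]],])
  true[[i]] <- base[[variable]][testSlices[[i]]]
}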
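
Once each slice yields a prediction and a true value, the forecasts can be scored. A minimal sketch, assuming the predictions and true values have been gathered across slices into numeric vectors of equal length; the values below are made up, and `rmse` and `mae` are hypothetical helper names.

```r
# Hypothetical helpers for scoring out-of-sample forecasts
rmse <- function(pred, true) sqrt(mean((pred - true)^2))
mae  <- function(pred, true) mean(abs(pred - true))

# Made-up values standing in for predictions gathered over the slices
pred <- c(102, 98, 110, 105)
true <- c(100, 100, 108, 109)

cat("RMSE:", rmse(pred, true), "\n")
cat("MAE: ", mae(pred, true), "\n")
```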

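caret can also run the same rolling-origin scheme internally through `trainControl(method = "timeslice")`, which avoids the explicit loop. The sketch below runs on a synthetic series and uses `rpart` as the model purely for illustration; the series, the number of lags, and the column names are assumptions, not taken from the repository data.

```r
library(caret)
library(data.table)

set.seed(1)
# Synthetic seasonal series (made up for illustration)
n <- 72
toy <- data.table(y = 100 + 10 * sin(2 * pi * (1:n) / 12) + rnorm(n))

# Add 3 lags as attributes and drop the rows without a full history
nLag <- 3
toy[, paste0("y", 1:nLag) := shift(y, 1:nLag)]
toy <- toy[(nLag + 1):.N]

# Rolling-origin resampling handled by caret itself
ctrl <- trainControl(method = "timeslice",
                     initialWindow = floor(nrow(toy) * 2 / 3),
                     horizon = 1,
                     fixedWindow = FALSE)

# With one-point test sets R-squared is undefined, so caret may report it
# as missing and emit a warning; RMSE is still available.
fit <- train(y ~ ., data = as.data.frame(toy), method = "rpart",
             trControl = ctrl)
print(fit$results)
```
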
The method = "treebag" argument can be replaced with any other method from the list of models supported by caret.

##Have FUN!
references:
caret lib
r-blogger time series cross validation
