Tree regression time series examples
These examples are using "caret","data.table" and "bst" libraries.
install.packages("caret")
install.packages("data.table")
install.packages("bst")
##Databases:
- SP500.csv
- elec.dat
- stemp.dat
- store1.csv
- store2.csv
All the databases were retrived from the following places:
S&P500 Close value of day (daily)(from January 2, 1990 to December 31, 1999).
SP500 ->
> library(MASS)
> data(SP500)
> plot(SP500, type = 'l')
elec.dat and stemp.dat are from the Paul Cowpertwait's databases, from Introductory Time Series with R
http://staff.elena.aut.ac.nz/Paul-Cowpertwait/ts/#Data
The monthly supply of electricity (millions of kWh), beer (Ml),
and chocolate-based production (tonnes) in Australia over the period January
1958 to December 1990 are available from the Australian Bureau of Statistics
(ABS).
elec ->
http://staff.elena.aut.ac.nz/Paul-Cowpertwait/ts/cbe.dat
The monthly temperature data (1850–2007; see Brohan et al. 2006) for the southern
hemisphere were extracted from the database maintained by the University
of East Anglia Climatic Research Unit
stemp ->
http://staff.elena.aut.ac.nz/Paul-Cowpertwait/ts/stemp.dat
The other databases store1 ,store2 are from :
https://www.kaggle.com/c/rossmann-store-sales/data
You are provided with historical sales data for 1,115 Rossmann stores, from 01/01/2013 to 31/07/2015. The task is to forecast the "Sales" column for the test set. Note that some stores in the dataset were temporarily closed for refurbishment.
The date is in one column
store1 -> store1.csv
store2 -> store2.csv
#Basic approach
##Preparing the samples
We will use the createTimeSlices
tool, and generate the following format of samples:
So you can make a cross-validation based on holdout method or K-fold cross validation , respecting the series cronology.
##Code Sample
variabla<- 'elec'#the-predictable-variable
nLag <- 12
khorizon <- 1
#adding the lags (historical data) as atributes in the base
base <- setDT(base)[, paste0(variable, 1:nLag) := shift(elec, 1:nLag)][]
base <- base[(nLag+1):nrow(base),]
timeSlices <- createTimeSlices(1:nrow(base),
initialWindow =nrow(base)*2/3, horizon = khorizon , fixedWindow = FALSE)
str(timeSlices,max.level = 1)
trainSlices <- timeSlices[[1]]
testSlices <- timeSlices[[2]]
for(i in 1:length(trainSlices)){
plsFitTime <- train(variable ~ .,
data = base[trainSlices[[i]],],
method = "treebag"
)
pred <- predict(plsFitTime,base[testSlices[[i]],])
true <- base$elec[testSlices[[i]]]
}
...
method = "treebag"#here we can change for any other from this list:
here
....
##Have FUN!
references:
caret lib
r-blogger time series cross validation