The “forecastHybrid” package
provides functions to build composite models using multiple individual
component models from the “forecast” package. These
hybridModel
objects can then be manipulated with many of
the familiar functions from the “forecast” and “stats” packages
including forecast()
, plot()
,
accuracy()
, residuals()
, and
fitted()
.
The stable release of the package is hosted on CRAN and can be installed as usual.
The latest development version can be installed using the “devtools” package.
Version updates to CRAN will be published frequently after new features are implemented, so the development version is not recommended unless you plan to modify the code.
First load the package.
If you don’t have time to read the whole guide and want to get
started immediately with sane default settings to forecast the
USAccDeaths
timeseries, run the following:
## Fitting the auto.arima model
## Fitting the ets model
## Fitting the thetam model
## Fitting the nnetar model
## Fitting the stlm model
## Fitting the tbats model
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 1979 8346.986 7924.712 8985.630 7663.630 9266.804
## Feb 1979 7518.762 6756.730 8035.616 6469.317 8283.346
## Mar 1979 8241.355 7124.655 8886.679 6864.818 9146.115
## Apr 1979 8531.794 7599.810 9194.629 7309.631 9500.477
## May 1979 9336.994 8127.590 10112.349 7866.533 10442.376
## Jun 1979 9764.892 8532.078 10525.745 8252.769 10878.297
## Jul 1979 10677.924 9276.466 11613.448 8912.748 11987.171
## Aug 1979 9997.848 9083.349 10830.331 8815.849 11224.086
## Sep 1979 9006.666 8281.314 9944.791 7892.801 10357.609
## Oct 1979 9244.211 8282.017 10198.510 8013.813 10629.548
## Nov 1979 8774.131 7997.535 9732.246 7589.183 10180.765
## Dec 1979 9100.799 8255.912 10255.628 7816.759 10720.971
## Jan 1980 8371.764 7485.300 9496.297 7033.420 10011.749
## Feb 1980 7576.642 6687.306 8749.398 6141.503 9295.201
## Mar 1980 8259.146 7031.541 9586.522 6725.346 10161.075
## Apr 1980 8545.992 7370.189 9940.466 7036.836 10542.396
## May 1980 9334.686 7776.882 10861.976 7465.210 11490.092
## Jun 1980 9751.822 8110.334 11280.306 7793.230 11933.559
## Jul 1980 10640.004 8598.076 12373.743 8224.312 13051.201
## Aug 1980 9958.947 8663.516 11596.927 8248.308 12297.755
## Sep 1980 8990.710 7949.455 10718.099 7261.423 11441.542
## Oct 1980 9234.310 8020.405 10978.824 7417.369 11724.196
## Nov 1980 8775.819 7623.215 10519.779 6856.540 11286.453
## Dec 1980 9092.478 7941.222 11050.531 7288.262 11837.932
The workhorse function of the package is hybridModel()
,
a function that combines several component models from the “forecast”
package. At a minimum, the user must supply a ts
or
numeric
vector for y
. In this case, the
ensemble will include all six component models:
auto.arima()
, ets()
, thetam()
,
nnetar()
, stlm()
, and tbats()
. To
instead use only a subset of these models, pass a character string to
the models
argument with the first letter of each model to
include. For example, to build an ensemble model on a simulated dataset
with auto.arima()
, ets()
, and
tbats()
components, run
# Build a hybrid forecast on a simulated dataset using auto.arima, ets, and tbats models.
# Each model is given equal weight
set.seed(12345)
series <- ts(rnorm(18), f = 2)
hm1 <- hybridModel(y = series, models = "aet", weights = "equal")
## Fitting the auto.arima model
## Fitting the ets model
## Fitting the tbats model
The individual component models are stored inside the
hybridModel
objects and can viewed in their respective
slots, and all the regular methods from the “forecast” package could be
applied to these individual component models.
## Series: y
## ARIMA(0,0,0) with zero mean
##
## sigma^2 = 0.6659: log likelihood = -21.88
## AIC=45.76 AICc=46.01 BIC=46.65
The hybridModel()
function produces an S3 object of
class forecastHybrid
.
## [1] "hybridModel"
## [1] TRUE
The print()
and summary()
methods print
information about the ensemble model including the weights assigned to
each individual component model.
## Hybrid forecast model comprised of the following models: auto.arima, ets, tbats
## ############
## auto.arima with weight 0.333
## ############
## ets with weight 0.333
## ############
## tbats with weight 0.333
## Length Class Mode
## auto.arima 18 forecast_ARIMA list
## ets 19 ets list
## tbats 21 bats list
## weights 3 -none- numeric
## frequency 1 -none- numeric
## x 18 ts numeric
## xreg 1 -none- list
## models 3 -none- character
## fitted 18 -none- numeric
## residuals 18 ts numeric
Two types of plots can be created for the created ensemble model:
either a plot showing the actual and fitted value of each component
model on the data or individual plots of the component models as created
by their regular S3 plot()
methods. Note that a
plot()
method does not exist in the “forecast” package for
objects generated with stlm()
, so this component model will
be ignored when type = "models"
, but the other component
models will be plotted regardless.
Since version 0.4.0, ggplot
graphs are available. Note,
however, that the nnetar
, and tbats
models do
not have ggplot::autoplot()
methods, so these are not
plotted.
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).
By default each component model is given equal weight in the final
ensemble. Empirically this has been shown to give good performance in
ensembles [see @Armstrong2001], but
alternative combination methods are available: the inverse root mean
square error (RMSE
), inverse mean absolute error
(MAE
), and inverse mean absolute scaled error
(MASE
). To apply one of these weighting schemes of the
component models, pass this value to the errorMethod
argument and pass either "insample.errors"
or
"cv.errors"
to the weights
argument.
## Warning in hybridModel(series, weights = "insample.errors", errorMethod =
## "MASE", : Using insample.error weights is not recommended for accuracy and may
## be deprecated in the future.
## Fitting the auto.arima model
## Fitting the ets model
## Fitting the nnetar model
## Fitting the stlm model
## Fitting the tbats model
## Hybrid forecast model comprised of the following models: auto.arima, ets, nnetar, stlm, tbats
## ############
## auto.arima with weight 0.163
## ############
## ets with weight 0.163
## ############
## nnetar with weight 0.304
## ############
## stlm with weight 0.177
## ############
## tbats with weight 0.193
After the model is fit, these weights are stored in the
weights
attribute of the model. The user can view and
manipulated these weights after the fit is complete. Note that the
hybridModel()
function automatically scales weights to sum
to one, so a user should similar scale the weights to ensure the
forecasts remain unbiased. Furthermore, the vector that replaces
weights
must retain names specifying the component model it
corresponds to since weights are not assigned by position but rather by
component name. Similarly, individual components may also be
replaced
## auto.arima ets nnetar stlm tbats
## 0.1629947 0.1628211 0.3044131 0.1770904 0.1926807
newWeights <- c(0.1, 0.2, 0.3, 0.1, 0.3)
names(newWeights) <- c("auto.arima", "ets", "nnetar", "stlm", "tbats")
hm2$weights <- newWeights
hm2
## Hybrid forecast model comprised of the following models: auto.arima, ets, nnetar, stlm, tbats
## ############
## auto.arima with weight 0.1
## ############
## ets with weight 0.2
## ############
## nnetar with weight 0.3
## ############
## stlm with weight 0.1
## ############
## tbats with weight 0.3
## Hybrid forecast model comprised of the following models: auto.arima, ets, nnetar, stlm, tbats
## ############
## auto.arima with weight 0.2
## ############
## ets with weight 0.1
## ############
## nnetar with weight 0.3
## ############
## stlm with weight 0.1
## ############
## tbats with weight 0.3
This hybridModel
S3 object can be manipulated with the
same familiar interface from the “forecast” package, including S3
generic functions such as accuracy
, forecast
,
fitted
, and residuals
.
## [1] 0.012621269 0.010804126 0.008561907 0.008984650 0.010541407 0.008636514
## Time Series:
## Start = c(1, 1)
## End = c(3, 2)
## Frequency = 2
## [1] 0.5729075 0.6986619 -0.1178652 -0.4624818 0.5953460 -1.8265925
In-sample errors and various accuracy measure can be extracted with
the accuracy
method. The “forecastHybrid” package creates
an S3 generic from the accuracy
method in the “forecast”
package, so accuracy
will continue to function as normal
with objects from the “forecast” package, but now special functionality
is created for hybridModel
objects. To view the in-sample
accuracy for the entire ensemble, a simple call can be made.
## ME RMSE MAE MPE MAPE ACF1
## Test set -0.006167957 0.8146777 0.6675259 101.5092 101.5092 -0.2202369
## Theil's U
## Test set 0.9281098
In addition to retrieving the ensemble’s accuracy, the
individual component models’ accuracies can be easily viewed by using
the individual = TRUE
argument.
## $auto.arima
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 0.006161102 0.8159991 0.6667319 100 100 0.7381739 -0.2202409
##
## $ets
## ME RMSE MAE MPE MAPE MASE
## Training set -6.878921e-05 0.8160167 0.6674428 100.7412 100.7412 0.7389609
## ACF1
## Training set -0.2202415
##
## $tbats
## ME RMSE MAE MPE MAPE MASE
## Training set -0.02459618 0.812366 0.6684029 103.7863 103.7863 0.6244447
## ACF1
## Training set -0.2201724
Now’s let’s forecast future values. The forecast()
function produce an S3 class forecast
object for the next
48 periods from the ensemble model.
Now plot the forecast for the next 48 periods. The prediction intervals are preserved from the individual component models and currently use the most extreme value from an individual model, producing a conservative estimate for the ensemble’s performance.
The package aims to make fitting ensembles easy and quick, but it
still allows advanced tuning of all the parameters available in the
“forecast” package. This is possible through usage of the
a.args
, e.args
, n.args
,
s.args
, and t.args
lists. These optional list
arguments may be applied to one, none, all, or any combination of the
included individual component models. Consult the documentation in the
“forecast” package for acceptable arguments to pass in the
auto.arima
, ets
, nnetar
,
stlm
, and tbats
functions.
hm3 <- hybridModel(y = series, models = "aefnst",
a.args = list(max.p = 12, max.q = 12, approximation = FALSE),
n.args = list(repeats = 50),
s.args = list(robust = TRUE),
t.args = list(use.arma.errors = FALSE))
## Fitting the auto.arima model
## Fitting the ets model
## Fitting the thetam model
## Fitting the nnetar model
## Fitting the stlm model
## Fitting the tbats model
Since the lambda
argument is shared between most of the
models in the “forecast” framework, it is included as a special
parameter that can be used to set the Box-Cox transform in all models
instead of settings this individually. For example,
## Fitting the auto.arima model
## Fitting the ets model
## [1] 0.15
## attr(,"biasadj")
## [1] FALSE
## [1] 0.15
## attr(,"biasadj")
## [1] FALSE
Users can still apply the lambda
argument through the
tuning lists, but in this case the list-supplied argument overwrites the
default used across all models. Compare the following two results.
hm5 <- hybridModel(y = USAccDeaths, models = "aens", lambda = 0.2,
a.args = list(lambda = 0.5),
n.args = list(lambda = 0.6))
## Fitting the auto.arima model
## Fitting the ets model
## Fitting the nnetar model
## Fitting the stlm model
## [1] 0.5
## attr(,"biasadj")
## [1] FALSE
## [1] 0.2
## attr(,"biasadj")
## [1] FALSE
## [1] 0.6
## [1] 0.2
## attr(,"biasadj")
## [1] FALSE
Note that lambda has no impact on thetam
models, and
that there is no f.args
argument to provide parameters to
thetam
. Following forecast::thetaf
on which
thetam
is based, there are no such arguments; it always
runs with the defaults.
Covariates can also be supplied to auto.arima
and
nnetar
models as is done in the “forecast” package. To do
this, utilize the a.args
and n.args
lists.
Note that the xreg
may also be passed to a
stlm
model, but only when method = "arima"
instead of the default method = "ets"
. Unlike the usage in
the “forecast” package, the xreg
argument should be passed
as a matrix, not a dataframe. The stlm
models require that
the input series will be seasonal, so in the example below we will
convert the input data to a ts
object. If a
xreg
is used in training, it must also be supplied to the
forecast()
function in the xreg
argument. Note
that if the number of rows in the xreg
to be used for the
forecast does not match the supplied h
forecast horizon,
the function will overwrite h
with the number of rows in
xreg
and issue a warning.
# Use the beaver1 dataset with the variable "activ" as a covariate and "temp" as the time series
# Divide this into a train and test set
trainSet <- beaver1[1:100, ]
testSet <- beaver1[101:110, ]
trainXreg <- matrix(trainSet$activ)
testXreg <- matrix(testSet$activ)
# Create the model
beaverhm <- hybridModel(ts(trainSet$temp, f = 6),
models = "aenst",
a.args = list(xreg = trainXreg),
n.args = list(xreg = trainXreg),
s.args = list(xreg = trainXreg, method = "arima"))
## Fitting the auto.arima model
## Fitting the ets model
## Fitting the nnetar model
## Fitting the stlm model
## Fitting the tbats model
# Forecast future values
beaverfc <- forecast(beaverhm, xreg = testXreg, PI=FALSE)
# View the accuracy of the model
accuracy(beaverfc, testSet$temp)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.000840294 0.08291995 0.05567882 0.001888822 0.1508168 0.8441352
## Test set 0.037342919 0.06815497 0.05301275 0.101165944 0.1438446 0.8037155
## ACF1
## Training set -0.008447756
## Test set NA
It can be useful to perform cross validation on a forecasting model
to estimate a model’s out-of-sample forecasting performance. The
cvts()
function allows us to do this on arbitrary
functions. We could do this as part of a model selection procedure to
determine which models to include in our call to
hybridModel()
or merely to understand how well we expect to
forecast the series during unobserved windows.
For example, let’s perform cross validation for a stlm()
model and a naive()
model on the woolyrnq
time
series. The most important cvts()
arguments that commonly
need adjusting are rolling
(if TRUE
, the model
will always be fit on a fixed windowSize
instead of growing
by one new observation for each new model fit during cross validation),
windowSize
(starting length of time series to fit a model),
and maxHorizon
(the forecast horizon for predictions from
each model). Since a naive forecast is a good baseline that any decent
model should surpass, let’s see how the stlm()
model
compares.
stlmMod <- cvts(woolyrnq, FUN = stlm, windowSize = 100, maxHorizon = 8)
naiveMod <- cvts(woolyrnq, FUN = naive, windowSize = 100, maxHorizon = 8)
accuracy(stlmMod)
## ME RMSE MAE
## Forecast Horizon 1 165.26228 297.73576 247.65896
## Forecast Horizon 2 -171.83157 306.92842 254.32059
## Forecast Horizon 3 -247.26738 261.85660 247.26738
## Forecast Horizon 4 -84.66056 122.64993 88.74455
## Forecast Horizon 5 -106.73772 108.18863 106.73772
## Forecast Horizon 6 -53.33157 259.36297 253.82059
## Forecast Horizon 7 -22.26738 49.14934 43.81577
## Forecast Horizon 8 -64.66056 144.06994 128.74455
## ME RMSE MAE
## Forecast Horizon 1 -768.5 802.0365 768.5
## Forecast Horizon 2 -229.0 324.5628 230.0
## Forecast Horizon 3 142.5 184.0611 142.5
## Forecast Horizon 4 -88.0 124.4508 88.0
## Forecast Horizon 5 -1040.5 1040.5001 1040.5
## Forecast Horizon 6 -110.5 254.7165 229.5
## Forecast Horizon 7 367.5 367.7479 367.5
## Forecast Horizon 8 -68.0 144.9414 128.0
We see from looking at the accuracy measure–in particular the smaller
RMSE and MAE–the stlm()
model unsurprisingly performs
better and will likely give us better future forecasts. We also notice
that the apparent edge over the naive forecast tends to diminish or even
disappear for longer forecast horizons, and a look at the original time
series makes this result obvious: this time series lacks an obvious
trend and is a relatively difficult time series to forecast past a fewer
seasonal periods, so the naive model will not perform relatively
poorly.
We can also use custom functions, for example fcast()
from the “GMDH” package. We must be very careful that our custom
forecast function still produces an expected “forecast” S3 class object
and that the ts object start, end, and frequency properties are
preserved.
library(GMDH)
GMDHForecast <- function(x, h){
fc <- GMDH::fcast(x, f.number = h)
# GMDH doesn't produce a ts object with correct attributes, so we build it
end <- tsp(x)[2]
freq <- frequency(x)
# Set the correct start, end, and frequency for the ts forecast object
tsProperties <- c(end + 1 / freq, end + h / freq, freq)
tsp(fc$mean) <- tsProperties
tsp(fc$upper) <- tsProperties
tsp(fc$lower) <- tsProperties
class(fc) <- "forecast"
return(fc)
}
series <- subset(woolyrnq, end = 12)
gmdhcv <- cvts(series, FCFUN = GMDHForecast, windowSize = 10, maxHorizon = 1)
As a final example, suppose we foolish want to implement our own
version of naive()
for performing cross validation. The
FUN
and FCFUN
could then look like
customMod <- function(x){
result <- list()
result$series <- x
result$last <- tail(x, n = 1)
class(result) <- "customMod"
return(result)
}
forecast.customMod <- function(x, h = 12){
result <- list()
result$model <- x
result$mean <- rep(x$last, h)
class(result) <- "forecast"
return(result)
}
series <- subset(AirPassengers, end = 94)
cvobj <- cvts(series, FUN = customMod, FCFUN = forecast.customMod)
Previously we explored fitting hybridModel()
objects
with weights = "equal"
or
weights = "insample.errors
, but we can now leverage the
process conducted in cvts()
to select the appropriate
weights intelligently based on the expected out-of-sample forecast
accuracy of each component model. While this is the
methodologically-sound weight procedure, it also comes at significant
computational cost since the cross validation procedure necessitates
fitting each model several times for each cross validation fold in
addition to the final fit on the whole dataset. Fortunately this process
can be conducted in parallel if multiple cores are available. Some of
the arguments explained above in cvts()
such as
windowSize
and the cvHorizon
can also be
controlled here.
cvMod <- hybridModel(woolyrnq, models = "ns",
weights = "cv.errors", windowSize = 100,
cvHorizon = 8, num.cores = 4)
cvMod