Consistent warning messages in the midasr package and unrealistic MSE in and out sample values for a regression. #92

Open
JuBausch opened this issue Apr 30, 2024 · 1 comment

Comments

@JuBausch

I am trying to forecast realized volatility (RV) with an Almon lag function in midasr. The problem I am encountering is quite strange: after fitting two models (one without an external variable and one with an external variable), the in-sample and out-of-sample MSE explode for the second model. This does not make sense to me, as adding a variable should improve the in-sample fit and therefore decrease the MSE. The HAR estimation, on the other hand, works well. Here is the code and the respective MSEs:

ts_id <- "O66D"
data_xts <- data_xts[data_xts$rv5 != 0, ]
data_xts$CS <- impute_na_with_neighbors(data_xts$CS)
data_xts$NTFS <- impute_na_with_neighbors(data_xts$NTFS)
data_xts$VIX <- impute_na_with_neighbors(data_xts$VIX)
# Convert 'rv5' to a time series object
tsx_var <- ts(coredata(log(data_xts$rv5^2)), frequency = 252)
tsx_var_vix <- ts(coredata(log(data_xts$VIX)), frequency = 252)
tsx_var_vixxx <- tsx_var_vix
tsx_var_cs <- ts(coredata(log(data_xts$CS)), frequency = 252)
tsx_var_ntfs <- ts(coredata(data_xts$NTFS), frequency = 252)
# Convert the time series identifier to a time series object
tsy_var <- ts(coredata(log(data_xts[, ts_id]^2)), frequency = 252)


nealmon_model_DJINET_O66d <- midas_r(tsy_var ~ mls(tsx_var, 1:22, 1, nealmon),
                                  start = list(tsx_var = c(1, -0.5)), weight_gradients = list())

nealmon_model_vix_DJINET_O66d <- midas_r(tsy_var ~ mls(tsx_var, 1:22, 1, nealmon) + mls(tsx_var_vix, 1:22, 1, nealmon),
                                      start = list(tsx_var = c(1, -0.5), tsx_var_vix = c(1, -0.5)), weight_gradients = list())


forecast_DJINET_66d <- average_forecast(list(nealmon_model_DJINET, nealmon_model_vix_DJINET_O66d),
                                     data = list(tsx_var = tsx_var, tsy_var = tsy_var, tsx_var_vix = tsx_var_vix, tsx_var_cs = tsx_var_cs, tsx_var_ntfs = tsx_var_ntfs),
                                     insample = 1:end_sample, outsample = out_sample_start:length(tsx_var),
                                     type = "fixed", show_progress = FALSE)
# After that I got the following warning messages:
Warning Messages:
1: In (function (x, nm)  :
  Duplicate names in data. Using the one from the list
2: In (function (x, nm)  :
  Duplicate names in data. Using the one from the list
3: In (function (x, nm)  :
  Duplicate names in data. Using the one from the list
4: In (function (x, nm)  :
  Duplicate names in data. Using the one from the list
5: In (function (x, nm)  :
  Duplicate names in data. Using the one from the list

forecast_DJINET_66d[["accuracy"]][["individual"]][["MSE.out.of.sample"]]
[1] 0.6400151 5.8904797
forecast_DJINET_66d[["accuracy"]][["individual"]][["MSE.in.sample"]]
[1] 0.805655 6.367892

# Recalculating the MSE for the whole-sample models with and without the external variable:
> mean(nealmon_model_DJINET_O66d[["residuals"]]^2)
[1] 0.3805153
> mean(nealmon_model_vix_DJINET_O66d[["residuals"]]^2)
[1] 0.3653933

I am not sure how this can happen, whether it is due to the warning or a misspecification, but the MSE values reported by average_forecast seem impossible to me.
If someone could help, it would be much appreciated!

@vzemlys
Member

vzemlys commented Apr 30, 2024

Please post the data, as I cannot reproduce the problem without the data.

The warning comes from here:

warning("Duplicate names in data. Using the one from the list")

If you pass as data an object that has columns, the column names are ignored and the name is taken from the list. To avoid that when passing data via a named list, pass single-column data as a vector.
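
A minimal sketch of the difference, using base R only and made-up values. The name rv5 and the numbers are stand-ins for a one-column object such as the result of coredata() on an xts column; nothing here calls midasr, so it only illustrates what "has a column" versus "is a vector" means:

```r
# Hypothetical stand-in for coredata(<one-column xts>): a one-column matrix
# that carries a column name. The values are made up for illustration.
x_mat <- matrix(log(c(1.2, 0.9, 1.5, 1.1, 1.3)), ncol = 1,
                dimnames = list(NULL, "rv5"))

# ts() keeps the matrix shape, so the series still has a column and a name.
x_with_col <- ts(x_mat, frequency = 252)

# Dropping the dimensions first gives a plain vector series without a column name.
x_vec <- ts(as.numeric(x_mat), frequency = 252)

# Checks: the first object still has dimensions, the second does not.
has_column <- !is.null(dim(x_with_col))
is_plain_vector <- is.null(dim(x_vec)) && is.null(colnames(x_vec))
```

In the code from the issue this would correspond to wrapping each coredata(...) call in as.numeric() before ts(). Whether that also changes the reported MSE values cannot be said without the data.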
