Fix NLinear normalization to support past covariates #1873


Eliotdoesprogramming
Contributor

@Eliotdoesprogramming Eliotdoesprogramming commented Jun 30, 2023

Re-opening #1583
(sorry made a bit of a mistake when working on that branch, decided a fresh fork & pr might be best)

Normalization description from original paper

NLinear: To boost the performance of Linear when there is a distribution shift in the dataset, NLinear first subtracts the input by the last value of the sequence. Then, the input goes through a linear layer, and the subtracted part is added back before making the final prediction. The subtraction and addition in NLinear are a simple normalization for the input sequence.
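The three steps in the quoted description can be sketched in plain Python for a single univariate window. This is only an illustration of the idea, not the darts or paper implementation; the function name `nlinear_forecast` and the weight values are made up for the example.

```python
def nlinear_forecast(window, weights, bias):
    """Forecast len(weights) steps from one univariate input window."""
    last = window[-1]
    # 1) subtract the last value of the sequence from the input
    centered = [value - last for value in window]
    # 2) linear layer: one weight row (and one bias) per forecast step
    linear_out = [
        sum(w * c for w, c in zip(row, centered)) + b
        for row, b in zip(weights, bias)
    ]
    # 3) add the subtracted part back before returning the prediction
    return [out + last for out in linear_out]


window = [10.0, 12.0, 15.0]
weights = [[0.5, 0.25, 1.0], [0.0, 0.0, 0.0]]  # made-up values
bias = [0.0, 0.0]

forecast = nlinear_forecast(window, weights, bias)
# Shifting the whole input by a constant shifts the forecast by the same constant.
shifted_forecast = nlinear_forecast([v + 100.0 for v in window], weights, bias)
```

Because the linear layer only ever sees the centered window, a level shift in the input passes straight through to the output, which is the distribution-shift robustness the paper refers to.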

Summary

The current implementation of normalization follows the implementation here.

It works when the number of components predicted as the target equals the number of components in the input. (The previous comment in the implementation is incorrect: it also works when n_params > 1, provided n_params equals the number of target components.)

Since self.n_params equals the number of components we are predicting AND we know that they are ordered first in the input tensor
input_tensor = [batch, timesteps, input_dim / number of components], we can slice the tensor to include only the target components, like so: seq_last[:, :, :output_dim]

Other Information

New to doing open source work, so please let me know if there's more that I need to do! This was something I found when working on one of my own projects.

@Eliotdoesprogramming
Contributor Author

Eliotdoesprogramming commented Jun 30, 2023

I added a test case that demonstrates why the change to seq_last is necessary. However, I've been running into some confusion when it comes to the InferenceDataset preparation in predict in _eval_model()

# test_dlinear_nlinear.py line 277
            e1, e2 = _eval_model(
                train1,
                train2,
                val1,
                val2,
                None,
                None,
                past_cov1=past_cov1,
                past_cov2=past_cov2,
                val_past_cov1=val_past_cov1,
                val_past_cov2=val_past_cov2,
                cls=NLinearModel,
                lkl=None,
                normalize=True
            )

in

# inference_dataset.py line 66:
        if main_covariate_type is CovariateType.PAST:
            future_end = past_end + max(0, n - output_chunk_length) * target_series.freq

This line of code causes .predict() to fail when using past covariates. .fit() runs great!

It seems to require past_covariates to extend into the future. Why is that? If a maintainer could help me understand, I can finish the PR!

@dennisbader @felixdivo

@madtoinou
Collaborator

It seems to require past_covariates to extend into the future. Why is that? if a maintainer could help me understand I can finish the PR!

If the n argument of predict() is greater than output_chunk_length, the model performs auto-regression (it consumes its own predictions as part of the target series). However, for the second "prediction round", it expects these "future" values of the past covariates and will complain if they are not provided. To avoid this situation, make sure that n <= output_chunk_length or that the past_covariates actually extend a bit into the future.
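As a rough sketch of that rule, the number of time steps by which the past covariates must extend beyond the end of the target can be computed with the same expression as the quoted line from inference_dataset.py. The helper name `extra_past_covariate_steps` is hypothetical and not part of darts.

```python
def extra_past_covariate_steps(n, output_chunk_length):
    """Steps past the end of the target that the past covariates must cover.

    Mirrors the `max(0, n - output_chunk_length)` term quoted above;
    multiply by the series frequency to get the required end timestamp.
    """
    return max(0, n - output_chunk_length)


# No auto-regression: a single forward pass covers the whole horizon.
steps_single_pass = extra_past_covariate_steps(n=5, output_chunk_length=5)
# Auto-regression: later prediction rounds need covariates beyond the target end.
steps_autoregressive = extra_past_covariate_steps(n=12, output_chunk_length=5)
```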

@Eliotdoesprogramming
Contributor Author

Eliotdoesprogramming commented Jul 4, 2023

should be ready for review! thanks @madtoinou for the explanation, that makes sense.

Made the same minor change as in the original PR, but added test case coverage to demonstrate why it's necessary :) Thank you for your patience.

Had some issues running the tests locally through gradle (originating from lightning's support for float64 on the MacBook platform; I didn't think it was worth changing all the tests to include explicit casting to float32). Linting should be good.

Will make changes / fix if the tests are failing on pipeline once it runs.

@codecov-commenter

codecov-commenter commented Jul 5, 2023

Codecov Report

Patch coverage is 33.33% of modified lines.


Files Changed Coverage
darts/models/forecasting/nlinear.py 33.33%


Contributor

@felixdivo felixdivo left a comment


Nice!

@felixdivo
Contributor

Side note: This PR also extends the tests to cover DLinear and ensure it works too! 🚀

@felixdivo
Contributor

@madtoinou can this be merged?

@dennisbader
Collaborator

Hey @Eliotdoesprogramming and @felixdivo, and thanks for this. 🚀 The changes look good to me.

I agree with the normalization of the target series to account for the distribution shift.
But I'm a bit worried about also enforcing normalization of the past covariates at the same time (and/or the historic part of future covariates in the input chunk).
If I think about covariates such as datetime attributes (e.g. value of the month of the year) then normalization could result in constant x values. These covariates can help a lot e.g. for capturing seasonalities (and trends, specific events, ...)

Let's say we have an input_chunk_length of 3, and by chance all the input chunks in our batch end between March (month=3) and December (month=12). The month covariate in x will then have the values [-2, -1, 0] in every sample, so we don't get any useful information out of it. Only for chunks ending in January and February do we get different values ([10, 11, 0] and [10, -1, 0]).
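The month example can be reproduced with a few lines of plain Python. The sketch below assumes a monthly series and keys each input chunk by the month of its last time step; the function name `normalized_month_chunks` is made up for illustration.

```python
def normalized_month_chunks(input_chunk_length=3):
    """Map the month of the last input step to the normalized month covariate."""
    chunks = {}
    for end_month in range(1, 13):
        # consecutive month values leading up to end_month, wrapping Dec -> Jan
        months = [
            ((end_month - 1 - k) % 12) + 1
            for k in reversed(range(input_chunk_length))
        ]
        # NLinear-style normalization: subtract the last value of the chunk
        chunks[end_month] = [m - months[-1] for m in months]
    return chunks


chunks = normalized_month_chunks()
# Chunks ending in March..December all collapse to the same values.
collapsed = {tuple(chunks[m]) for m in range(3, 13)}
```

Ten of the twelve possible chunks become indistinguishable after normalization, which is the loss of seasonal information described above.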

Also, we normalize the historic part of the future covariates (the values in the input chunk), but leave the values in the output chunk untouched. In my opinion they should be normalized as well, using the last value of the input chunk sequence.

What's your take on these points? Should we give users the choice to enable target/covariates normalization separately? And should we also normalize the future covariates on the output chunk?

@dennisbader dennisbader mentioned this pull request Jul 19, 2023
@Eliotdoesprogramming
Contributor Author

Eliotdoesprogramming commented Jul 19, 2023

@dennisbader I agree it's probably best to allow options.

I wanted to note that the changes here specifically affect the case where shared_weights is False. When shared_weights is True, on lines 121-123, the behavior is:

if self.normalize:
    # discard covariates, to ensure that in_dim == out_dim
    x = x[:, :, : self.output_dim]
    x = x.permute(0, 2, 1)  # (batch, out_dim, in_len)

so covariates are similarly discarded from the output tensor.

I think at least for my use cases, having the behavior be consistent for all inputs should be default. Current behavior will error with the following use case rather than being able to train:

  import darts
  from darts.datasets import ETTh1Dataset
  from darts.models.forecasting.nlinear import NLinearModel
  series = ETTh1Dataset().load()
  series = series.astype('float32')
  target = series['HUFL']
  past_cov = series['MULL']
  model = NLinearModel(10, 1, shared_weights=False, normalize=True)
  model.fit(series=target, past_covariates=past_cov)

@felixdivo
Contributor

Current behavior will error with the following use case rather than being able to train

I was also very surprised to find that you could not train with covariates and would suggest to enable it by default.

@felixdivo
Contributor

@dennisbader

I worked through your comment and see your concerns. I would like to comment on these to move this conversation forward. It is a very stale discussion, given that it should be a simple issue to solve.

  1. Regarding the limited usefulness of the normalization if we are dealing with, for example, dates. I agree that in your example, normalization is not that useful. However, just because the model is not perfect for some input data doesn't mean it should not be supported. For other types of data, where the absolute date is irrelevant, eliminating some drift over time can be very useful. And that is why it was proposed in the original publication.

  2. About the historic part of the future covariates. I'm unsure whether I understood your point, but generally, the model returns the target predictions only, right? I looked here. So it does not really matter what we do to the covariates.

@dennisbader
Collaborator

@felixdivo and @Eliotdoesprogramming , this is how I interpret it at the moment (feel free to correct me). My concern was regarding these two lines: line 141 and line 151.

# x has shape (batch, input_chunk_length, n targets + n past covs + n future covs)
seq_last = x[:, -1:, :].detach()  # (batch, 1, in_dim)

x here is actually the entire input chunk (past/lookback window of the model). This includes the past target, past covariates and historic part of the future covariates. So when we take x = x - seq_last we are actually also performing normalization on past and future covariates.

So for my concern regarding future covariates: Here we normalize only the historic part of the future covariates (time steps in the input chunk) and leave the future part of future covariates (time steps in the output chunk) unchanged.

Now when inverse transforming, the shape of x has changed to (batch, output_chunk_length, n targets * n likelihood parameters). When we apply x = x + seq_last[:, :, : x.shape[-1]], we are not inverse transforming with only target values, but also with past and historic future covariates.

Proposed solution

If we want to normalize only the target, then we need to change this.

For the seq_last from here, x has shape (batch, input_chunk_length, n targets + n past covs + n future covs). To get only the last values of the target features we would have to change it to something like below:

# get last values only for target features
# x has shape (batch, input_chunk_length, n targets + n past covs + n future covs)
seq_last = x[:, -1:, :self.output_dim].detach()

Then for the inverse transformation from here, x has a different shape: (batch, output_chunk_length, n targets * n likelihood parameters)

We should only add seq_last to the matching features. I propose that we do this at the very end after having changed the view of x here.

x = x.view(batch, self.output_chunk_length, self.output_dim, self.nr_params)
if self.normalize:
    x = x + seq_last.view(seq_last.shape + (1,))

Let me know what you think.
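The shapes in this proposal can be checked with a small NumPy sketch. NumPy stands in for PyTorch here (reshape instead of view, copy instead of detach), the random array `y` stands in for the linear layer output, and the helper names `normalize_targets` and `denormalize` are hypothetical, not darts functions.

```python
import numpy as np


def normalize_targets(x, output_dim):
    """Subtract the last target values from the target columns only."""
    # x: (batch, input_chunk_length, n targets + n past covs + n future covs)
    seq_last = x[:, -1:, :output_dim].copy()  # (batch, 1, output_dim)
    x_norm = x.copy()
    x_norm[:, :, :output_dim] -= seq_last  # covariate columns stay untouched
    return x_norm, seq_last


def denormalize(y, seq_last, output_dim, nr_params):
    """Add the last target values back after splitting out likelihood params."""
    batch, out_len = y.shape[0], y.shape[1]
    y = y.reshape(batch, out_len, output_dim, nr_params)
    # (batch, 1, output_dim, 1) broadcasts over time steps and parameters
    return y + seq_last.reshape(seq_last.shape + (1,))


rng = np.random.default_rng(0)
batch, in_len, out_len = 4, 5, 2
output_dim, n_covs, nr_params = 2, 3, 2

x = rng.normal(size=(batch, in_len, output_dim + n_covs))
x_norm, seq_last = normalize_targets(x, output_dim)
y = rng.normal(size=(batch, out_len, output_dim * nr_params))
restored = denormalize(y, seq_last, output_dim, nr_params)
```

Note that the broadcast adds each target's last value to every likelihood parameter of that target, which is what the proposed two lines do as written.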

@felixdivo
Contributor

Thank you, @dennisbader, for elaborating. This helped a lot.

I think there are also use cases where we want to normalize the covariates (past, historical future, future) that go into the model. This is, for example, the case when we have multiple similar sensor readings and want to use the others to forecast the one we are interested in. Then, we would have similar dynamics in all the sensors and would, therefore, like to forecast the past covariates too.

So we should probably replace the flag normalize: bool = True with normalize_targets: bool = True & normalize_covariates: bool = True, right? However, this would add the new possibility of only normalizing the covariates. Do we want that?

If only the target is to be normalized, your solution is probably the way to go. If both are normalized, the original PR would be fine. Any comments on this?

@Eliotdoesprogramming
Contributor Author

Sorry for the slow reply, @dennisbader. Those changes sound good to me and I have implemented them. Thanks for all the help on the PR!

Collaborator

@dennisbader dennisbader left a comment


Thanks for the update @Eliotdoesprogramming. Can you remove the previous inverse transformation, as now we perform it twice?

darts/models/forecasting/nlinear.py (outdated)
@dennisbader
Collaborator

dennisbader commented Sep 6, 2023

@felixdivo, even if we want to normalize both target and covariates, the initial PR would not work properly when using a likelihood with nr_params > 1.

Let's say we use output_chunk_length=1, 1 target component/column, 1 past covariates component, and we use a gaussian likelihood with 2 params (mean, std).

At the inverse transformation step:

# x has shape (batch, output_chunk_length, out_dim * nr_params) = (batch, 1, 1 * 2) = (batch, 1, 2)
# seq_last has shape (batch, 1, 1 target comp + 1 past cov comp) = (batch, 1, 2)
x = x + seq_last[:, :, : x.shape[-1]]

So seq_last here in the last dimension are the last values of the target and the past covariates from the input chunk.

We do not want to add the past covariates value to x at the inverse transformation.
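The scenario above can be made concrete with a tiny NumPy sketch (NumPy standing in for PyTorch). All numbers are made up: the target's last value is 3.0, the past covariate's last value is 300.0, and `out` is a stand-in for the layer output holding (mean, std).

```python
import numpy as np

# input chunk with columns [target, past covariate]: shape (batch=1, in_len=3, 2)
x_in = np.array([[[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]]])

# stand-in layer output: (batch, output_chunk_length=1, out_dim * nr_params = 2)
out = np.array([[[0.5, 0.1]]])  # (mean, std) of the single target

# initial PR: seq_last covers target AND past covariate
seq_last_all = x_in[:, -1:, :]  # (1, 1, 2)
mixed = out + seq_last_all[:, :, : out.shape[-1]]
# the covariate's last value (300.0) ends up added to the std parameter

# target-only variant: slice first, then broadcast over the parameter axis
seq_last_target = x_in[:, -1:, :1]  # (1, 1, 1)
target_only = out.reshape(1, 1, 1, 2) + seq_last_target.reshape(
    seq_last_target.shape + (1,)
)
```

In `mixed`, the second likelihood parameter is offset by the covariate's last value, which is the unwanted effect described above; in `target_only`, only the target's last value is involved.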

@Eliotdoesprogramming
Contributor Author

the duplicated normalization step has now been removed 👍

Contributor Author

@Eliotdoesprogramming Eliotdoesprogramming left a comment


extra normalization step removed!

Collaborator

@dennisbader dennisbader left a comment


Thanks a lot for fixing @Eliotdoesprogramming, looks great now 🚀

Also, in the new release, users can try out another normalization technique: Reversible Instance Normalization can be used with any torch model (except RNNModel) by setting use_reversible_instance_norm=True at model creation.

@dennisbader dennisbader merged commit 74ed2bb into unit8co:master Sep 7, 2023
@felixdivo
Contributor

I think something went wrong here. I opened a new issue: #2035.
