In conventional DL training interfaces, the data loader is typically a generator object, where iterating over it returns batches of data to pass to the model.
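As a minimal sketch of that convention (plain Python, with a list standing in for a real dataset), the loader is simply an iterable whose iterations yield batches:

```python
# Minimal illustration of the conventional data-loader pattern:
# the loader is iterable, and each iteration yields one batch.
def batch_loader(data, batch_size):
    """Yield successive batches from `data` (a plain list stands in
    for a real dataset here)."""
    for i in range(0, len(data), batch_size):
        yield data[i : i + batch_size]

batches = list(batch_loader(list(range(10)), batch_size=4))
# e.g. three batches: [0..3], [4..7], and a final partial batch [8, 9]
```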
We could make the `TaskLoader` a generator to adhere to this convention. However, my main concern is that the `TaskLoader.__call__` method is enormously flexible. This reflects the flexibility of NPs as probabilistic models that can take any data as context and any data as target, so there are many ways you might want to sample your raw data to generate `Task`s for training. This raises the question of how `next(task_loader)` should sample the xarray/pandas dataset objects to produce the context and target data for the `Task`s if the user has not specified this explicitly: which date should be sliced, and which sampling strategy should be used for the context/target data?
One option would be to set `TaskLoader` attributes, like a list of `train_dates` to loop over when generating `Task`s, plus additional information on the `context_sampling` and `target_sampling` strategies. Alternatively, `context_sampling`, `target_sampling`, and other `TaskLoader.__call__` kwargs could be passed at generation time.
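A hypothetical sketch of the first option, where the sampling options are fixed as attributes at construction time so iteration needs no further arguments (the class, the `__call__` signature, and the return value are illustrative stand-ins, not the real `TaskLoader` API):

```python
class IterableTaskLoader:
    """Hypothetical sketch: sampling options are stored as attributes
    so that `next(task_loader)` needs no further arguments."""

    def __init__(self, train_dates, context_sampling, target_sampling):
        self.train_dates = train_dates
        self.context_sampling = context_sampling
        self.target_sampling = target_sampling

    def __call__(self, date, context_sampling, target_sampling):
        # Stand-in for the real TaskLoader.__call__, which would slice the
        # xarray/pandas data at `date` and apply the sampling strategies.
        return {"date": date, "context": context_sampling, "target": target_sampling}

    def __iter__(self):
        # Iteration implicitly reuses the stored sampling defaults,
        # producing one Task per training date.
        for date in self.train_dates:
            yield self(date, self.context_sampling, self.target_sampling)

loader = IterableTaskLoader(["2020-01-01", "2020-01-02"],
                            context_sampling="all", target_sampling=100)
tasks = list(loader)  # one Task-like dict per training date
```

The hidden-state trade-off is visible here: the sampling strategy is baked in at construction, so changing it mid-training means mutating attributes rather than passing arguments.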
IMO it is safer to have the user explicitly pass and control these sampling options by directly calling the `TaskLoader.__call__` method to generate batches of `Task` objects for training. However, if there is a clear benefit to being able to loop over a `TaskLoader`, and a clean way to implement it, then it is worth considering. I'm open to discussion on this.
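For contrast, the explicit alternative might look like the following sketch, where the user drives the loop and the sampling options appear at every call site (`fake_task_loader` and the `generate_tasks` helper are hypothetical; the kwarg names follow the ones discussed above):

```python
def fake_task_loader(date, context_sampling, target_sampling):
    # Stand-in for TaskLoader.__call__ (signature is illustrative only).
    return {"date": date, "context": context_sampling, "target": target_sampling}

def generate_tasks(task_loader, dates, context_sampling, target_sampling):
    """User-driven loop: the sampling options are explicit on every call,
    so nothing about the strategy is hidden inside the loader."""
    return [
        task_loader(date, context_sampling=context_sampling,
                    target_sampling=target_sampling)
        for date in dates
    ]

tasks = generate_tasks(fake_task_loader, ["2020-01-01", "2020-01-02"],
                       context_sampling="all", target_sampling=100)
```

Nothing stops a user from varying `context_sampling` between calls here, which is exactly the flexibility that a fixed iteration protocol would have to pin down.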
cc @jonas-scholz123