New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

rm data leakage in example notebooks #2008

Closed

SimTheGreat wants to merge 3 commits into unit8co:master from SimTheGreat:main

Contributor

SimTheGreat commented Sep 26, 2023

fixes #644

Other Information

fix scaling in example notebook


          rm data leakage in example notebooks

68e9c4d

SimTheGreat requested a review from dennisbader as a code owner

September 26, 2023 13:26

review-notebook-app bot commented Sep 26, 2023

Check out this pull request on

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB


          fix: linting

fefa9d5

codecov-commenter commented Sep 27, 2023

Codecov Report

All modified lines are covered by tests ✅

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

see 6 files with indirect coverage changes

📢 Thoughts on this report? Let us know!.

dennisbader requested changes

View reviewed changes

Collaborator

dennisbader left a comment

Thanks a lot for this @SimTheGreat this is a nice cleaning up of the notebooks 🚀

I added some suggestions in the ReviewNB. The main suggestions were:

try to revert changes to files where no code changed -> we rude the git diffs
try to upgrade ipywidgets and rerun all the notebooks with changes (I explain in one of the comments how to do it): this will make the progress bar look better and improve the notebook appearance
we can actually avoid scaling/transforming for many of the covariates as they were one hot encoded.

examples/00-quickstart.ipynb

@@ @@ -2970,7 +2970,7 @@ @@
                  "name": "python",
                  "nbconvert_exporter": "python",
                  "pygments_lexer": "ipython3",
-                 "version": "3.9.16"
+                 "version": "3.10.11"

Collaborator

dennisbader Sep 28, 2023

as discussed, let's revert this change

Suggested change

      
               "version": "3.10.11"
          
               "version": "3.9.16"

review-notebook-app bot commented Sep 28, 2023 •

edited

Loading

View / edit / reply to this conversation on ReviewNB

dennisbader commented on 2023-09-28T08:13:36Z
----------------------------------------------------------------

There is an issue with ipywidgets not showing the progess bar properly (the "Predicting: 0it [00:00, ?it/s]")

I had this as well. We can take this opportunity to improve the progess bar appearance :)

Could you try pip install -U ipywidgets in your environment, close and reopen the jupyter session and run all the example notebooks where you applied changes again? The expected bar should look something like Lightning-AI/pytorch-lightning#1399 (comment)

review-notebook-app bot commented Sep 28, 2023 •

edited

Loading

View / edit / reply to this conversation on ReviewNB

dennisbader commented on 2023-09-28T08:13:37Z
----------------------------------------------------------------

Can you rename air_train/val and milk_train/val in the beginning to air_train/val_covariates and mil_train/val_covariates and then use the transformer/scaler to overwrite them?

Just have a clear distinction between the covariates and the target series (train/val_air,train/val_milk )

dennisbader reviewed

View reviewed changes

examples/04-RNN-examples.ipynb

    
            @@ -4450,7 +4450,7 @@
          
                  },

Collaborator

dennisbader Sep 28, 2023 •

edited

Loading

It looks like there were no code changes in this notebook. Can you try to revert the changes to this file? We usually try to keep the git history/diffs as minimal as possible :)

Reply via ReviewNB

dennisbader reviewed

View reviewed changes

examples/05-TCN-examples.ipynb Outdated

    
            @@ -54,19 +54,22 @@
          
                 "source": [

Collaborator

dennisbader Sep 28, 2023 •

edited

Loading

Could you add the following lines in this cell at the bottom?

We want to avoid some info from PyTorch lightning (mostly the device information that get spammed when calling historical forecasts)

import warnings
warnings.filterwarnings("ignore")

import logging
logging.disable(logging.CRITICAL)

Reply via ReviewNB

examples/05-TCN-examples.ipynb Outdated

    
            @@ -54,19 +54,22 @@
          
                 "source": [

Collaborator

dennisbader Sep 28, 2023 •

edited

Loading

Line #16.    scaler_month = Scaler()

actually we can avoid scaling the month series, as it's one hot encoded anyways (encodes months as n months - 1 binary columns).

Can you remove the scaling here?

Reply via ReviewNB

examples/05-TCN-examples.ipynb Outdated

    
            @@ -54,19 +54,22 @@
          
                 "source": [

Collaborator

dennisbader Sep 28, 2023 •

edited

Loading

this here will definitely look better after upgrading ipywidgets (my comments in notebook 01)

Reply via ReviewNB

examples/05-TCN-examples.ipynb Outdated

    
            @@ -54,19 +54,22 @@
          
                 "source": [

Collaborator

dennisbader Sep 28, 2023 •

edited

Loading

Line #17.    series_en_transformed = scaler_en.fit_transform(series_en)

this line should be somewhere below fitting the transformer on train_en and using transform()

Reply via ReviewNB

examples/05-TCN-examples.ipynb Outdated

    
            @@ -54,19 +54,22 @@
          
                 "source": [

Collaborator

dennisbader Sep 28, 2023 •

edited

Loading

Line #26.    scaler_day = Scaler()

here we can avoid scaling again as we use a one hot encoding for the days

Reply via ReviewNB

dennisbader reviewed

View reviewed changes

examples/07-NBEATS-examples.ipynb Outdated

    
            @@ -37,7 +37,8 @@
          
                  "from darts.models import NBEATSModel\n",

Collaborator

dennisbader Sep 28, 2023 •

edited

Loading

Line #10.    from darts.timeseries import concatenate

change to

from darts import concatenate

Reply via ReviewNB

dennisbader reviewed

View reviewed changes

examples/08-DeepAR-examples.ipynb Outdated

    
            @@ -11,7 +11,7 @@
          
                },

Collaborator

dennisbader Sep 28, 2023 •

edited

Loading

can you rerun this notebook? seems like there were some issues with too short series for validation

Reply via ReviewNB

examples/08-DeepAR-examples.ipynb Outdated

    
            @@ -11,7 +11,7 @@
          
                },

Collaborator

dennisbader Sep 28, 2023 •

edited

Loading

Line #27.    train, val = day_series.split_after(pd.Timestamp("20170901"))

we can avoid scaling the day series as it's one-hot encoded

Reply via ReviewNB

examples/08-DeepAR-examples.ipynb Outdated

    
            @@ -11,7 +11,7 @@
          
                },

Collaborator

dennisbader Sep 28, 2023 •

edited

Loading

Line #2.        series=day_scaled,

this should be

series=series_en_transformed,

Reply via ReviewNB

dennisbader reviewed

View reviewed changes

examples/09-DeepTCN-examples.ipynb Outdated

    
            @@ -44,7 +44,7 @@
          
                  "from darts import TimeSeries\n",

Collaborator

dennisbader Sep 28, 2023 •

edited

Loading

Line #11.    from darts.timeseries import concatenate

from darts import concatenate

Reply via ReviewNB

examples/09-DeepTCN-examples.ipynb Outdated

    
            @@ -44,7 +44,7 @@
          
                  "from darts import TimeSeries\n",

Collaborator

dennisbader Sep 28, 2023 •

edited

Loading

Line #23.    scaler_day = Scaler()

avoid scaling as one hot encoded

Reply via ReviewNB

examples/09-DeepTCN-examples.ipynb Outdated

    
            @@ -44,7 +44,7 @@
          
                  "from darts import TimeSeries\n",

Collaborator

dennisbader Sep 28, 2023 •

edited

Loading

Line #2.        series=concatenate([train_en_transformed, val_en_transformed]),

Let's define series_en_transformed again before (where we apply the scaler), and then reuse it here and in the next cell. Like this we can avoid having to compute series_en_transformed twice, which can reduce time for longer series.

Reply via ReviewNB

review-notebook-app bot commented Sep 28, 2023 •

edited

Loading

View / edit / reply to this conversation on ReviewNB

dennisbader commented on 2023-09-28T08:13:47Z
----------------------------------------------------------------

It looks like there weren't any code changes in this file. Can you try to revert the changes?


          making baselinemodels Global

SimTheGreat mentioned this pull request

Doc/data leak #2020

Merged

madtoinou closed this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet