This repository has been archived by the owner on Jan 22, 2021. It is now read-only.

Machine learning may fail when time period is too small #29

Open
delhomer opened this issue Jun 22, 2018 · 0 comments

Comments

@delhomer
Collaborator

Some examples of failing commands:

python -m luigi --module jitenshea.tasks.city Clustering --city lyon --start 2017-09-01 --stop 2017-09-04 --local-scheduler
python -m luigi --module jitenshea.tasks.city Clustering --city bordeaux --start 2017-08-01 --stop 2017-08-04 --local-scheduler

They raise the following error:

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

In such cases, it seems that some stations with an insufficient amount of data are not filtered out (see e.g. station n°44 in Bordeaux in the last example: it has a value for hour 0, but NaN for every other hour).
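A minimal sketch of the NaN management this calls for, assuming df_norm holds one row per station and one column per hour of the day (the actual layout in jitenshea may differ). The hypothetical helper clean_profiles drops every station whose profile contains NaN or infinite values, so that only finite data reaches the clustering estimator. Station ids 12 and 7 and all values are made up for illustration; station 44 mimics the Bordeaux case above.

```python
import numpy as np
import pandas as pd


def clean_profiles(profiles: pd.DataFrame) -> pd.DataFrame:
    """Return only the stations whose hourly profile is fully finite.

    Assumption: one row per station, one column per hour (hypothetical layout).
    """
    # Treat +/-inf as missing, then drop any station with at least one gap.
    cleaned = profiles.replace([np.inf, -np.inf], np.nan)
    return cleaned.dropna(axis=0, how="any")


# Toy data (hypothetical values), 3 stations x 24 hours.
data = np.full((3, 24), np.nan)
data[0, :] = np.linspace(0.0, 1.0, 24)  # station 12: complete profile
data[1, 0] = 0.5                        # station 44: value for hour 0 only
data[2, :] = 0.2                        # station 7: complete...
data[2, 3] = np.inf                     # ...except one infinite value

profiles = pd.DataFrame(
    data, index=pd.Index([12, 44, 7], name="station"), columns=range(24)
)
clean = clean_profiles(profiles)
print(clean.index.tolist())
```

Dropping every incomplete station is the strictest option; keeping stations above a minimum share of valid hours, or imputing missing hours, would be alternatives worth weighing.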

Workaround: prevent users from choosing start and stop dates that are too close together
Fix: handle NaN values when computing df_norm
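The workaround can be sketched as a guard that runs on the --start and --stop dates before clustering. The helper check_period and the seven-day MIN_DAYS threshold are hypothetical; the issue only shows that three-day windows (e.g. 2017-09-01 to 2017-09-04) can fail.

```python
from datetime import date

MIN_DAYS = 7  # hypothetical threshold; the issue does not state a value


def check_period(start: date, stop: date, min_days: int = MIN_DAYS) -> None:
    """Raise ValueError when the [start, stop] window is too short to cluster."""
    span = (stop - start).days
    if span < min_days:
        raise ValueError(
            f"period {start} to {stop} spans {span} day(s); "
            f"at least {min_days} required"
        )


check_period(date(2017, 9, 1), date(2017, 9, 30))  # accepted
try:
    check_period(date(2017, 9, 1), date(2017, 9, 4))  # the failing Lyon window
except ValueError as exc:
    print(exc)
```

Such a guard only avoids the symptom; handling NaN values in df_norm remains the actual fix.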

@delhomer changed the title from "Machine learning fails when time period is too small" to "Machine learning may fail when time period is too small" on Jun 22, 2018