
Fix NegativeBinomial scaling #814

Merged: 1 commit, May 10, 2020
Conversation

@lostella (Contributor) commented May 8, 2020

Issue #, if available: #718, #719; may be related to #636

Description of changes: The scaling of `alpha` introduced in #719 only makes sense if `scale >= 1.0`; otherwise the scaled `alpha` can become negative. Intuitively this makes sense for count data, which can really only be upscaled, not downscaled.
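
To see why (a hedged sketch; this is my reading of the moment-matching behind #719, not a verbatim quote of its formulas): matching the mean and variance of `scale * NegativeBinomial(mu, alpha)` with a `NegativeBinomial(mu', alpha')` gives `mu' = scale * mu` and `alpha' = alpha + (scale - 1) / (scale * mu)`, and the second expression can go negative when `scale < 1.0`:

```python
# Sketch only: reconstructed moment-matching scaling (assumed, not quoted from #719).
# Variance of NegativeBinomial(mu, alpha) is mu + alpha * mu**2.
mu, alpha = 2.0, 0.1

for scale in (4.0, 1.0, 0.25):
    scaled_mu = scale * mu
    scaled_alpha = alpha + (scale - 1.0) / (scale * mu)
    print(scale, scaled_mu, scaled_alpha)
# scale=0.25 gives scaled_alpha = -1.4, an invalid (negative) dispersion
```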

This PR makes sure that that's the case: essentially, if `scale < 1.0` then it is pushed up to 1.0 (via a soft threshold rather than a hard clip; see the sketch below).
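
For concreteness, a minimal sketch of such a soft lower bound (a hypothetical illustration using MXNet's `softrelu` activation, i.e. `softplus(x) = log(1 + exp(x))`; not necessarily the exact expression in the diff):

```python
import mxnet.ndarray as F

def soft_lower_bound_one(scale):
    # softplus(scale - 1) + 1 acts like max(1, scale), but smoothly:
    # it approaches 1.0 as scale -> 0 and approaches scale for scale >> 1.
    return F.Activation(scale - 1.0, act_type="softrelu") + 1.0

print(soft_lower_bound_one(F.array([0.01, 1.0, 10.0])))
# roughly [1.32, 1.69, 10.0]
```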

As a minimal working example, consider the following snippet:

```python
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt

from gluonts.model.simple_feedforward import SimpleFeedForwardEstimator
from gluonts.distribution import NegativeBinomialOutput
from gluonts.dataset.common import ListDataset
from gluonts.trainer import Trainer

# synthetic weekly count data
data = np.random.negative_binomial(n=3, p=0.9, size=(200,))

data_series = pd.Series(
    data=data,
    index=pd.date_range(
        start="2014-01-05 00:00:00",
        periods=len(data),
        freq="W",
    ),
)

dataset = ListDataset(
    data_iter=[
        {
            "start": "2014-01-05 00:00:00",
            "target": list(data),
        }
    ],
    freq="W",
)

# train a simple feedforward model with a negative binomial output
estimator = SimpleFeedForwardEstimator(
    freq="W",
    prediction_length=20,
    distr_output=NegativeBinomialOutput(),
    trainer=Trainer(epochs=10, hybridize=False),
)

predictor = estimator.train(dataset)

# plot the observed series together with the forecast
forecast = next(iter(predictor.predict(dataset)))
data_series.plot()
forecast.plot()
plt.show()
```

Before the fix, training fails with a NaN loss (the negative scaled `alpha` makes the likelihood undefined):

```
Traceback (most recent call last):
  File "/Users/stellalo/gluon-ts/issues/issue_negbin.py", line 39, in <module>
    predictor = estimator.train(dataset)
  File "/Users/stellalo/gluon-ts/src/gluonts/model/estimator.py", line 252, in train
    training_data, validation_data, num_workers, num_prefetch, **kwargs
  File "/Users/stellalo/gluon-ts/src/gluonts/model/estimator.py", line 231, in train_model
    validation_iter=validation_data_loader,
  File "/Users/stellalo/gluon-ts/src/gluonts/trainer/_base.py", line 328, in __call__
    "Got NaN in first epoch. Try reducing initial learning rate."
gluonts.core.exception.GluonTSUserError: Got NaN in first epoch. Try reducing initial learning rate.
```

After the fix, training completes:

[image: issue_negbin — plot of the observed series and the forecast]

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@lostella changed the title from *bound scale above 1* to *Fix NegativeBinomial scaling* on May 8, 2020
@lostella requested a review from canerturkmen on May 8, 2020, 11:49
@lostella added this to the v0.5 milestone on May 8, 2020
@canerturkmen (Contributor) left a comment

Two questions here:

  1. It's whoever's consuming this class that decides the scale. So if that consumer expects `scale = 0.0001`, will the behavior not be affected by that?
  2. Why a soft thresholding instead of `F.maximum(1, scale)`?

@lostella (Contributor, Author) commented May 8, 2020

> 1. It's whoever's consuming this class that decides the scale. So if that consumer expects `scale = 0.0001`, will the behavior not be affected by that?

I think the contract can reasonably be "you'll have the distribution scaled by the scale that you pass, as long as that's greater than or equal to 1". I don't see problems with that.

> 2. Why a soft thresholding instead of `F.maximum(1, scale)`?

I just went for the differentiable option, in case the scale is output by some model (with parameters that are optimized). In all of our models the scale is always a function of the data only, but who knows. Not that there seems to be a problem with `maximum` and SGD, but we use softplus everywhere for this purpose.
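
To illustrate the gradient argument with a hypothetical example (not code from this PR): with a hard `maximum`, the gradient with respect to the scale is exactly zero below the bound, while the soft threshold still passes a signal:

```python
from mxnet import autograd, nd

scale = nd.array([0.5])  # below the bound
scale.attach_grad()

with autograd.record():
    hard = nd.maximum(scale, 1.0)
hard.backward()
print(scale.grad)  # [0.] -- no gradient signal below the bound

with autograd.record():
    soft = nd.Activation(scale - 1.0, act_type="softrelu") + 1.0
soft.backward()
print(scale.grad)  # ~[0.378] == sigmoid(scale - 1.0), still informative
```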

@lostella (Contributor, Author) commented May 8, 2020

cc @kashif

@kashif (Contributor) commented May 8, 2020

@lostella looks good... I forgot about the `scale < 1.0` case because I always assumed it to be > 1, but yes, your solution is elegant. I remember seeing something similar done in another context, but I can't remember where right now... (perhaps in some RL setting...). 👍

@canerturkmen (Contributor) left a comment

lgtm, and seems to be producing consistent results in practice. Thanks a lot for not letting this go 🎩
