PGGG: lower upper bound limit of k slice sampling #50

Open
gBokiau opened this issue Nov 18, 2016 · 1 comment

Comments

gBokiau commented Nov 18, 2016

When drawing pggg parameters, I tend to get a small cluster (0.5 %) of cases with mean k values between 985 and 999, and no mean k at all between 50 and 985 (the expected aggregate k is around 0.8).

The only thing that seems to set these cases apart is that they are fairly regular but rather short-lived. They are probably somewhat overrepresented in the data and must be confusing the algorithm.

The upper bound for the slice sampling of k is set at 1000, which at first sight doesn't seem like a realistic value in any scenario. I suspect a limit of around 100 would be safer, and I would adjust the algorithm accordingly.
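To make the role of the bound concrete, here is a minimal sketch of a univariate stepping-out slice sampler (Neal, 2003) restricted to an interval (lower, upper). It is not the package's implementation: the names `slice_sample` and `run_chain`, the step width `w`, the step budget, and the Gamma(2, 1) test density are hypothetical choices for illustration. Values outside the bounds are treated as having zero density, so a draw can never exceed `upper`, and a target with little mass near the bound gives essentially the same draws whether `upper` is 100 or 1000.

```python
import math
import random


def slice_sample(logf, x0, lower, upper, w=1.0, max_steps=50, rng=random):
    """One stepping-out slice-sampling update of x0, restricted to (lower, upper)."""

    def logf_bounded(x):
        # Treat the density as zero outside the open interval (lower, upper).
        return logf(x) if lower < x < upper else -math.inf

    # Vertical level: log(y) = log f(x0) + log(U), and -log(U) ~ Exp(1).
    log_y = logf_bounded(x0) - rng.expovariate(1.0)

    # Randomly position an initial interval of width w around x0.
    left = x0 - w * rng.random()
    right = left + w

    # Step out, splitting a budget of max_steps between the two sides.
    n_left = int(max_steps * rng.random())
    n_right = max_steps - 1 - n_left
    while n_left > 0 and logf_bounded(left) > log_y:
        left -= w
        n_left -= 1
    while n_right > 0 and logf_bounded(right) > log_y:
        right += w
        n_right -= 1

    # Nothing outside the bounds can be accepted, so clip the interval.
    left, right = max(left, lower), min(right, upper)

    # Shrinkage: sample uniformly, shrinking towards x0 on rejection.
    while True:
        x1 = left + rng.random() * (right - left)
        if logf_bounded(x1) >= log_y:
            return x1
        if x1 < x0:
            left = x1
        else:
            right = x1


def run_chain(logf, x0, n, lower, upper, seed=0):
    """Run n slice-sampling updates starting at x0 and return the draws."""
    rng = random.Random(seed)
    draws = []
    x = x0
    for _ in range(n):
        x = slice_sample(logf, x, lower, upper, rng=rng)
        draws.append(x)
    return draws


# Hypothetical target: Gamma(shape=2, rate=1), log-density up to a constant.
def log_gamma_2_1(x):
    return math.log(x) - x


for upper in (100.0, 1000.0):
    draws = run_chain(log_gamma_2_1, 1.0, 5000, 0.0, upper, seed=1)
    print(upper, sum(draws) / len(draws))  # both means should be close to 2
```

Conversely, if a customer's conditional density for k keeps increasing all the way up to the bound, the draws pile up just below `upper`, so lowering the bound would likely move such draws rather than remove them.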

Speaking of assumptions, it occurred to me that the aggregate distribution of k is more likely to follow a lognormal than a gamma distribution. Even in the clumpiest scenarios, extremely low k values remain less likely than values around 0.5, while a few high-k cases always remain quite likely, a situation the gamma distribution doesn't allow for.
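The gamma-versus-lognormal point can be checked numerically. The sketch below matches both distributions to the same mean and variance and compares their densities near zero, around 0.5, and further out. The mean of 0.8 comes from the issue; the variance of 1.0 and the helper names are hypothetical. When the variance exceeds the squared mean, the matched gamma has shape below 1, so its density diverges at zero, whereas the lognormal density vanishes there and has a heavier right tail. (A gamma with shape above 1 also vanishes at zero, but its right tail stays exponentially light.)

```python
import math


def match_moments(mean, var):
    """Gamma (shape, scale) and lognormal (mu, sigma) with the given mean and variance."""
    shape = mean ** 2 / var
    scale = var / mean
    sigma2 = math.log(1.0 + var / mean ** 2)
    mu = math.log(mean) - sigma2 / 2.0
    return (shape, scale), (mu, math.sqrt(sigma2))


def gamma_pdf(x, shape, scale):
    log_p = ((shape - 1.0) * math.log(x) - x / scale
             - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_p)


def lognormal_pdf(x, mu, sigma):
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))


(shape, scale), (mu, sigma) = match_moments(0.8, 1.0)  # variance is hypothetical
for k in (0.01, 0.1, 0.5, 2.0, 10.0):
    print(f"k={k:5}: gamma {gamma_pdf(k, shape, scale):.4f}  "
          f"lognormal {lognormal_pdf(k, mu, sigma):.4f}")
```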

mplatzer (Owner) commented

Interesting! I haven't seen such cases myself. Could you share the plotted timing patterns of these customers? And what are your estimated t and gamma parameters?

As I haven't run into similar problems, I also haven't needed to lower the upper limit for k. I doubt that it would help, though, since the gamma should cap such outliers anyway. If you could share a dataset that reproduces the behavior, that would be very helpful.
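How strongly the gamma caps large k depends on its estimated parameters. Assuming k is Gamma-distributed with shape t and rate gamma (our reading of the model, not confirmed in this thread; the two parameter pairs below are hypothetical and both have mean 0.8), this sketch computes the prior log-density penalty for moving a draw of k from 100 to 1000. A tight prior penalizes k = 1000 by hundreds of log units, while a very diffuse one costs only about 11, which a strongly increasing likelihood could overcome.

```python
import math


def gamma_logpdf(x, t, gamma):
    """Log-density of Gamma(shape=t, rate=gamma) at x > 0."""
    return t * math.log(gamma) - math.lgamma(t) + (t - 1.0) * math.log(x) - gamma * x


def log_penalty(k_high, k_low, t, gamma):
    """Prior log-density difference log p(k_high) - log p(k_low)."""
    return gamma_logpdf(k_high, t, gamma) - gamma_logpdf(k_low, t, gamma)


# Hypothetical (t, gamma) pairs, both with prior mean t / gamma = 0.8.
for t, gamma in ((0.64, 0.8), (0.008, 0.01)):
    print(t, gamma, round(log_penalty(1000.0, 100.0, t, gamma), 2))
```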

And yes, the lognormal might also be a good candidate for the heterogeneity distribution, for lambda and mu as well. Abe's model, for example, uses the lognormal.

