Replace Poisson likelihood for catch with negative binomial #3
My contribution is to make the issue title informative.
I agree that it would be good for @marksorel8 to take a crack at the NB formulation.
And I just added an informative label to the issue!
Good job, team!
Alright, here goes.
The Stan function `neg_binomial(alpha, beta)` uses a parameterization similar to the standard one in the part with p. Would it then be closed under addition as long as we fit a single beta (and therefore p) parameter for each year? Here alpha = expected value * beta. So, can we just replace the Poisson likelihood statement for the catch with the corresponding `neg_binomial` one? If you want...
@mdscheuerell, would that make the distribution not closed under addition, because the betas are different? The closed-under-addition thing isn't a problem in the Wenatchee because they are legally obligated to check the traps daily, but it seems like it is important to crack, especially since it will be necessary in other places like the Tucannon.
I don't understand the issue re: alpha and beta in the negative binomial complicating the estimation. The mean (and variance) for the Poisson (`M_hat`) comes straight from the process model.
There are two distinct steps in the likelihood calculation where going from the Poisson to the NB presents complications. The one @marksorel8 is referring to is the summation of "true" migrants over multiple days, if the trap isn't emptied daily (though apparently that's a non-issue in the Wenatchee). This is implicitly a summation of discrete-valued latent states, but when they're Poisson-distributed it's trivial b/c you can just sum the expectations and it's still Poisson. With the NB, this doesn't work -- the sum of NBs is NB iff they have the same p, in which case you sum the r's (in the standard parameterization). So you'll have to (1) declare p (or a one-to-one transformation of it) as a primitive parameter, which is less than ideal for interpretation and for characterizing the overdispersion; and (2) after calculating the daily expectation, solve for the corresponding r before summing.

The second issue is the binomial thinning in calculating the expected catch.
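A minimal R sketch of that bookkeeping, assuming the standard NB(r, p) parameterization with mean r(1 - p)/p (the daily expectations and p value below are made up):

```r
# hypothetical daily expectations from the AR(1) process between trap checks
mu_day <- c(120, 95, 140)
p_nb   <- 0.4  # shared NB "success" probability, declared as a primitive parameter

# standard parameterization: mean = r * (1 - p) / p, so r = mu * p / (1 - p)
r_day <- mu_day * p_nb / (1 - p_nb)

# closure under addition: the total over the gap is NB(sum(r_day), p_nb)
r_tot  <- sum(r_day)
mu_tot <- r_tot * (1 - p_nb) / p_nb
stopifnot(all.equal(mu_tot, sum(mu_day)))  # summed means still agree
```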
I had made the assumption that we had a distributional form for the expected catch for each trap, which was initially Poisson. The expectation (and variance) for the Poisson comes from a Gaussian AR(1) model for log-migrants, which is then thinned at some interval based upon when the trap is sampled. I had thought...
What I had in mind in formulating the model was that it's actually the daily arrivals at the trap that are Poisson. The daily outmigrants available for capture are a Poisson-log AR(1) process, which are then possibly accumulated over multiple days and thinned by binomial sampling with trap efficiency p (uhh, the other p...that's confusing). In JAGS, you would model the daily arrivals explicitly:

```
M[t] ~ dpois(M_hat[t])
```

We can't do that in Stan, but luckily the Poisson is closed under addition and binomial thinning, so everything works out and we get to have our discrete-latent-state cake and eat it too (in the marginalized form of the likelihood).

Things aren't quite as elegant if the daily arrivals are NB. It's still doable as outlined above, but I suspect having to parameterize in terms of p (vs. overdispersion) induces a wonkier prior that's harder to interpret (since p is involved in both the mean and overdispersion).
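To make that generative story concrete, here is a small R simulation of the process as described (all parameter values are invented for illustration):

```r
set.seed(123)
n_days <- 100
mu0 <- log(100); phi_ar <- 0.8; sigma <- 0.3  # hypothetical AR(1) parameters
p_trap <- 0.25                                # hypothetical trap efficiency

# Gaussian AR(1) process for log expected daily outmigrants
log_M_hat <- numeric(n_days)
log_M_hat[1] <- mu0
for (t in 2:n_days)
  log_M_hat[t] <- mu0 + phi_ar * (log_M_hat[t - 1] - mu0) + rnorm(1, 0, sigma)

M <- rpois(n_days, exp(log_M_hat))            # daily arrivals at the trap
C <- rbinom(n_days, size = M, prob = p_trap)  # catch, if the trap is checked daily
```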
I agree with this.
FYI, I did something similar for the San Nicolas data wherein the expectation for the NB was based on the log-abundance from a MAR(1) model, but the observations weren't nearly as sparse (ie, NA's were very rare), so no summations were necessary and solving for r wasn't such a hassle.
@marksorel8, you could explore the difference b/w the two parameterizations with the Wenatchee data, since summation over missing days isn't an issue.
Hi @mdscheuerell @ebuhle, I have started adding the ability to use the NB distribution for the observations in the single year Stan model. I am using the mu and phi parameterization of the NB.
Would you be able to check the changes that I made in juv_trap.stan in the last commit and let me know if I am on the right track? What would an appropriate prior on the phi parameter in the NB be?
I am not getting radically different results with the NB than the Poisson at this point. Should I use LOO to compare them once I have a decent prior on phi? How would I go about that? Finally, I found an e-book that might be of use in this project. Weisse- An Introduction to Discrete-Valued Time Series.pdf
@marksorel8 The changes you made are not sufficient because you're using the primitive parameter for the overdispersion while still summing the expectations over weeks, which the NB doesn't permit (see the discussion above).
Re: models for discrete-valued data, you could opt to model the true but unknown mean number of migrants as a discrete-valued AR(1) process, but it's just as reasonable to assume the log-mean is actually real-valued (Gaussian), which is just fine for the Poisson or NB.
@mdscheuerell pretty much covered it. The key point is that you can't use the summation of expectations in

Wenatchee-screw-traps/src/Stan_demo/juv_trap.stan, lines 59 to 60 in 6d9a311

because the NB is only closed under addition if all the terms are distributed with the same p (i.e., in Stan's `neg_binomial(alpha, beta)` parameterization, the same `beta`). However, if you don't do the summation (which you can get away with in the Wenatchee case) then either parameterization is fine, and this would allow you to compare the two.

As for a prior on the overdispersion, I'd suggest a half-normal or half-Cauchy, diffuse enough to cover plausible values (recalling that smaller values mean more overdispersion).
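If it helps, here's my understanding of how the two Stan parameterizations relate, checked by simulation in R (the values are arbitrary):

```r
# neg_binomial(alpha, beta):  mean = alpha / beta
# neg_binomial_2(mu, phi):    mean = mu, variance = mu + mu^2 / phi
# the two coincide when alpha = phi and beta = phi / mu
mu <- 50; phi <- 3
alpha <- phi
beta  <- phi / mu

# base R's rnbinom(size, prob) matches with prob = beta / (beta + 1)
x <- rnbinom(1e6, size = alpha, prob = beta / (beta + 1))
c(mean(x), mu)              # both ~ 50
c(var(x), mu + mu^2 / phi)  # both ~ 883
```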
Oh, and just to clarify: that's a different p -- the weekly capture probability of the trap, not the binomial probability underlying the "number of failures before first success" interpretation of the NB -- and it's fine as is. Not confusing at all...
Thanks @mdscheuerell and @ebuhle. I added a prior on the dispersion parameter (actually, I put the prior on the inverse of the square root of the dispersion parameter) and played around with fitting a few different years of the Chiwawa data with the NB and the Poisson. For one thing, I am getting some divergences with the NB. Beyond that, the degree of overdispersion depends on the year. In some years it is negligible and in others it is more substantial. The year shown below is one with more overdispersion.
Correct me if I'm wrong, @marksorel8, but it looks to me like you still haven't changed the Stan code to address the proper thinning of the NB per our discussion above (i.e., you are still summing over weeks with different parameters).
You are correct that I haven't changed the Stan code, @mdscheuerell, but I am not summing over weeks with different parameters, because there are no gaps in the Wenatchee data.

Addressing the proper thinning of the NB is the next step, but I wanted to compare the NB and the Poisson with the Wenatchee data first because it was simple. Looks like the NB is a big improvement for sure. So, good idea!
Except the code still includes the summation, which the Wenatchee data doesn't need (b/c the trap is emptied daily).

The simplest solution in this case would be to make a separate version of `juv_trap.stan` for the NB likelihood.
Here I tried to implement the first parameterization of the NB in Stan (`neg_binomial(alpha, beta)`). I'll work on reverting the original single-year Poisson model and adding a separate NB version.
It sounds like you want...
This is great @ebuhle, thank you! I agree that there should be different versions of the model for the different observation likelihoods.
We now have three different single-year Stan models using the three different observation likelihoods: Poisson, Negative Binomial (alpha, beta), and Negative Binomial (mu, phi). I have lots of questions moving forward :)
A few quick thoughts:

1. LOO!
2. Sure, pending answer to (1).
3. My vote is no, especially if we're still contemplating sharing some observation-related parameters among years.
4. Seems like the main thrust of a methods-oriented paper would be to compare this to existing approaches, in particular BTSPAS. Whether that's worth doing depends largely on your level of interest, but also I don't think our model is quite up to the task yet; BTSPAS does several things (esp. related to the efficiency trials) that ours doesn't.
Sounds good...but looks hard. Are you available to sit down and talk about how to implement this? Or do you have an example? Would we do LOO for the catch data only?
Really, even after reading the vignettes? The theory is subtle (though not that much worse than AIC / DIC / *IC, except for the computational part) but the loo package makes it dead simple in practice. The only change needed to the Stan code would be a few lines in the `generated quantities` block.
While that's not necessarily wrong (there are situations where you're only interested in the predictive skill of one part of a model, or for one type of data), here I'd recommend using the total likelihood.
Ok, I'll give it a try, since it sounds like you think I can do it. I got thrown off by some of the more complicated examples in the vignette.

This sounds like a log_lik vector with a pointwise log-likelihood value for each data point (so length of data set #1 + length of data set #2). Let me know if I'm on the wrong track here. I'll give this a shot though :)
Ultimately, yeah. For readability's sake I'd probably declare two vectors, e.g.

```stan
generated quantities {
  vector[N_MR] LL_MR;
  vector[N_trap] LL_trap;

  for(i in 1:N_MR) {
    // evaluate mark-recap log-likelihood for obs i
  }
  for(i in 1:N_trap) {
    // evaluate catch log-likelihood for obs i
  }
}
```

Then you could either concatenate them into one long vector, or monitor them both and combine them in R.
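On the R side, something like this should work (a sketch; `fit_pois` and `fit_nb` are hypothetical names for fitted stanfit objects that monitor `LL_MR` and `LL_trap`):

```r
library(loo)

# pointwise log-likelihood matrices (iterations x observations),
# concatenating the mark-recapture and catch contributions
ll_pois <- cbind(extract_log_lik(fit_pois, "LL_MR"),
                 extract_log_lik(fit_pois, "LL_trap"))
ll_nb   <- cbind(extract_log_lik(fit_nb, "LL_MR"),
                 extract_log_lik(fit_nb, "LL_trap"))

loo_pois <- loo(ll_pois)
loo_nb   <- loo(ll_nb)
loo_compare(loo_pois, loo_nb)  # best model on top; elpd_diff is relative to it
```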
I attempted to compare the Poisson and NB for the 2001 Tucannon data. I got the following warnings after running the NB model.

Not sure what the deal with the undefined values is. The low ESSs were for the process error sigma and the NB overdispersion p, which would logically be negatively correlated. R-hats were still 1.01 for those parameters after 1500 iterations though.

When I ran loo for the Poisson model I got the following warnings:

and this output:

They were better for the NB model, but still not great.
I took a quick glance at your code and you're not asking for the pointwise likelihoods of the data (catch), but rather an estimated state (migrants), so these comparisons via loo aren't meaningful.
Hi @mdscheuerell, I forgot to push the changes that I made to the code last night. Doh! Sorry. I am asking for the pointwise log-likelihood of the data with the following code, right?

Apologies for not pushing the code I was referencing last night.
Good Q, but that is only necessary if you're trying to do step-ahead forecasting, which is not our scenario. I still need to dig into the Tucannon results you posted above and understand where those numerical errors in the generated quantities are coming from.

I'm also not clear on why you were using Tucannon data for the model comparison, since that prohibits the use of the NB2 model. I thought the idea was to investigate the differences b/w the NB (awkward and nonintuitive, but usable with gappy data) and NB2 (more appealing, but unusable when gaps are present) parameterizations using data that can accommodate both.
Yep, not sure which of these two reasons caused the errors, but I agree those seem like the likely culprits. I see that for...

Also, good Q why I fit the Tucannon data. I guess I thought it might be interesting because I assumed it had MR data for most weeks. I've been playing around with the Chiwawa data too. Lots of bad Pareto k diagnostic values, but if loo_compare is to be believed, it appears to prefer the Poisson to either NB for 2013. I need to read up and try to better understand what LOO is doing so I can assess how serious those bad diagnostics are.
Perhaps we would have better luck with K-fold than leave-one-out?
Good idea. You will learn that those high Pareto-k values are (very) bad. Basically they mean that the corresponding pointwise likelihoods are so long-tailed that their importance sampling distribution cannot be fit by the Pareto smoothing distribution -- they may not have finite variance or even finite mean -- and therefore the approximation to the leave-one-out predictive density underlying LOO is invalid. This isn't too surprising, TBH. I expect @mdscheuerell would agree that ecological models in the wild have bad (or very bad) Pareto ks more often than not.

The loo package provides some nice diagnostic plots that can help you understand which observations or data types (e.g., catch vs. recaptures) have these extremely skewed likelihoods. In principle, the solution is to do brute-force cross-validation for the problematic observations. In practice, and depending on how much you care about the answer, re-fitting the model that many times may or may not be worth it. An alternative, as you suggest, is to abandon LOO altogether and do K-fold CV, which is still a heckuva lot more work than LOO. (Just to be clear, brute-force leave-one-out is a special case of brute-force K-fold CV with K = N. What you've been doing so far is neither of these: LOO is an approximation to the leave-one-out posterior predictive density. In that narrow sense it is analogous to AIC, DIC, WAIC, etc.)

In any event, I think we have to root out what is giving those numerical errors when converting from (mu,q) to (a,b) (again, using my proposed notation). If the NB RNG is throwing errors, presumably the likelihood is unstable as well, which could certainly have something to do w/ the extreme skewness. I'm also not crazy about the prior on the inverse square-root of the overdispersion parameter in the NB2 formulation, but that's another post.
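For poking at which observations are the culprits, the loo package has these diagnostics built in (a sketch, assuming a loo object named `loo_nb` as above):

```r
print(loo_nb)                  # summary, including the Pareto k table
pk <- pareto_k_values(loo_nb)  # one k per observation
which(pk > 0.7)                # indices of the "bad" / "very bad" ones
plot(loo_nb)                   # k plotted against data point index
```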
From the info on the Pareto-k diagnostics: what do we do if the likelihood is unstable? Can we constrain it with priors? Or is there something about the model itself that would need to be changed? Thanks!
Wenatchee-screw-traps/src/Stan_demo/juv_trap_NB.stan, lines 101 to 105 in 63081df

Assuming you're already monitoring those generated quantities, that should be all you need. Like I mentioned yesterday, if you're going to experiment with this I'd suggest saving the random seed so we can reproduce the errors.
Ok, just added a reproducible example, with the seed set for the Stan models. I also added a model using that parameterization. Are there tests that we should conduct to evaluate the stability of the likelihood (e.g. evaluating the effect of seed on result)?

For comparing observation models (Poisson vs. NB1 vs. NB2), it occurs to me that we should probably use the full multi-year model. I envision this model having independent overdispersion parameters for each year.
@marksorel8 Can you produce some violin plots of the pointwise likelihoods? IME, they will give you an indication of where, exactly, the long tails are arising and giving the lousy Pareto-k estimates. For example, here is a case from our Skagit steelhead model.

It's pretty obvious that there are 2 different contributions to the overall likelihood, and that the first (age composition) is behaving pretty badly compared to the second (escapement).
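Something like this ggplot sketch would do it, assuming `ll` is the iterations-by-observations matrix of pointwise log-likelihoods and `obs_type` labels each column by data type (both hypothetical names):

```r
library(ggplot2)

df <- data.frame(
  loglik = as.vector(ll),  # column-major: iterations stacked within observation
  obs    = factor(rep(seq_len(ncol(ll)), each = nrow(ll))),
  type   = rep(obs_type, each = nrow(ll))
)

ggplot(df, aes(x = obs, y = loglik)) +
  geom_violin() +
  facet_wrap(~ type, scales = "free_x")
```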
Hi @mdscheuerell, the first plot below is of the pointwise likelihoods of the 15 mark-recapture efficiency trials for the Chiwawa in 2013 and the second plot is for 50 of the 252 catch observations. Looks like there are long tails in the pointwise likelihoods for all the data points.

For the actual analysis I will use a multi-year model, which combines information from mark-recapture trials across years. It would be great if the multi-year model behaved better, but that may be wishful thinking?
FYI, @marksorel8, I'm currently investigating your reprex on the numerical errors in the NB generated quantities.

The actual errors printed in the Viewer pane refer to the method of generating pseudo-random NB variates using the gamma-Poisson mixture. AFAICT, either of the NB parameters becoming too extreme could break that underlying gamma draw.

Stay tuned...
Aha! I think at least part of the reason I'm having such a hard time reproducing your reprex is because you didn't set the RNG seed for R, thus the inits will be different from one call to the next. I must've gotten "lucky" the first time or two yesterday. Will just have to continue with trial-and-error (as in, keep trying until I get an error).
Oh man, I'm sorry @ebuhle. At first I only set the RNG in R and that didn't work, so I switched to setting it in Stan only, which I thought was enough. SMH. I think my R seed may have been 10403.
See #3 (comment); I found one that "works". Better still, these seeds give two chains that are fine and one (chain 3) that appears to throw the error at the initial values and every iteration thereafter. Nothing obvious pops out from comparing the inits, though. Will have to mess around with it some more.
An aside, but perhaps worth mentioning. Once upon a time, I had a JAGS model wherein a particular seed had the rather undesirable effect of allowing the MCMC chains to get X steps along (like 75%) before barfing every time at the same iteration. No reboot, update, downdate(?), etc made a difference. In this particular case, however, the behavior is much more pathological in that there are multiple pathways to the problem.

Don't ask me how I know this, but setting really bad bounds on priors (eg, setting the lower bound on an SD prior to be much greater than the true value for the simulated data) can do wonders for the actual fitting process (eg, eliminates all divergent transitions, rapidly decreases run times), even if the answers are obviously wrong. So, perhaps it's worth changing the prior bound(s) for the NB probability parameter. Perhaps we should try [0.2,0.8] (or something else that's biased and more precise than the current range)?
Haha, even I remember this; that's how annoying it was!
Not a bad idea, although I'd suggest something without the un-Gelmanian hard endpoints inside the feasible domain, like Beta(2,2). I do have the sneaking suspicion that...
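For what it's worth, the two suggestions look like this side by side (base R, purely for visualization):

```r
curve(dbeta(x, 2, 2), from = 0, to = 1, ylab = "density", xlab = "p")
curve(dunif(x, 0.2, 0.8), from = 0, to = 1, add = TRUE, lty = 2)
legend("topright", legend = c("Beta(2,2)", "Uniform(0.2, 0.8)"), lty = 1:2)
```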
OK, so a bit of detective work on

Wenatchee-screw-traps/src/Stan_demo/juv_trap_NB.stan, lines 100 to 103 in 9035486

Typical output in cases that produce the error:

Compare that to, for example:

It's too bad the "canonical" priors that would regularize the NB parameters directly aren't available to us here. It's less obvious how to prevent the converted parameters from reaching such extreme values...

Anyway, I'm just leaving this update here for now; I'm out of time to spend on this today.
Thank you so so much for looking into this @ebuhle. I will try some different priors and report back!
I leave the exercise to the reader...errr, dissertator.
The tricky part will be making sure you're using a parameterization of the NB that is closed under addition, so that this still holds:
Wenatchee-screw-traps/src/Stan_demo/juv_trap_multiyear.stan, line 75 in 14e67d4
As you probably know, there are several parameterizations in common use, and even the "standard" one in Stan differs from the one on Wikipedia, for example. Further, the most common and useful parameterization in ecology (mean and overdispersion) is yet another variant. The catch is that closure under addition requires that all the NB RVs being summed have the same p (in the standard parameterization), so you can't simply sum the expectations like we do with the Poisson.
Then it will also be slightly tricky to figure out the binomial thinning:
Wenatchee-screw-traps/src/Stan_demo/juv_trap_multiyear.stan, line 76 in 14e67d4
since thinning the NB (like the Poisson) scales the expectation. So you'll have to switch between parameterizations (i.e., solve one set of parameters for the other) at least once.
I would have to sit down with pencil and paper to figure this out. Sounds like a good grad student project, amirite??
Originally posted by @ebuhle in #1 (comment)
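(For reference, a quick R simulation of the two tricky points from the quoted post, closure under addition and binomial thinning, in the standard NB(r, p) parameterization, with made-up values:)

```r
set.seed(1)
n <- 1e6
r <- 5; p <- 0.4  # hypothetical NB parameters for weekly migrants
pi_trap <- 0.3    # hypothetical trap efficiency

# closure under addition: NB(r1, p) + NB(r2, p) = NB(r1 + r2, p)
S1 <- rnbinom(n, size = 2, prob = p) + rnbinom(n, size = 3, prob = p)
S2 <- rnbinom(n, size = 2 + 3, prob = p)

# binomial thinning: same r, new p solved from the scaled mean
M <- rnbinom(n, size = r, prob = p)
C <- rbinom(n, size = M, prob = pi_trap)
p_thin <- p / (p + pi_trap * (1 - p))
C2 <- rnbinom(n, size = r, prob = p_thin)

rbind(c(mean(S1), var(S1)), c(mean(S2), var(S2)))  # rows agree closely
rbind(c(mean(C),  var(C)),  c(mean(C2), var(C2)))  # rows agree closely
```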