-
Notifications
You must be signed in to change notification settings - Fork 225
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add BetaGeoBetaBinomModel
#188
Conversation
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor comments to begin with!
frequency: array_like | ||
The number of purchases of customers. | ||
recency: array_like | ||
The time of the last, i.e. xth, purchase. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can perhaps make a point that recency
is discrete, i.e. int
-like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that docstring still requires revision. Could be a good idea to add a validation check for this as well.
""" | ||
|
||
_model_name = "BG/BB" # Beta-Geometric Beta-Binomial Model | ||
_params = ["alpha", "beta", "gamma", "delta"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops yes #182, sorry. And sounds good.
I see that the logp
utilizes scan
, which I'm not super familiar with. I see that @ricardoV94 is giving some good pointers below
"ContNonContract", | ||
"ParetoNBD", | ||
] | ||
__all__ = ["ContContract", "ContNonContract", "ParetoNBD", "BetaGeoBetaBinom"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you add a comma after the last item black will let it stay one item per line, which I think is better for git diffs
def _B_iter(ix): | ||
return betaln(alpha + x, beta + t_x - x + ix) + betaln( | ||
gamma + 1, delta + t_x + ix | ||
) | ||
|
||
iter_result, _ = scan( | ||
fn=_B_iter, | ||
# outputs_info=pt.zeros(1), | ||
sequences=pt.arange(t_recent), | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand what you're trying to do, it should be something like:
def _B_iter(ix): | |
return betaln(alpha + x, beta + t_x - x + ix) + betaln( | |
gamma + 1, delta + t_x + ix | |
) | |
iter_result, _ = scan( | |
fn=_B_iter, | |
# outputs_info=pt.zeros(1), | |
sequences=pt.arange(t_recent), | |
) | |
def _B_iter(ix, alpha, beta, t_x, x, gamma, delta): | |
return betaln(alpha + x, beta + t_x - x + ix) + betaln( | |
gamma + 1, delta + t_x + ix | |
) | |
iter_result, _ = scan( | |
fn=_B_iter, | |
outputs_info=[None], | |
sequences=pt.arange(t_recent), | |
non_sequences=[alpha, beta, t_x, x, gamma, delta], | |
) |
outputs_info=[None]
tells Pytensor it expects a single non-recurrent output
non_sequences
passes the constants explicitly to the inner function, which is a bit more clean imho
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that your scan can be replaced by relying on broadcasting.
def _B_iter(ix): | |
return betaln(alpha + x, beta + t_x - x + ix) + betaln( | |
gamma + 1, delta + t_x + ix | |
) | |
iter_result, _ = scan( | |
fn=_B_iter, | |
# outputs_info=pt.zeros(1), | |
sequences=pt.arange(t_recent), | |
) | |
n_dims_expr = max([alpha.ndim, beta.ndim, x.ndim, t_x.ndim, gamma.ndim, delta.ndim, t_recent.ndim]) | |
extra_dims_needed = (n_dims_expr - t_recent.ndim) + 1 | |
ixs = pt.shape_padright(pt.arange(t_recent), extra_dims_needed) | |
iter_result = ( | |
betaln(alpha + x, beta + t_x - x + ixs) | |
+ betaln(gamma + 1, delta + t_x + ixs) | |
) |
I've decided to split this into two PRs - one for the |
This PR closes #176
The
BetaGeoBetaBinomModel
is the only model I'm aware of for discrete, non-contractual modeling. This PR adds a model class as well as a testing dataset and distribution block.For now this is a draft PR because many of the tests are blocked by incorrect
logp
calculations; for reference please see (5) on p4 of the research paper:https://www.brucehardie.com/papers/020/fader_et_al_mksc_10.pdf
I don't think I'm using
pytensor.scan()
properly in thelogp
expression, so if someone could help me out with that I'd appreciate.Some other concerns:
purchase_heterogeniety
andchurn_heterogeniety
variables for plotting the population purchase and churn rate posterior distributions, but since they won't be called until after the model is fit, they are failingflake8
.