Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add BetaGeoBetaBinomModel #188

Closed
wants to merge 16 commits into from
Closed

Add BetaGeoBetaBinomModel #188

wants to merge 16 commits into from

Conversation

ColtAllen
Copy link
Collaborator

This PR closes #176

The BetaGeoBetaBinomModel is the only model I'm aware of for discrete, non-contractual modeling. This PR adds a model class as well as a testing dataset and distribution block.

For now this is a draft PR because many of the tests are blocked by incorrect logp calculations; for reference please see (5) on p4 of the research paper:
https://www.brucehardie.com/papers/020/fader_et_al_mksc_10.pdf

I don't think I'm using pytensor.scan() properly in the logp expression, so if someone could help me out with that I'd appreciate.

Some other concerns:

  1. I added purchase_heterogeniety and churn_heterogeniety variables for plotting the population purchase and churn rate posterior distributions, but since they won't be called until after the model is fit, they are failing flake8.
  2. Docstrings still require revision. How prudent is it to add the LaTeX math equations? I'm not sure how useful it is for users of this library, and a weblink to the research paper referencing the same expressions is already provided.

@ColtAllen ColtAllen added enhancement New feature or request help wanted Extra attention is needed CLV labels Mar 5, 2023
@ColtAllen ColtAllen self-assigned this Mar 5, 2023
@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

Copy link
Contributor

@larryshamalama larryshamalama left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments to begin with!

frequency: array_like
The number of purchases of customers.
recency: array_like
The time of the last, i.e. xth, purchase.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can perhaps make a point that recency is discrete, i.e. int-like.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that docstring still requires revision. Could be a good idea to add a validation check for this as well.

"""

_model_name = "BG/BB" # Beta-Geometric Beta-Binomial Model
_params = ["alpha", "beta", "gamma", "delta"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This addition is related to #128 IIRC. Is that correct? I believe that the existing models don't have this _params structure but it could be a nice way of closing #128 with additional methods to the base CLVModel class. Was what you had in mind?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#182 you mean? I wanted to get draft PRs for this and #177 up before opening a PR for that one. After considering #197, some major test restructurings could be underway, so I think I'll start work on #182 next week.

In the meantime, I could use some help with logp in this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops yes #182, sorry. And sounds good.

I see that the logp utilizes scan, which I'm not super familiar with. I see that @ricardoV94 is giving some good pointers below

"ContNonContract",
"ParetoNBD",
]
__all__ = ["ContContract", "ContNonContract", "ParetoNBD", "BetaGeoBetaBinom"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you add a comma after the last item black will let it stay one item per line, which I think is better for git diffs

Comment on lines +625 to +634
def _B_iter(ix):
return betaln(alpha + x, beta + t_x - x + ix) + betaln(
gamma + 1, delta + t_x + ix
)

iter_result, _ = scan(
fn=_B_iter,
# outputs_info=pt.zeros(1),
sequences=pt.arange(t_recent),
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand what you're trying to do, it should be something like:

Suggested change
def _B_iter(ix):
return betaln(alpha + x, beta + t_x - x + ix) + betaln(
gamma + 1, delta + t_x + ix
)
iter_result, _ = scan(
fn=_B_iter,
# outputs_info=pt.zeros(1),
sequences=pt.arange(t_recent),
)
def _B_iter(ix, alpha, beta, t_x, x, gamma, delta):
return betaln(alpha + x, beta + t_x - x + ix) + betaln(
gamma + 1, delta + t_x + ix
)
iter_result, _ = scan(
fn=_B_iter,
outputs_info=[None],
sequences=pt.arange(t_recent),
non_sequences=[alpha, beta, t_x, x, gamma, delta],
)

outputs_info=[None] tells Pytensor it expects a single non-recurrent output
non_sequences passes the constants explicitly to the inner function, which is a bit more clean imho

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that your scan can be replaced by relying on broadcasting.

Suggested change
def _B_iter(ix):
return betaln(alpha + x, beta + t_x - x + ix) + betaln(
gamma + 1, delta + t_x + ix
)
iter_result, _ = scan(
fn=_B_iter,
# outputs_info=pt.zeros(1),
sequences=pt.arange(t_recent),
)
n_dims_expr = max([alpha.ndim, beta.ndim, x.ndim, t_x.ndim, gamma.ndim, delta.ndim, t_recent.ndim])
extra_dims_needed = (n_dims_expr - t_recent.ndim) + 1
ixs = pt.shape_padright(pt.arange(t_recent), extra_dims_needed)
iter_result = (
betaln(alpha + x, beta + t_x - x + ixs)
+ betaln(gamma + 1, delta + t_x + ixs)
)

@ColtAllen
Copy link
Collaborator Author

I've decided to split this into two PRs - one for the BetaGeoBetaBinom distribution block, then another for BetaGeoBetaBinomModel - so that it's more digestable. I've dive into this further after #177 is closed.

@ColtAllen
Copy link
Collaborator Author

Closing this one for now. Will resume work and open another PR after #134, #298, and #182 are resolved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLV enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add BetaGeoBetaBinomModel
3 participants