-
Notifications
You must be signed in to change notification settings - Fork 252
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CLV API Standardization #527
Comments
I think the best would be to work with xarray datasets. It has the organization benefits of pandas, with the broadcasting behavior of numpy. Internally most predictive methods are already written with xarray code anyway. Users could pass pandas dataframes and we convert to xarray, but the default type which needs to conversion would be xarrays. Definitely agree that passing separate numpy arrays is too cumbersome |
Updated original comment with list of PRs to complete. |
Is t / future_t meant to be vector / vectorized in the API? I think previous implementation had it as vector of same size as each other input or scalar |
Both forms of parametrization (vectorized or scalar) are supported: # scalar parametrization (here predictions are being ran for in-sample data)
model.expected_purchases(future_t=10)
# equivalent vectorized parametrization
data = data.assign(future_t=10)
model.expected_purchases(data) Vectorization support was added to facilitate |
Steps 4-6 (along with adding CLV support for |
There are API inconsistencies in the CLV module. Standardization is a big task best broken down into multiple PRs:
PRs to be Completed In-Order
BetaGeoModel
predictive methods, retain legacyexpected_num_purchases
methodclv.plotting
module so it no longer referencesexpected_num_purchases
GammaGammaModel
andclv.utils.customer_lifetime_value
(related to Removefuture_t=0
predictions fromexpected_customer_lifetime_value
function #596)BetaGeoModel.expected_num_purchases
andBetaGeoModel.expected_num_purchases_new_customer
BetaGeoNBD
distribution block toclv.distributions
(related to CLV Distribution RVs not Model-Specific #128)BetaGeoModel
Current API
Beta-Geo/NBD Transactions Model
Note how fit data is provided as a dataframe, but in the predictive methods individual arrays must be provided. Specifying one array at a time was one of the most annoying aspects about using the legacy
lifetimes
library, and sometimes even created indexing issues that caused the underlyingscipy
functions to crash.For
ParetoNBDModel
, I streamlined this nonsense with a dataframe argument, and made it optional if running predictions on the fit dataset:Pareto/NBD Transactions Model
(We will also need to resolve the naming inconsistencies between these models.)
I've been told passing in dataframes instead of arrays loses some
xarray
broadcasting functionality, which I'd be interested to hear more about. I'm not opposed to arrays being passed in provided it's optional for in-sample data.The API discrepancies between these models necessitated a hotfix for the monetary value model, which follows the same conventions as
BetaGeoModel
:Gamma/Gamma Monetary Value Model
Lastly,
ShiftedBetaGeoModelIndividual
is a whole different animal since it handles contractual transactions, but I think it'd be a good idea to add support for it to thecustomer_lifetime_value
utility:Shifted Beta-Geo Contractual Model
The text was updated successfully, but these errors were encountered: