Verbose logging in LinearDML and SparseLinearDML #922

Open
carl-offerfit opened this issue Oct 9, 2024 · 3 comments

@carl-offerfit

I'm trying to fit LinearDML and SparseLinearDML on a marketing data set with 350,000 examples, 25 real-valued treatment variables, and 50 nuisance variables. The fitting takes a long time (and eventually gives warnings - more about that in another issue if I can't figure it out). For now, my question is whether there is any way to get logging. I understand there are multiple stages in the fitting process (multiple models to fit), and I don't know which of them takes so long. LinearDML and SparseLinearDML don't seem to accept the verbose parameters listed in the DML base class. Thanks for your help!
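In the meantime, the only progress output I've been able to get comes from the first-stage sklearn models themselves, since LinearDML takes them via model_y and model_t and some sklearn estimators have their own verbose flags. A rough sketch with toy data (shapes just mimic my problem, and the model choices are only for illustration):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from econml.dml import LinearDML

# Synthetic placeholder data mimicking the shape of my problem
# (smaller n so the example runs quickly).
rng = np.random.default_rng(0)
n, d_t, d_x = 10_000, 25, 50
X = rng.normal(size=(n, d_x))
T = rng.normal(size=(n, d_t))
Y = T @ rng.normal(size=d_t) + X[:, 0] + rng.normal(size=n)

# LinearDML itself has no verbose option, but the nuisance models' own
# verbose flags at least show that the cross-fitted first-stage fits are
# making progress.
est = LinearDML(
    model_y=RandomForestRegressor(n_estimators=100, n_jobs=-1, verbose=1),
    model_t=RandomForestRegressor(n_estimators=100, n_jobs=-1, verbose=1),
    cv=2,
)
est.fit(Y, T, X=X)
```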

@carl-offerfit
Author

I stepped through the code and answered my own question: I can see that no additional information is available. I am going to work on enhancing the logging to meet my needs, and if it works out I can contribute it to the project. I am dealing with non-convergence of the final model, so I will most likely start by logging more diagnostics on the quality of the fit of the nuisance and treatment models.

@kbattocchi
Collaborator

Hi Carl, I agree that keeping track of progress through fitting is functionality we should build into the library, and I have started working on that.

In terms of the quality of the fit, once fitting is complete one measure is accessible via the nuisance_scores_t and nuisance_scores_y attributes, but note that interpreting these can be somewhat subtle (you want your models to fit as well as possible, but for the DML techniques to work there needs to be unpredictable variation in the treatment, which should lead to unpredictable variation in the outcome if the effect is nonzero).
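For reference, a minimal sketch of inspecting those attributes after fitting (est here is assumed to be an already-fitted LinearDML or SparseLinearDML; the scores come from the first-stage models' own score methods, one entry per cross-fitting fold):

```python
# `est` is assumed to be an already-fitted LinearDML / SparseLinearDML.
# Each attribute holds the out-of-sample score of the corresponding
# first-stage model, one entry per cross-fitting fold.
print("Y nuisance model scores:", est.nuisance_scores_y)
print("T nuisance model scores:", est.nuisance_scores_t)
```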

@carl-offerfit
Author

Hi, thanks for your reply. I was thinking about making a branch and adding more detailed and familiar diagnostics for the first-stage models: for example, a confusion matrix on the first-stage outcome model (in case it is binary), or other intuitive evaluation metrics like the correlation coefficient (for real-valued outcomes or treatments). I would also like to look at SHAP plots for the first-stage models (I have a good idea of what the influential features should be, and if the model did not find them I would know something is wrong). Does that make sense? And if I implemented such features behind a flag to turn them on/off, is that something other people would find useful?
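This is roughly the kind of manual check I have in mind, done outside the library for now (synthetic data and an arbitrary model choice, just to illustrate the idea; not a proposal for the API):

```python
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

# Synthetic placeholder data; in practice this would be the same X (controls)
# and Y (outcome) passed to LinearDML.fit.
rng = np.random.default_rng(0)
X = rng.normal(size=(5_000, 50))
Y = 2.0 * X[:, 0] + X[:, 1] - X[:, 2] + rng.normal(size=5_000)

# Out-of-fold predictions from a candidate first-stage outcome model,
# mirroring the cross-fitting that DML does internally.
model_y = RandomForestRegressor(n_estimators=100, n_jobs=-1, random_state=0)
y_oof = cross_val_predict(model_y, X, Y, cv=3)
print("out-of-fold corr(Y, Y_hat):", np.corrcoef(Y, y_oof)[0, 1])
# For a binary outcome, sklearn.metrics.confusion_matrix on the out-of-fold
# class predictions would be the analogous check.

# SHAP on the model refit to all the data, to verify that the features I
# expect to matter actually show up as influential.
model_y.fit(X, Y)
shap_values = shap.TreeExplainer(model_y).shap_values(X)
shap.summary_plot(shap_values, X)
```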

There is plenty of unpredictable variation in the treatments: we pick the treatments with contextual bandits that are refit multiple times per week on noisy data, and we always keep a small percentage of completely random treatment assignment to help learning (we are using a version of epsilon-greedy contextual bandits).
