API change for the SyntheticControl experiment class #460
base: main
Conversation
Codecov Report

@@            Coverage Diff             @@
##              main     #460      +/-  ##
==========================================
- Coverage   94.67%   94.40%   -0.27%
==========================================
  Files          32       29       -3
  Lines        2196     2075     -121
==========================================
- Hits         2079     1959     -120
+ Misses        117      116       -1
This was a quick pass but looks broadly sensible. Just trying to understand how extensive you want to make the API change. I've asked some questions, but mostly to clarify my own understanding.
I can see how the API change makes it easier to allow for a matrix of weights in the WeightedFitter.
My one question was whether you wanted to allow multiple "equations" for the different treatment groups, i.e. generate different synthetic controls based on a varying set of inputs. If so, maybe patsy could still work, but you'd have to be more flexible in the PyMC model than in the API.
# turn into xarray.DataArray's
self.X = xr.DataArray(
    self.X,
    dims=["obs_ind", "coeffs"],
dims of X as coeffs? Surely this should be covariates... coeffs is the output, and while they should be 1:1, the naming is a little misleading, no?
self.model.fit(X=self.pre_X, y=self.pre_y, coords=COORDS)

COORDS = {
    # key must stay as "coeffs" unless we can find a way to auto identify
    # the predictor dimension name. "coeffs" is assumed by
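For orientation, the coords dictionary being discussed is roughly of this shape; the entries below are purely illustrative and not taken from the PR:

```python
# Illustrative only: in practice the labels and lengths come from the data.
import numpy as np

COORDS = {
    "coeffs": ["a", "b", "c"],  # one label per predictor / control unit
    "obs_ind": np.arange(70),   # one index per pre-treatment observation
}
```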
Ah ok, fair enough. I guess it's easier to be consistent across the code base.
:param control_units:
    A list of control units to be used in the experiment
:param treated_units:
    A list of treated units to be used in the experiment
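For context, usage of the proposed API would look roughly like this. This is a sketch only: the argument names come from the docstring above, while the data, treatment time, and model choice are illustrative.

```python
# Hypothetical usage of the proposed API; data and model choice are illustrative.
import numpy as np
import pandas as pd
import causalpy as cp

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 3)), columns=["a", "b", "c"])
df["actual"] = 0.6 * df["a"] + 0.4 * df["b"] + rng.normal(scale=0.1, size=100)

result = cp.SyntheticControl(
    df,
    treatment_time=70,              # index at which treatment begins
    control_units=["a", "b", "c"],  # donor pool
    treated_units=["actual"],       # one or more treated units
    model=cp.pymc_models.WeightedSumFitter(),
)
```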
Ok, sorry, I forgot this was synthetic control, so treated-as-target makes sense. The idea would be to enable multiple treatment groups? Do you want to enable different "equations" for the different treatment units? Or would it be just the same fixed controls for each potential treatment unit?
Thanks for the review.
My original idea was to see if patsy allowed something like ... The rationale for going for a list of treated units and a list of control units was to a) allow multiple treated units, and b) allow the weights in the PyMC model to change from a 1D vector to a 2D array. Crucially, if all treated units are predicted by the same set of control units, then this is a relatively trivial change to the shape of the Dirichlet-distributed weights. I have to admit I didn't think of your idea of providing a list of formulas. This is interesting.
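On the weights point: a minimal sketch of that shape change, assuming one simplex of weights per treated unit. Variable and coordinate names are illustrative, not CausalPy's actual implementation.

```python
# Sketch only: batched Dirichlet weights, one simplex per treated unit.
import numpy as np
import pymc as pm

control_units = ["a", "b", "c"]
treated_units = ["t1", "t2"]

with pm.Model(coords={"coeffs": control_units, "treated_units": treated_units}):
    # The 1D case would be a=np.ones(len(control_units)) with dims=("coeffs",);
    # the 2D case just adds a leading treated_units dimension.
    beta = pm.Dirichlet(
        "beta",
        a=np.ones((len(treated_units), len(control_units))),
        dims=("treated_units", "coeffs"),
    )
```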
Yes, I think you have the suggestion right. I think it potentially gives you more flexibility. I'm trying to do the same here: pymc-labs/pymc-marketing#1654. But it does kind of move the complexity into the model class, as you have to parse and construct the relevant components.
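To make the formula-list idea concrete, here is a sketch under the assumption that each treated unit gets its own patsy formula and the model class does the parsing. The names below are hypothetical, not CausalPy's API.

```python
# Sketch only: one design matrix per treated unit, parsed by the model class.
import pandas as pd
from patsy import dmatrices

formulas = {
    "treated_1": "treated_1 ~ 0 + a + b",
    "treated_2": "treated_2 ~ 0 + b + c",
}


def build_design_matrices(df: pd.DataFrame, formulas: dict[str, str]) -> dict:
    """Return {treated_unit: (y, X)} design matrices, one pair per formula."""
    return {
        unit: dmatrices(formula, df, return_type="dataframe")
        for unit, formula in formulas.items()
    }
```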
The data are turned into xarray.DataArrays. This helps with the broadcasting. The model functions get varied input depending on the situation (e.g. experiment), so it was getting complicated. Xarray simplifies broadcasting in functions like PyMCModel.calculate_impact. (A bunch of time was spent trying different solutions before it became clear that the xarray approach was the easiest.)
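To illustrate the broadcasting point, a toy sketch (not the actual calculate_impact implementation):

```python
# Toy example: named dims let xarray align and broadcast automatically,
# so an impact calculation like y - prediction needs no manual reshaping.
import numpy as np
import xarray as xr

y = xr.DataArray(np.random.randn(100), dims=["obs_ind"])
mu = xr.DataArray(
    np.random.randn(500, 100),  # e.g. posterior draws x observations
    dims=["sample", "obs_ind"],
)

impact = y - mu  # broadcasts over "sample", aligned on "obs_ind"
```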
We are still at 0.x, so people should expect the API to change until we reach 1.x.

Remaining tasks:
- Resolve error below. Using a linear regression model with synthetic control is not a planned use case; if it becomes important then we can deal with that at the time.

Remaining bug, not captured by tests:
In the scikit-learn synthetic control notebook, we are getting an error in the second part. There is a broadcasting issue resulting in this:

So I need to think about whether doing this (a linear model with the synthetic control experiment) even makes sense. If it does, then I need to add a test to catch this error, because it's not currently covered by tests, and fix it.
📚 Documentation preview 📚: https://causalpy--460.org.readthedocs.build/en/460/