-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ability to analyse Regression Kink Analysis Designs #264
Conversation
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
Codecov Report
@@ Coverage Diff @@
## main #264 +/- ##
==========================================
+ Coverage 74.41% 75.91% +1.50%
==========================================
Files 20 20
Lines 1157 1308 +151
==========================================
+ Hits 861 993 +132
- Misses 296 315 +19
|
View / edit / reply to this conversation on ReviewNB NathanielF commented on 2023-11-05T10:32:48Z This is nice. Good demonstration of the idea. |
View / edit / reply to this conversation on ReviewNB NathanielF commented on 2023-11-05T10:32:49Z Add a comment above the code where you specify the betas to indicate that this is where 2 is set as the gradient at kink point. drbenvincent commented on 2023-11-06T10:25:37Z Done |
View / edit / reply to this conversation on ReviewNB NathanielF commented on 2023-11-05T10:32:49Z I like this! Didn't realise |
View / edit / reply to this conversation on ReviewNB NathanielF commented on 2023-11-05T10:32:50Z Nice! |
View / edit / reply to this conversation on ReviewNB NathanielF commented on 2023-11-05T10:32:51Z Same as above add comment to the code block specifying where you fix the gradient to be recovered drbenvincent commented on 2023-11-06T10:52:41Z As far as I can tell, when we use a quadratic, none of the individual parameters corresponds to the change in gradient at the kink point, so I've relied on calculating that numerically. But I have added a comment in about how the betas translate into different equations on the left and right of the kink point.
|
View / edit / reply to this conversation on ReviewNB NathanielF commented on 2023-11-05T10:32:51Z Is the forest plot redundant if you're doing the plor posterior too? drbenvincent commented on 2023-11-06T10:55:55Z Yep, I've removed the forest plot |
View / edit / reply to this conversation on ReviewNB NathanielF commented on 2023-11-05T10:32:52Z Maybe be a bit more explicit that this is not a new data set but that you're just showing basis splines applied to the last example and intending to recover the same parameters again. drbenvincent commented on 2023-11-06T10:59:04Z done |
View / edit / reply to this conversation on ReviewNB NathanielF commented on 2023-11-05T10:32:53Z Can you round the -1.14 to 1.1 to show the same value is being recoverd in the posterior plot and your more customised timeseries one...? Just a nit. drbenvincent commented on 2023-11-06T11:07:33Z Sure. What I've actually done is to show 2 decimal places in the Arviz plot. At the moment, rounding to 1 decimal place in the plot method would require doing that for everything. So I'm just about to add an issue to add a round_to kwarg to the plot methods in causalpy. |
Notebook and experiment design look great. Added a few comments. Mostly nits. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved. Feel free to change/address the comments above as you like. Mostly they were small points of clarification. Major work looks great. Cool addition.
{self.running_variable_name: xi, "treated": self._is_treated(xi)} | ||
) | ||
# self.x_pred = pd.DataFrame({self.running_variable_name: xi}) | ||
(new_x,) = build_design_matrices([self._x_design_info], self.x_pred) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sure there. is a reason but why is the output wrapped in braces?and then immediately into an array?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
build_design_matrices
returns both the x and y design matrices as a tuple, so we are just grabbing the x matrix here. And by default these are as dataframes. So you can either provide the return_type='matrix'
argument to build_design_matrices
, or manually do it yourself with np.asarray
as I did here.
y = reg_kink_function(x, beta, kink) + rng.normal(0, sigma, N) | ||
df = pd.DataFrame({"x": x, "y": y, "treated": x >= kink}) | ||
# run experiment | ||
result = cp.pymc_experiments.RegressionKink( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kind of neat that the kink model calls the linear model!
|
||
|
||
@pytest.mark.integration | ||
def test_rkink_bandwidth(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you have an example in the notebook with the bandwidth parameter? Maybe worth adding/explaining
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea
A PyMC model | ||
:param running_variable_name: | ||
The name of the predictor variable that the kink_point is based upon | ||
:param epsilon: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know you suggested not having the notebook as a place for explaining the theory of kink designs, but i feel like the differences between tweaking at least one of epsilon/bandwidth parameters could be mentioned or shown.
It's fine if tweaking them isn't needed for your example, but it'd be good to hint at why they are there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added an example to demonstrate use of the bandwidth
parameter. And I've added an admonition box to explain what epsilon does.
Done View entire conversation on ReviewNB |
As far as I can tell, when we use a quadratic, none of the individual parameters corresponds to the change in gradient at the kink point, so I've relied on calculating that numerically. But I have added a comment in about how the betas translate into different equations on the left and right of the kink point.
View entire conversation on ReviewNB |
Yep, I've removed the forest plot View entire conversation on ReviewNB |
done View entire conversation on ReviewNB |
Sure. What I've actually done is to show 2 decimal places in the Arviz plot. At the moment, rounding to 1 decimal place in the plot method would require doing that for everything. So I'm just about to add an issue to add a round_to kwarg to the plot methods in causalpy. View entire conversation on ReviewNB |
Thanks for the feedback @NathanielF. I think I've dealt with all the suggestions. I've also added the regression kink designs to the README.md and index.rst (for readthedocs) which I forgot to do before. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some minor comments on the core component code style to make the computations more readable (and testable)
Thanks for the review @juanitorduz. I've made a stab at modularising the change in gradient calculation code and added some tests. Though I'm not sure if you it hits what you were thinking of. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks much nicer! thanks for the refactor. Letf a single comment :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice addition! Thank you @drbenvincent 🙌
This is my attempt to implement analysis methods for the Regression Kink Design. Happy to get any feedback on the work so far
RegressionKink
classRegressionKink
classRefactor. At the moment this whole class is copied/pasted from theWill create an issue about this because it gets into wider refactoring issues.RegressionDiscontinuity
class, so there is a lot of duplication. The main substantive difference is that we evaluate the function so as to evaluate the change in slopes either side of the kink.summary
methodCloses #223
NOTE: The example notebook does not try to act like an intro/explainer into regression kink designs. It assumes the reader has some background knowledge about what it is and when you'd use it. Happy to get input on whether it should have more introductory info, but there's a Discussion #262 on whether we should add a specific explainer section to the docs which would allow the notebooks to remain concise.