DiD: allow for multiple pre and post intervention observations #76

Closed
9 of 15 tasks
drbenvincent opened this issue Nov 19, 2022 · 10 comments · Fixed by #140
Labels
enhancement New feature or request outputs Quantitative outputs of the model plotting Improve or fix plotting

Comments

@drbenvincent
Collaborator

drbenvincent commented Nov 19, 2022

At the moment the code works but it assumes there are observations at one pre and one post intervention time.

So we need to relax this assumption and generalise the code.

Probably best to do this while working through the 'bank failure' dataset, see #44.

This will also impact plotting.

And it will impact reporting of the summary statistics.
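To make the target concrete, here is a minimal sketch of the textbook difference-in-differences estimate computed from cell means. It is not the CausalPy implementation; the function name `did_estimate` and all numbers are hypothetical. Because it averages within each cell, the same function covers one observation per cell (the classic 2x2 case) and several pre and post observations.

```python
from collections import defaultdict
from statistics import mean


def did_estimate(rows):
    """Textbook difference-in-differences from cell means.

    rows: iterable of (treated: bool, post: bool, outcome: float).
    Any number of observations per cell is allowed.
    """
    cells = defaultdict(list)
    for treated, post, y in rows:
        cells[(treated, post)].append(y)
    m = {cell: mean(values) for cell, values in cells.items()}
    change_treated = m[(True, True)] - m[(True, False)]
    change_control = m[(False, True)] - m[(False, False)]
    return change_treated - change_control


# Hypothetical numbers, one observation per cell (classic 2x2 case).
classic = [
    (False, False, 10.0), (False, True, 12.0),
    (True, False, 11.0), (True, True, 16.0),
]

# Hypothetical numbers, two pre and two post observations per group.
extended = [
    (False, False, 9.0), (False, False, 11.0),
    (False, True, 12.0), (False, True, 12.0),
    (True, False, 10.0), (True, False, 12.0),
    (True, True, 15.0), (True, True, 17.0),
]

classic_effect = did_estimate(classic)
extended_effect = did_estimate(extended)
```

A regression-based or Bayesian model would estimate the same quantity through an interaction term rather than through explicit cell means.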

Bank failure dataset/example + robustifying

  • add data
  • data processing + visualisation
  • need to add treated and units columns
  • check if I need to either specify the grouping column (e.g. district) as categorical OR enter that as C(district) in the model formula

Classic 2x2 DID

  • Analysis 1 with just pre and post (Bayesian)
    • model fit working
    • plotting working
  • Implement example for the frequentist approach

'Extended' DID with more than 2 observed time points

  • Analysis 2 with all data
    • Consider if we need to have separate classes for classic 2x2 DID and another for more time points, or try to make the one DifferenceInDifferences class deal with all situations.
    • model fit working
    • plotting working
  • Make it work for the frequentist model
  • Update tests
  • Update README + index.rst
@drbenvincent drbenvincent added enhancement New feature or request plotting Improve or fix plotting outputs Quantitative outputs of the model labels Nov 19, 2022
drbenvincent added a commit that referenced this issue Nov 19, 2022
@drbenvincent
Collaborator Author

Model formula will be bib ~ 1 + district + year + district:treated

@drbenvincent
Collaborator Author

At this point it seems that model fitting does work, although we'll have to wait for the visualisation to see if the results make sense.

Plotting is going to get more complex. The initial plot that I had only makes sense when we have two time points (pre/post) AND we have multiple observed units per group.

So we've potentially got a 2×2 grid of different plot types to produce:

|                             | pre/post only | multiple time points |
|-----------------------------|---------------|----------------------|
| one unit observed per group |               |                      |
| multiple units per group    | existing plot |                      |
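One way the plotting code could decide which of the four cases it is dealing with is to inspect the data directly. This is only a sketch: the function name `plot_case`, the `(group, unit, time)` row layout and the example rows are all hypothetical, not part of the current code.

```python
from collections import defaultdict


def plot_case(rows):
    """Classify long-format rows into one cell of the 2x2 plotting grid.

    rows: iterable of (group, unit, time) tuples (hypothetical layout).
    Returns (unit_axis, time_axis) labels.
    """
    times = set()
    units_per_group = defaultdict(set)
    for group, unit, time in rows:
        times.add(time)
        units_per_group[group].add(unit)
    time_axis = "pre/post only" if len(times) <= 2 else "multiple time points"
    many_units = any(len(units) > 1 for units in units_per_group.values())
    unit_axis = (
        "multiple units per group" if many_units else "one unit observed per group"
    )
    return unit_axis, time_axis


# Hypothetical rows: one unit per district, two time points.
banks_like = [
    ("Sixth District", "sixth", 1930), ("Sixth District", "sixth", 1931),
    ("Eighth District", "eighth", 1930), ("Eighth District", "eighth", 1931),
]
case = plot_case(banks_like)
```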

drbenvincent added a commit that referenced this issue Nov 19, 2022
@drbenvincent
Collaborator Author

drbenvincent commented Nov 19, 2022

Screenshot 2022-11-19 at 21 31 32

  • fix green arrow for causal impact
  • fix plotting of data

@drbenvincent
Collaborator Author

The magnitude of the causal impact is wrong. I think this might be fixed by enforcing an order on the levels of the groups.
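As a sketch of what enforcing an order could look like, the snippet below maps group labels to 0/1 explicitly instead of relying on whatever order the labels happen to sort into. The helper `encode_groups` and the labels are hypothetical, and whether this actually explains the wrong magnitude has not been established.

```python
def encode_groups(labels, untreated, treated):
    """Map group labels to codes with an explicit order: untreated=0, treated=1."""
    order = {untreated: 0, treated: 1}
    unknown = set(labels) - set(order)
    if unknown:
        raise ValueError(f"unexpected group labels: {sorted(unknown)}")
    return [order[label] for label in labels]


# Hypothetical labels chosen so that alphabetical order puts the treated group first.
labels = ["A_treated", "B_control", "A_treated", "B_control"]

# Implicit coding: levels sorted alphabetically, so the treated group becomes 0.
alphabetical = {label: i for i, label in enumerate(sorted(set(labels)))}
implicit_codes = [alphabetical[label] for label in labels]

# Explicit coding: the treated group is always 1, whatever the labels are.
explicit_codes = encode_groups(labels, untreated="B_control", treated="A_treated")
```

With implicit coding the reference level depends on the label text, so any quantity computed relative to "group 1" can silently refer to the wrong group.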

@drbenvincent
Collaborator Author

drbenvincent commented Nov 20, 2022

Plotting now works for the original dataset (multiple units in the treatment and control conditions) and the new banking example (one unit in the treatment and one in the control condition).

Although the aesthetics could do with some work, the priority is to focus on getting it working in the case where we have more observations over time, not just the pre/post times.

The inferences are particularly bad, but that is because we are just using the default priors at the moment and because we only have one observation per condition. This is not an ideal scenario, as a lot rides on the sigma parameter. This can be worked on when we use Bambi (see #22).

Original dataset
Screenshot 2022-11-20 at 12 31 55

Banking dataset
Screenshot 2022-11-20 at 12 32 17

drbenvincent added a commit that referenced this issue Nov 20, 2022
@drbenvincent
Collaborator Author

Currently, sampling from the posterior works with:

result = DifferenceInDifferences(
    df_long,
    formula="bib ~ 1 + district + year + district:treated",
    time_variable_name="year",
    group_variable_name="district",
    treated="Sixth District",
    untreated="Eighth District",
    prediction_model=LinearRegression(),
)

But, as expected, it breaks in the steps that follow sampling. Fixing that is the next step.

@drbenvincent
Collaborator Author

Currently working in the did_multiple_observations branch.

@drbenvincent
Collaborator Author

At this point we have DiD working for:

  1. Single pre and post treatment observation (although we have some shape issues, calculating posterior predictions and counterfactuals for each item)

Screenshot 2022-12-25 at 21 29 46

But there are clearly some issues to be resolved for the banks dataset.
Screenshot 2022-12-25 at 21 30 39

  2. Multiple pre and post treatment observations.

Looks like this for the full banks dataset

Screenshot 2022-12-25 at 21 31 28

@drbenvincent
Collaborator Author

In the banks dataset, we are getting 1 surplus degree of freedom. I think it is because the group is coded as a string (therefore treated as a category) rather than as a numerical 0/1.
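The hypothesis can be illustrated with hand-rolled dummy coding. This is a sketch only: it assumes a formula backend that expands every level of a string-coded group inside an interaction term, which may not match what the real backend does, and the data and the meaning of the `treated` column are hypothetical.

```python
def interaction_columns_numeric(group, treated):
    """Group coded 0/1: the interaction is a single product column."""
    return {"group:treated": [g * t for g, t in zip(group, treated)]}


def interaction_columns_categorical(group_labels, treated):
    """Group coded as strings: one indicator per level, each times treated.

    ASSUMPTION: mimics a backend that expands all levels of a categorical
    inside an interaction; real formula libraries may behave differently.
    """
    columns = {}
    for level in sorted(set(group_labels)):
        indicator = [1 if g == level else 0 for g in group_labels]
        columns[f"group[{level}]:treated"] = [
            i * t for i, t in zip(indicator, treated)
        ]
    return columns


# Hypothetical long-format data: two districts, one pre and one post row each.
labels = ["Eighth District", "Eighth District", "Sixth District", "Sixth District"]
numeric = [0, 0, 1, 1]  # same grouping, coded 0 = untreated, 1 = treated
treated = [0, 0, 0, 1]  # hypothetical: 1 only for the treated district, post

numeric_cols = interaction_columns_numeric(numeric, treated)
categorical_cols = interaction_columns_categorical(labels, treated)
surplus = len(categorical_cols) - len(numeric_cols)
```

Under these assumptions the string coding yields one extra interaction column, and that column is all zeros for the untreated district, so it adds a parameter the data cannot inform.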

@drbenvincent
Collaborator Author

I've made meaningful improvements to DiD at this point. There were a number of things about the code before which were muddled and a bit wrong. I've fixed those up, made the code cleaner, got DiD working for multiple pre and post treatment observations, and improved the plotting.

notebooks_did_pymc_banks_12_0
notebooks_did_pymc_banks_17_0
