Merged
125 commits
2d61f6e
add structure
OliverSchacht Aug 13, 2024
870ae77
Update requirements.txt
SvenKlaassen Aug 13, 2024
71eab06
create empty test files
SvenKlaassen Aug 13, 2024
dcade8b
fix init name
SvenKlaassen Aug 13, 2024
45bd380
create simple dataset for rdd
SvenKlaassen Aug 13, 2024
db0d6ca
first exception test
SvenKlaassen Aug 13, 2024
a4c3471
initial draft of RDD
OliverSchacht Aug 14, 2024
8071f05
add _check_data method to RDFlex
SvenKlaassen Aug 14, 2024
53e319a
add further test files
SvenKlaassen Aug 14, 2024
5a4df47
change s_col description and error messages
SvenKlaassen Aug 14, 2024
b48050d
add first learner tests to RDFlex
SvenKlaassen Aug 14, 2024
3897b03
add score and cutoff tests
SvenKlaassen Aug 14, 2024
2473162
add resampling test and properties
SvenKlaassen Aug 15, 2024
71edc60
remove old properties
SvenKlaassen Aug 15, 2024
fa92395
add checks and support for custom kernels
SvenKlaassen Aug 15, 2024
53f223b
adapt _calc_weights
SvenKlaassen Aug 15, 2024
ba7c996
add h_fs check
SvenKlaassen Aug 15, 2024
3464abb
fix initial h_fs and set private properties
SvenKlaassen Aug 15, 2024
87a9082
update rdd fit
SvenKlaassen Aug 15, 2024
3171ce3
update _check_fuzzyness
SvenKlaassen Aug 15, 2024
051cef0
add test for default values
OliverSchacht Aug 15, 2024
f59285b
doc: h_fs is float
OliverSchacht Aug 15, 2024
a30871a
add return types test file
OliverSchacht Aug 15, 2024
068e98d
update rdd
OliverSchacht Aug 15, 2024
fdc6192
fix column_stack and predict_proba
SvenKlaassen Aug 15, 2024
9aeb8a0
add iterative fitting
SvenKlaassen Aug 15, 2024
d7122bd
small update
OliverSchacht Aug 15, 2024
dd00de0
fix data gen for test
OliverSchacht Aug 15, 2024
8cf50a1
fix data gen in test
OliverSchacht Aug 15, 2024
656d95b
add property: cutoff
OliverSchacht Aug 15, 2024
b3ae1b9
fix: test for initial h_fs
OliverSchacht Aug 15, 2024
e30a27f
fix: w instead of weights
OliverSchacht Aug 15, 2024
9c45ce2
add fit n_iterations tests
SvenKlaassen Aug 15, 2024
e3af8fe
Merge branch 'main' of https://github.com/DoubleML/doubleml-rdflex
SvenKlaassen Aug 15, 2024
f2255bc
add resampling for iterations
SvenKlaassen Aug 15, 2024
b0a9324
fix: __str__() return error for unfit model
OliverSchacht Aug 15, 2024
e8385d5
add properties for coef se pval t_stat
OliverSchacht Aug 15, 2024
7738f1d
fix: str
OliverSchacht Aug 15, 2024
7e76113
rename coef and se; fix tmp_smpls
SvenKlaassen Aug 15, 2024
3bbfc36
change p_value to pval
SvenKlaassen Aug 15, 2024
db83df8
add requirement of sample weights to learners
SvenKlaassen Aug 16, 2024
dcfce65
update simple rdd dgp
SvenKlaassen Aug 16, 2024
8bc9319
update rdd dgp
SvenKlaassen Aug 16, 2024
60b779d
align naming in fit_nuisance_model
SvenKlaassen Aug 16, 2024
66d4d07
remove weight mask
OliverSchacht Aug 16, 2024
cacd08d
change M_Y and M_D to arrays
OliverSchacht Aug 16, 2024
22d6e2d
set default values to nan (instead of empty)
OliverSchacht Aug 16, 2024
bd3b5a5
fix __str__()
OliverSchacht Aug 16, 2024
2618efc
rename smpls
OliverSchacht Aug 16, 2024
9f68618
solution for it > 2
OliverSchacht Aug 16, 2024
f366fba
remove r_mask from test
OliverSchacht Aug 16, 2024
50fa117
fix format in __str()__
SvenKlaassen Aug 16, 2024
6bfee66
update iterativ fitting
SvenKlaassen Aug 16, 2024
e927788
check iterations seperately
SvenKlaassen Aug 16, 2024
205a9e1
rename loop and clean fit iterations
SvenKlaassen Aug 16, 2024
2c92e09
fix score in fit and weight update
SvenKlaassen Aug 16, 2024
8bb03f5
save value of final bandwidth
SvenKlaassen Aug 16, 2024
07b8707
add confint method
SvenKlaassen Aug 16, 2024
2731ae7
split _fit_rdd for sharp design
SvenKlaassen Aug 16, 2024
375dfb2
add sharp rdd test
SvenKlaassen Aug 16, 2024
27caddf
fix treatment definition in test_rdd_sharp
SvenKlaassen Aug 16, 2024
5eb2f81
add se to test_rdd_sharp.py
SvenKlaassen Aug 16, 2024
2389f03
feat: adde fuzzy parameter
Blacky-P Aug 20, 2024
a8bc32c
new simple dgp version
SvenKlaassen Aug 20, 2024
a59760a
Documentation for fuzzy
OliverSchacht Aug 20, 2024
fcf7f62
add warning for effect sign
OliverSchacht Aug 20, 2024
5b35205
make dgp effect cont. in score
SvenKlaassen Aug 20, 2024
7be9dd5
adjusted fuzzy flag default, checkmessage and docstring
Blacky-P Aug 20, 2024
837fff4
Merge branch 'main' of github.com:DoubleML/doubleml-rdflex
Blacky-P Aug 20, 2024
b48f254
rework of test start
Blacky-P Aug 20, 2024
7065307
set fuzzy default to false
SvenKlaassen Aug 20, 2024
0f50178
add fuzzy to default test
SvenKlaassen Aug 20, 2024
1aef14c
test fuzzy warning
SvenKlaassen Aug 20, 2024
b50f379
adjust warning message treatment assignment
SvenKlaassen Aug 20, 2024
48bfbac
add scopes to conftest
SvenKlaassen Aug 20, 2024
6b33495
add test for ci
SvenKlaassen Aug 21, 2024
4b4ff49
feat: rdd flex dummy tests
Blacky-P Aug 22, 2024
3d01fcd
fix: speedup rdd dummy tests
Blacky-P Aug 23, 2024
2cc0cc0
fix: removed todos
Blacky-P Aug 23, 2024
97d7f02
adjust fuzzy warning message
SvenKlaassen Sep 3, 2024
f4e2774
update docstrings
SvenKlaassen Sep 3, 2024
97f8893
define private properties in init
SvenKlaassen Sep 3, 2024
86d2c76
add fs_specification to rdd
SvenKlaassen Sep 3, 2024
3b8226e
update aggregation with scaling factor
OliverSchacht Sep 3, 2024
80d68e6
avoid warning
OliverSchacht Sep 3, 2024
5fa02cb
draft for classification test
OliverSchacht Sep 3, 2024
b8a7337
fix: shapes of prediction array
OliverSchacht Sep 4, 2024
e159f9b
include fs_specification test
OliverSchacht Sep 4, 2024
1118fcd
test for fs_specification
OliverSchacht Sep 4, 2024
f29d87f
extent unit tests for binary outcome
SvenKlaassen Sep 6, 2024
b59dcf9
update fs_specification docstring
SvenKlaassen Sep 6, 2024
6d9aa58
add global learners
OliverSchacht Sep 6, 2024
8c6e6f9
depreciate check
OliverSchacht Sep 6, 2024
abffd1d
test global learners
OliverSchacht Sep 6, 2024
7707186
add test for cloned learners
OliverSchacht Sep 6, 2024
5e97ab2
add aggregation for effective observations and final bandwidth
OliverSchacht Sep 7, 2024
8be6d22
add input check for global learners
SvenKlaassen Sep 9, 2024
751a2fb
fix format
SvenKlaassen Sep 9, 2024
59fc64d
fix: _classes missing for fitted classifier
OliverSchacht Sep 9, 2024
53da44d
Merge branch 'main' of https://github.com/DoubleML/doubleml-rdflex
OliverSchacht Sep 9, 2024
b3b8a6e
update global learner test
SvenKlaassen Sep 9, 2024
410c843
Merge branch 'main' of https://github.com/DoubleML/doubleml-rdflex
SvenKlaassen Sep 9, 2024
f83042a
Update test_global_learners.py
SvenKlaassen Sep 9, 2024
de06638
change aggregation to scaled coefficient deviations
OliverSchacht Sep 9, 2024
7c5040d
test n_rep=4 to avoid confusion with n_coef=3
OliverSchacht Sep 9, 2024
be62293
feat: included area yield dpg
Blacky-P Sep 18, 2024
ada8a37
fix: spelling
Blacky-P Sep 18, 2024
f553fd2
update doc string
OliverSchacht Sep 18, 2024
64c0141
update area yield docstring
SvenKlaassen Sep 20, 2024
3937c97
update improvement measurement
SvenKlaassen Sep 20, 2024
8b41246
rename output
SvenKlaassen Sep 20, 2024
f3fdfdb
add a
OliverSchacht Oct 8, 2024
9e66a31
add docstring
OliverSchacht Oct 8, 2024
0a5dfdf
add bias-aware bandwidth
OliverSchacht Oct 14, 2024
2722b4d
change default kernel to triangular
SvenKlaassen Oct 16, 2024
074a721
Merge pull request #273 from DoubleML/main
OliverSchacht Oct 23, 2024
52cce9c
include true tau
OliverSchacht Nov 15, 2024
d86ef73
add more info to __str__
OliverSchacht Nov 19, 2024
9890dd3
Merge pull request #275 from DoubleML/main
OliverSchacht Nov 19, 2024
dc5f1f6
Update test_rdd_default_values.py
SvenKlaassen Dec 2, 2024
61cf5cb
add rdd data generators to init
SvenKlaassen Dec 2, 2024
39e209c
init local variables
SvenKlaassen Dec 2, 2024
f01c90e
initialize all private values in init
SvenKlaassen Dec 2, 2024
dbc502e
add tau to docstring
OliverSchacht Dec 2, 2024
6fff8c5
remove area yield from PR
OliverSchacht Dec 2, 2024
40 changes: 20 additions & 20 deletions doubleml/double_ml_data.py
@@ -110,7 +110,7 @@ class DoubleMLData(DoubleMLBaseData):
         Default is ``None``.

     s_col : None or str
-        The selection variable (only relevant/used for SSM Estimatiors).
+        The score or selection variable (only relevant/used for RDD or SSM Estimators).
         Default is ``None``.

     use_other_treat_as_covariate : bool
@@ -182,7 +182,7 @@ def _data_summary_str(self):
         if self.t_col is not None:
             data_summary += f'Time variable: {self.t_col}\n'
         if self.s_col is not None:
-            data_summary += f'Selection variable: {self.s_col}\n'
+            data_summary += f'Score/Selection variable: {self.s_col}\n'
         data_summary += f'No. Observations: {self.n_obs}\n'
         return data_summary

@@ -212,7 +212,7 @@ def from_arrays(cls, x, y, d, z=None, t=None, s=None, use_other_treat_as_covaria
             Default is ``None``.

         s : :class:`numpy.ndarray`
-            Array of the selection variable (only relevant/used for SSM models).
+            Array of the score or selection variable (only relevant/used for RDD and SSM models).
             Default is ``None``.

         use_other_treat_as_covariate : bool
@@ -351,7 +351,7 @@ def t(self):
     @property
     def s(self):
         """
-        Array of selection variable.
+        Array of score or selection variable.
         """
         if self.s_col is not None:
             return self._s.values
@@ -538,7 +538,7 @@ def t_col(self, value):
     @property
     def s_col(self):
         """
-        The selection variable.
+        The score or selection variable.
         """
         return self._s_col

@@ -547,10 +547,10 @@ def s_col(self, value):
         reset_value = hasattr(self, '_s_col')
         if value is not None:
             if not isinstance(value, str):
-                raise TypeError('The selection variable s_col must be of str type (or None). '
+                raise TypeError('The score or selection variable s_col must be of str type (or None). '
                                 f'{str(value)} of type {str(type(value))} was passed.')
             if value not in self.all_variables:
-                raise ValueError('Invalid selection variable s_col. '
+                raise ValueError('Invalid score or selection variable s_col. '
                                  f'{value} is no data column.')
             self._s_col = value
         else:
@@ -725,24 +725,24 @@ def _check_disjoint_sets_t_s(self):
         if self.s_col is not None:
             s_col_set = {self.s_col}
             if not s_col_set.isdisjoint(x_cols_set):
-                raise ValueError(f'{str(self.s_col)} cannot be set as selection variable ``s_col`` and covariate in '
+                raise ValueError(f'{str(self.s_col)} cannot be set as score or selection variable ``s_col`` and covariate in '
                                  '``x_cols``.')
             if not s_col_set.isdisjoint(d_cols_set):
-                raise ValueError(f'{str(self.s_col)} cannot be set as selection variable ``s_col`` and treatment variable in '
-                                 '``d_cols``.')
+                raise ValueError(f'{str(self.s_col)} cannot be set as score or selection variable ``s_col`` and treatment '
+                                 'variable in ``d_cols``.')
             if not s_col_set.isdisjoint(y_col_set):
-                raise ValueError(f'{str(self.s_col)} cannot be set as selection variable ``s_col`` and outcome variable '
-                                 '``y_col``.')
+                raise ValueError(f'{str(self.s_col)} cannot be set as score or selection variable ``s_col`` and outcome '
+                                 'variable ``y_col``.')
             if self.z_cols is not None:
                 z_cols_set = set(self.z_cols)
                 if not s_col_set.isdisjoint(z_cols_set):
-                    raise ValueError(f'{str(self.s_col)} cannot be set as selection variable ``s_col`` and instrumental '
-                                     'variable in ``z_cols``.')
+                    raise ValueError(f'{str(self.s_col)} cannot be set as score or selection variable ``s_col`` and '
+                                     'instrumental variable in ``z_cols``.')
             if self.t_col is not None:
                 t_col_set = {self.t_col}
                 if not s_col_set.isdisjoint(t_col_set):
-                    raise ValueError(f'{str(self.s_col)} cannot be set as selection variable ``s_col`` and time variable '
-                                     '``t_col``.')
+                    raise ValueError(f'{str(self.s_col)} cannot be set as score or selection variable ``s_col`` and time '
+                                     'variable ``t_col``.')


 class DoubleMLClusterData(DoubleMLData):
@@ -780,7 +780,7 @@ class DoubleMLClusterData(DoubleMLData):
         Default is ``None``.

     s_col : None or str
-        The selection variable (only relevant/used for SSM Estimatiors).
+        The score or selection variable (only relevant/used for RDD and SSM Estimators).
         Default is ``None``.

     use_other_treat_as_covariate : bool
@@ -854,7 +854,7 @@ def _data_summary_str(self):
         if self.t_col is not None:
             data_summary += f'Time variable: {self.t_col}\n'
         if self.s_col is not None:
-            data_summary += f'Selection variable: {self.s_col}\n'
+            data_summary += f'Score/Selection variable: {self.s_col}\n'

         data_summary += f'No. Observations: {self.n_obs}\n'
         return data_summary
@@ -888,7 +888,7 @@ def from_arrays(cls, x, y, d, cluster_vars, z=None, t=None, s=None, use_other_tr
             Default is ``None``.

         s : :class:`numpy.ndarray`
-            Array of the selection variable (only relevant/used for SSM models).
+            Array of the score or selection variable (only relevant/used for RDD or SSM models).
             Default is ``None``.

         use_other_treat_as_covariate : bool
@@ -1039,7 +1039,7 @@ def _check_disjoint_sets_cluster_cols(self):
                                  'cluster variable in ``cluster_cols``.')
         if self.s_col is not None:
             if not s_col_set.isdisjoint(cluster_cols_set):
-                raise ValueError(f'{str(self.s_col)} cannot be set as selection variable ``s_col`` and '
+                raise ValueError(f'{str(self.s_col)} cannot be set as score or selection variable ``s_col`` and '
                                  'cluster variable in ``cluster_cols``.')

     def _set_cluster_vars(self):
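The hunks above only reword messages, but the rule they describe is easy to lose in the diff: `s_col` must be a string (or `None`) and must not double as a covariate, treatment, outcome, instrument, or time column. The following is a minimal stand-alone sketch of that rule, not the library's implementation; the function name `check_s_col_disjoint` and its signature are hypothetical and exist only for illustration.

```python
def check_s_col_disjoint(s_col, x_cols, d_cols, y_col, z_cols=None, t_col=None):
    """Hypothetical stand-alone version of the role-overlap check for ``s_col``."""
    if s_col is None:
        return  # no score/selection variable set, nothing to check
    if not isinstance(s_col, str):
        raise TypeError('The score or selection variable s_col must be of str type (or None). '
                        f'{str(s_col)} of type {str(type(s_col))} was passed.')

    # Map each role description to the set of columns already used in that role.
    roles = {
        'covariate in ``x_cols``': set(x_cols),
        'treatment variable in ``d_cols``': set(d_cols),
        'outcome variable ``y_col``': {y_col},
        'instrumental variable in ``z_cols``': set(z_cols or []),
        'time variable ``t_col``': {t_col} if t_col is not None else set(),
    }
    for role, cols in roles.items():
        if s_col in cols:
            raise ValueError(f'{s_col} cannot be set as score or selection variable ``s_col`` and {role}.')


# A dedicated score column passes silently.
check_s_col_disjoint('score', x_cols=['x1', 'x2'], d_cols=['d'], y_col='y')
```

Reusing a covariate, for instance `check_s_col_disjoint('x1', x_cols=['x1', 'x2'], d_cols=['d'], y_col='y')`, raises a `ValueError` that names both roles.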
9 changes: 9 additions & 0 deletions doubleml/rdd/__init__.py
@@ -0,0 +1,9 @@
"""
The :mod:`doubleml.rdd` module implements double machine learning estimates for regression discontinuity designs.
"""

from .rdd import RDFlex

__all__ = [
    "RDFlex",
]
9 changes: 9 additions & 0 deletions doubleml/rdd/datasets/__init__.py
@@ -0,0 +1,9 @@
"""
The :mod:`doubleml.rdd.datasets` module implements data generating processes for regression discontinuity designs.
"""

from .simple_dgp import make_simple_rdd_data

__all__ = [
    "make_simple_rdd_data",
]
103 changes: 103 additions & 0 deletions doubleml/rdd/datasets/simple_dgp.py
@@ -0,0 +1,103 @@
import numpy as np
from numpy.polynomial.polynomial import Polynomial


def make_simple_rdd_data(n_obs=5000, p=4, fuzzy=True, binary_outcome=False, **kwargs):
    """
    Generates synthetic data for a regression discontinuity design (RDD) analysis.

    .. math::
        Y_0 &= g_0 + g_{cov} + \\epsilon_0 \\\\
        Y_1 &= g_1 + g_{cov} + \\epsilon_1 \\\\
        g_0 &= 0.1 \\cdot \\text{score}^2 \\\\
        g_1 &= \\tau + 0.1 \\cdot \\text{score}^2 - 0.5 \\cdot \\text{score}^2
               + a \\cdot \\text{score} \\cdot \\sum_{i=1}^{\\text{dim\\_x}} X_i \\\\
        g_{cov} &= \\sum_{i=1}^{\\text{dim\\_x}} \\text{Polynomial}(X_i) \\\\
        \\epsilon_0, \\epsilon_1 &\\sim \\mathcal{N}(0, 0.2^2)

    Parameters
    ----------
    n_obs : int
        Number of observations to generate. Default is 5000.

    p : int
        Degree of the polynomial for covariates. Default is 4.

    fuzzy : bool
        If True, generates data for a fuzzy RDD. Default is True.

    binary_outcome : bool
        If True, generates binary outcomes. Default is False.

    **kwargs : Additional keyword arguments.
        cutoff : float
            The cutoff value for the score. Default is 0.0.
        dim_x : int
            The number of independent covariates. Default is 3.
        a : float
            Factor to control interaction of score and covariates in the outcome equation. Default is 0.0.
        tau : float
            Parameter to control the true effect in the generated data at the given cutoff. Default is 1.0.

    Returns
    -------
    res_dict : dict
        A dictionary containing the generated data with keys:

        'score' (np.ndarray): The running variable.
        'Y' (np.ndarray): The observed outcome.
        'D' (np.ndarray): The realized treatment indicator.
        'X' (np.ndarray): The independent covariates.
        'oracle_values' (dict): The potential outcomes 'Y0' (without treatment) and 'Y1' (with treatment).
    """

    cutoff = kwargs.get('cutoff', 0.0)
    dim_x = kwargs.get('dim_x', 3)
    a = kwargs.get('a', 0.0)
    tau = kwargs.get('tau', 1.0)

    score = np.random.normal(size=n_obs)
    # independent covariates
    X = np.random.uniform(size=(n_obs, dim_x), low=-1, high=1)

    # Create polynomials of covariates (coefficients 0, 1, ..., p for each covariate)
    if p == 0:
        covs = np.zeros((n_obs, 1))
    else:
        covs = np.column_stack([Polynomial(np.arange(p + 1))(X[:, i]) for i in range(X.shape[1])])
    g_cov = np.sum(covs, axis=1)

    # conditional means of the potential outcomes (without the covariate part)
    g0 = 0.1 * score**2
    g1 = tau + 0.1 * score**2 - 0.5 * score**2 + a * np.sum(X, axis=1) * score

    eps_scale = 0.2
    # potential outcomes with independent errors
    if not binary_outcome:
        Y0 = g0 + g_cov + np.random.normal(size=n_obs, scale=eps_scale)
        Y1 = g1 + g_cov + np.random.normal(size=n_obs, scale=eps_scale)
    else:
        # logistic link for binary outcomes
        p_Y0 = 1 / (1 + np.exp(-1.0 * (g0 + g_cov)))
        p_Y1 = 1 / (1 + np.exp(-1.0 * (g1 + g_cov)))
        Y0 = np.random.binomial(n=1, p=p_Y0, size=n_obs)
        Y1 = np.random.binomial(n=1, p=p_Y1, size=n_obs)

    intended_treatment = (score >= cutoff).astype(int)
    if fuzzy:
        # treatment take-up probability jumps at the cutoff but stays strictly between 0 and 1 nearby
        prob = 0.3 + 0.4 * intended_treatment + 0.01 * score**2 - 0.02 * score**2 * intended_treatment + 0.2 * g_cov
        prob = np.clip(prob, 0.0, 1.0)
        D = np.random.binomial(n=1, p=prob, size=n_obs)
    else:
        D = intended_treatment

    D = D.astype(int)
    # observed outcome
    Y = Y0 * (1 - D) + Y1 * D

    oracle_values = {
        'Y0': Y0,
        'Y1': Y1,
    }
    res_dict = {
        'score': score,
        'Y': Y,
        'D': D,
        'X': X,
        'oracle_values': oracle_values
    }
    return res_dict
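
To make the role of `tau` and of the fuzzy assignment rule in the generator above concrete, here is a small sketch that re-evaluates the same formulas for single observations. It uses only the standard library; the helper names `effect_at` and `treatment_prob` are hypothetical and not part of the PR, and the sketch covers the continuous-outcome case only.

```python
def effect_at(score, x_sum=0.0, tau=1.0, a=0.0):
    """Difference g1 - g0 of the conditional means; g_cov cancels out.

    x_sum stands for the sum of the covariates of one observation.
    """
    g0 = 0.1 * score**2
    g1 = tau + 0.1 * score**2 - 0.5 * score**2 + a * x_sum * score
    return g1 - g0


def treatment_prob(score, g_cov=0.0, cutoff=0.0):
    """Treatment probability used when fuzzy=True, clipped to [0, 1]."""
    intended = 1 if score >= cutoff else 0
    prob = 0.3 + 0.4 * intended + 0.01 * score**2 - 0.02 * score**2 * intended + 0.2 * g_cov
    return min(max(prob, 0.0), 1.0)


# At score 0 (the default cutoff) the quadratic and interaction terms vanish,
# so the mean effect is exactly tau.
effect_at_cutoff = effect_at(0.0, x_sum=1.5, tau=1.0, a=2.0)

# Away from the cutoff the effect shrinks by 0.5 * score**2 (here with a = 0).
effect_away = effect_at(1.0, tau=1.0)

# Jump in the treatment probability when crossing the cutoff (g_cov = 0).
jump = treatment_prob(0.0) - treatment_prob(-1e-9)
```

Under these formulas `effect_at_cutoff` equals `tau`, and `jump` is about 0.4. Note that the effect at the cutoff equals `tau` only for a cutoff of 0.0; for another cutoff value `c`, the same expression gives `tau - 0.5 * c**2 + a * c * x_sum`.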