remoteoutcome: Program Evaluation with Remotely Sensed Variables

R package for estimating treatment effects using remotely sensed variables (RSVs) such as satellite images or mobile phone data.

Overview

This package implements the nonparametric methods developed in:

Rambachan, A., Singh, R., and Viviano, D. (2025). "Program Evaluation with Remotely Sensed Outcomes." arXiv:2411.10959

Installation from source

The package may be installed by using the function install_github() from the devtools package:

install.packages(c("ranger", "latex2exp", "devtools"))

devtools::install_github(
  "https://github.com/asheshrambachan/remoteoutcome", 
  build_vignettes = TRUE
)

Vignettes

The package includes two comprehensive vignettes:

1. Treatment Effect Estimation with Remote Sensing Variables

This vignette shows how remoteoutcome can be used to estimate treatment effects when outcomes are measured using remotely sensed variables (RSVs). It re-analyzes the Smartcards experiment (Muralidharan et al. 2016; Muralidharan et al. 2023), following the design reported in Section 5 of Rambachan, Singh, and Viviano (2025).

vignette("treatment-effects", package = "remoteoutcome")

2. Constructing Remote Sensed Variables

This vignette provides a step-by-step guide for constructing remotely sensed variables by combining multiple data sources. It is intended for users who need to generate their own remote sensing features.

vignette("construct-remote-vars", package = "remoteoutcome")

Sample Splitting Options in `remoteoutcome`

The estimator implemented in remoteoutcome is designed to be used with sample splitting. The package offers three options: (i) cross-fitting, (ii) sample splitting, and (iii) no sample splitting.

1. Cross-Fitting (Recommended)

K-fold cross-fitting splits data into K folds, fits predictions using the remotely sensed variable on K-1 folds, and predicts on the held-out fold. It then iterates across folds.

library(dplyr)
library(remoteoutcome)

# Load the data
data("smartcard_data_p1", package="remoteoutcome")
data("smartcard_data_p2", package="remoteoutcome")

# Merge remote variables
smartcard_data <- inner_join(smartcard_data_p1, smartcard_data_p2, by = "shrid2")
rm(smartcard_data_p1, smartcard_data_p2) # free the two source tables after merging

data_real <- create_data_real(smartcard_data)

Y <- data_real$Ycons # binary outcome
D <- data_real$D # binary treatment
R <- data_real %>% select(starts_with("luminosity"), starts_with("satellite")) # remotely sensed variable
S_e <- !is.na(D) & (rowSums(is.na(R)) == 0) # experimental sample indicator (observe D, R)
S_o <- !is.na(Y) & (rowSums(is.na(R)) == 0) # observational sample indicator (observe Y, R)
clusters <- data_real$clusters # Subdistrict-level cluster identifiers

result <- rsv_estimate(
  Y = Y, D = D, S_e = S_e, S_o = S_o, R = R,
  method = "crossfit",
  ml_params = list(nfolds = 5, seed = 42),
  se_params = list(fix_seed = TRUE, clusters = clusters), 
  cores = 7
)

print(result)
#> RSV Treatment Effect Estimate
#> ==============================
#> 
#> Coefficient: -0.1082
#> 
#> Method: crossfit
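
The fold logic described above can be sketched in base R. This is only an illustration on simulated data, not the package's internal code: the function `crossfit_predict` and the variables `r1`, `r2`, and `y` are hypothetical, and a logistic regression stands in for whatever learner the package actually uses.

```r
# Minimal K-fold cross-fitting sketch on simulated data (base R only).
# All names here are hypothetical and not part of remoteoutcome.
set.seed(42)
n <- 500
sim <- data.frame(r1 = rnorm(n), r2 = rnorm(n))
sim$y <- rbinom(n, 1, plogis(0.8 * sim$r1 - 0.5 * sim$r2))

crossfit_predict <- function(data, nfolds = 5) {
  # Randomly assign each row to one of nfolds folds of (nearly) equal size
  fold <- sample(rep(seq_len(nfolds), length.out = nrow(data)))
  pred <- rep(NA_real_, nrow(data))
  for (k in seq_len(nfolds)) {
    held_out <- fold == k
    # Fit on the other K-1 folds ...
    fit <- glm(y ~ r1 + r2, family = binomial(), data = data[!held_out, ])
    # ... and predict only on the held-out fold
    pred[held_out] <- predict(fit, newdata = data[held_out, ], type = "response")
  }
  list(pred = pred, fold = fold)
}

cf <- crossfit_predict(sim, nfolds = 5)

# Every observation receives exactly one out-of-fold prediction
stopifnot(!anyNA(cf$pred))
table(cf$fold)
```

Because each prediction comes from a model that never saw the corresponding row, the predictions are not fitted to the observations they are later used on.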

2. Sample Splitting

Sample splitting splits the data into a train and test set. The train set is used to fit predictions using the remotely sensed variable, and the test set is used to construct the estimator.

result <- rsv_estimate(
  Y = Y, D = D, S_e = S_e, S_o = S_o, R = R,
  method = "split",
  ml_params = list(train_ratio = 0.5, seed = 42),
  se_params = list(fix_seed = TRUE, clusters = clusters), 
  cores = 7
)

print(result)
#> RSV Treatment Effect Estimate
#> ==============================
#> 
#> Coefficient: -0.1086 (SE: 0.0959)
#> 
#> Sample sizes:
#>   Experimental: 3032
#>   Observational: 2575
#>   Both: 1451
#> 
#> Method: split
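
As a rough check, the sample sizes printed for method = "split" are about half of those printed for method = "none" further down, which appears consistent with train_ratio = 0.5 leaving roughly half of the data for estimation. The sketch below does that arithmetic and then shows a generic unit-level random split. The names `is_train` and `train_idx` are illustrative only, and whether remoteoutcome splits at the unit or the cluster level is not stated in this README.

```r
# Sample sizes copied from the printed output for method = "split" and method = "none"
n_split <- c(experimental = 3032, observational = 2575, both = 1451)
n_full  <- c(experimental = 6055, observational = 5186, both = 2929)
ratios <- round(n_split / n_full, 3)
ratios

# Generic train/test split on row indices (illustrative, unit-level)
set.seed(42)
n <- 1000
train_ratio <- 0.5
train_idx <- sample(n, size = floor(train_ratio * n))
is_train <- seq_len(n) %in% train_idx

# Rows with is_train == TRUE would be used to fit predictions,
# the remaining rows to construct the estimator
table(is_train)
```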

3. No Splitting

No sample splitting uses all of the data for both training predictions based on the remotely sensed variable and constructing the estimator. This is generally not recommended due to possible overfitting, but valid estimation/inference can still be conducted without sample splitting for particularly simple machine learning procedures. See Rambachan, Singh and Viviano (2025) for more discussion.

result <- rsv_estimate(
  Y = Y,
  D = D,
  S_e = S_e,
  S_o = S_o,
  R = R,
  eps = 1e-2,
  method = "none",
  ml_params = list(       # Customize random forest parameters:
    ntree = 100,          #   Number of trees
    classwt_Y = c(10, 1), #   Class weights for PRED_Y model
    seed = 42             #   A random seed for each RF for reproducibility
  ),
  se = TRUE,
  se_params = list(       # Customize cluster-bootstrap standard errors:
    B = 1000,             #   Number of bootstrap replications
    clusters = clusters,  #   Cluster identifiers for cluster-level resampling;
                          #   if omitted, an individual-level bootstrap is used
    fix_seed = TRUE       #   Enables deterministic seeding for reproducibility
  ),
  cores = 7
)

print(result)
#> RSV Treatment Effect Estimate
#> ==============================
#> 
#> Coefficient: -0.0135 (SE: 0.0120)
#> 
#> Sample sizes:
#>   Experimental: 6055
#>   Observational: 5186
#>   Both: 2929
#> 
#> Method: none

# 90% confidence interval
confint(result, level = 0.90)
#>         5.0 %      95.0 %
#> D -0.03319573 0.006176239
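
The cluster-bootstrap standard errors and the interval above can be illustrated with a small, self-contained sketch. It uses simulated data and a plain difference in means rather than the RSV estimator, and the helper names (`diff_in_means`, `cluster_bootstrap_se`) are hypothetical. The final lines assume that the interval is a normal-approximation (Wald) interval; that is an assumption about confint(), although the arithmetic on the rounded printed values comes out close to the reported bounds.

```r
# Illustrative cluster bootstrap on simulated data (not the package's internals)
set.seed(42)
n_clusters <- 40
cluster_size <- 25
sim <- data.frame(cluster = rep(seq_len(n_clusters), each = cluster_size))

# Treatment varies at the cluster level; outcomes share a cluster-level shock
d_cluster <- rep(c(0, 1), length.out = n_clusters)
u_cluster <- rnorm(n_clusters, sd = 0.5)
sim$d <- d_cluster[sim$cluster]
sim$y <- 0.2 * sim$d + u_cluster[sim$cluster] + rnorm(nrow(sim))

diff_in_means <- function(data) {
  mean(data$y[data$d == 1]) - mean(data$y[data$d == 0])
}

cluster_bootstrap_se <- function(data, B = 200) {
  rows_by_cluster <- split(seq_len(nrow(data)), data$cluster)
  draws <- replicate(B, {
    # Resample whole clusters with replacement, keeping all rows of each drawn cluster
    picked <- sample(length(rows_by_cluster), replace = TRUE)
    diff_in_means(data[unlist(rows_by_cluster[picked], use.names = FALSE), ])
  })
  sd(draws)
}

est <- diff_in_means(sim)
se <- cluster_bootstrap_se(sim, B = 200)

# Normal-approximation (Wald) interval at the 90% level
z <- qnorm(1 - (1 - 0.90) / 2)
c(lower = est - z * se, upper = est + z * se)

# Same formula applied to the rounded coefficient and SE printed above
reported_check <- round(-0.0135 + c(-1, 1) * z * 0.0120, 4)
reported_check # approximately -0.0332 and 0.0062
```

Resampling clusters rather than individual rows keeps within-cluster dependence intact in each bootstrap draw, which is why cluster identifiers are passed through se_params.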

User-Provided Predictions in `remoteoutcome`

remoteoutcome allows the user to pass their own fitted predictions based on the remotely sensed variable. This is useful when the predictions are trained with more complex machine learning methods that remoteoutcome does not implement directly.

If you have your own fitted predictions, provide them directly:

# In practice, fit your own models to obtain the predictions.
# Here, load the example predictions shipped with the package.
data("pred_real_Ycons", package = "remoteoutcome")
force(pred_real_Ycons)

result <- rsv_estimate(
  Y = pred_real_Ycons$Y,
  D = pred_real_Ycons$D,
  S_e = pred_real_Ycons$S_e,
  S_o = pred_real_Ycons$S_o,
  pred_Y = pred_real_Ycons$pred_Y,
  pred_D = pred_real_Ycons$pred_D,
  pred_S_e = pred_real_Ycons$pred_S_e,
  pred_S_o = pred_real_Ycons$pred_S_o,
  theta_init = attr(pred_real_Ycons, "theta_init"), # -0.03220447
  method = "predictions",
  se = TRUE,
  se_params = list(B = 1000, fix_seed = TRUE, clusters = pred_real_Ycons$clusters),
  cores = 7
)

print(result)
#> RSV Treatment Effect Estimate
#> ==============================
#> 
#> Coefficient: -0.0135 (SE: 0.0120)
#> 
#> Sample sizes:
#>   Experimental: 6055
#>   Observational: 5186
#>   Both: 2929
#> 
#> Method: predictions
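
One way such a predictions table might be assembled is sketched below on simulated data. The column names mirror those of pred_real_Ycons in the call above, but everything else is an assumption: that each pred_* column is a predicted probability of the corresponding binary variable given R, that models for Y and D are fit only on rows where those variables are observed, and that logistic regression is an adequate stand-in learner. The helper `predict_prob` is hypothetical. The sketch does not compute theta_init and uses in-sample fits for brevity; out-of-fold predictions may be preferable in practice.

```r
# Illustrative construction of a predictions table (assumed layout, simulated data)
set.seed(42)
n <- 600
sim <- data.frame(r1 = rnorm(n), r2 = rnorm(n))
sim$S_e <- rbinom(n, 1, 0.6) == 1                 # in experimental sample
sim$S_o <- rbinom(n, 1, 0.5) == 1                 # in observational sample
sim$D <- ifelse(sim$S_e, rbinom(n, 1, 0.5), NA)   # D observed only when S_e is TRUE
sim$Y <- ifelse(sim$S_o, rbinom(n, 1, plogis(0.7 * sim$r1)), NA) # Y only when S_o

predict_prob <- function(target, data) {
  # Logistic regression of a binary target on the remotely sensed features.
  # glm() drops rows where the target is NA; predict() still scores all rows.
  df <- data.frame(target = as.integer(target), r1 = data$r1, r2 = data$r2)
  fit <- glm(target ~ r1 + r2, family = binomial(), data = df)
  as.numeric(predict(fit, newdata = df, type = "response"))
}

preds <- data.frame(
  Y = sim$Y, D = sim$D, S_e = sim$S_e, S_o = sim$S_o,
  pred_Y   = predict_prob(sim$Y, sim),
  pred_D   = predict_prob(sim$D, sim),
  pred_S_e = predict_prob(sim$S_e, sim),
  pred_S_o = predict_prob(sim$S_o, sim)
)
str(preds)
```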

Planned Features for `remoteoutcome`

The current version (0.1.0) implements the core estimator proposed in Rambachan, Singh and Viviano (2025) for binary outcomes without pre-treatment covariates, assuming there are no direct effects. This corresponds to Algorithm 1 in the paper, which gives further details. The following extensions are under active development:

Incorporating pre-treatment covariates and discrete outcomes as discussed in Appendix E of Rambachan, Singh and Viviano (2025);
Allowing for direct effects in the complete case (see the paper for further details); and
Allowing for continuous outcomes through discretization.

Contributions and feedback are welcome via GitHub Issues.

Citation

If you use this package, please cite:

@article{rambachan2025program,
  title={Program Evaluation with Remotely Sensed Outcomes},
  author={Rambachan, Ashesh and Singh, Rahul and Viviano, Davide},
  journal={arXiv preprint arXiv:2411.10959},
  year={2025}
}

License

MIT License - see LICENSE file for details.

Issues and Contributions

Please report issues or suggest improvements at the GitHub repository.
