R code to fit the models and reproduce the results described in Urdangarin et al. (2023, Rev Mat Complut)

spatialstatisticsupna/REMC_confounding_article

Evaluating recent methods to overcome spatial confounding

This repository contains the R code to implement the methods described in the paper entitled "Evaluating recent methods to overcome spatial confounding" (Urdangarin et al., 2023) as well as the R code to create the figures and tables presented in the paper.

Data

Dowry deaths data in Uttar Pradesh in 2001 (Vicente et al., 2020)

The Dowry_death_2001.Rdata file contains the following objects:

  • Data: contains the data set used. It is a data.frame object with the following variables:

    • dist: names of the districts of Uttar Pradesh

    • ID.area: numeric identifiers of districts

    • O: number of dowry deaths in each district in 2001

    • E: number of expected cases of each district in 2001

    • X1: standardized sex ratio covariate (number of females per 1000 males)

  • carto: SpatialPolygonsDataFrame object with the cartography of the 70 districts (year 2001) of Uttar Pradesh

  • Q.xi: spatial adjacency matrix
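
As a minimal sketch of how the O and E columns are typically used, the snippet below computes standardized mortality ratios (SMR = O / E) on a toy data.frame that mimics the structure of Data. The values are invented for illustration, and the load() path shown in the comment is an assumption about where the file is stored.

```r
# Toy stand-in for the `Data` object (values are made up for illustration).
Data <- data.frame(
  dist    = c("A", "B", "C"),
  ID.area = 1:3,
  O       = c(12, 5, 20),
  E       = c(10, 8, 16),
  X1      = c(-0.5, 0.1, 1.2)
)

# Standardized mortality ratio: observed over expected cases.
Data$SMR <- Data$O / Data$E

# With the real file one would first run (path is an assumption):
# load("Dowry_death_2001.Rdata")
print(Data[, c("dist", "O", "E", "SMR")])
```

An SMR above 1 indicates more observed cases than expected in that district.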

Stomach cancer incidence data in Slovenia during the period 1995-2001 (Zadnik and Reich, 2006)

The Slovenia_stomach_cancer file contains the following objects:

  • Data: contains the data set used. It is a data.frame object with the following variables:

    • ID.area: numeric district identifiers

    • O: number of stomach cancer cases in each area during 1995-2001

    • E: number of expected cases in each area during 1995-2001

    • X: standardized socioeconomic indicator

  • coord: a matrix that contains the coordinates of the 192 areas of Slovenia

  • Q.xi: spatial adjacency matrix

The Slovenia data set is available from the R package RASCO (https://github.com/DouglasMesquita/RASCO). It is also available from the web page of James Hodges.

Lip cancer incidence data in Scotland during 1975-1980 (Breslow and Clayton, 1993)

The Scotland_lip_cancer.Rdata file contains the following objects:

  • Data: contains the data set used. It is a data.frame object with the following variables:

    • ID.area: numeric district identifiers

    • O: number of lip cancer cases in each area during 1975-1980

    • E: number of expected cases in each area during 1975-1980

    • AFF: standardized covariate indicating the proportion of the population engaged in agriculture, fishing, or forestry

  • carto: SpatialPolygonsDataFrame object with the cartography of the 56 districts of Scotland

  • Q.xi: spatial adjacency matrix
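
The README does not state whether Q.xi stores a binary adjacency matrix or the neighbourhood structure matrix of an intrinsic CAR prior. The sketch below shows, for four hypothetical areas arranged in a chain, how the two are related; the names W and Q are illustrative and do not come from the repository.

```r
# Binary adjacency matrix W for 4 hypothetical areas in a chain: 1-2-3-4.
W <- matrix(0, nrow = 4, ncol = 4)
W[cbind(1:3, 2:4)] <- 1
W <- W + t(W)

# ICAR-type structure matrix: number of neighbours on the diagonal,
# -1 for each pair of neighbouring areas.
Q <- diag(rowSums(W)) - W

# Every row sums to zero, so Q is singular (rank n - 1 for a connected graph).
print(Q)
print(rowSums(Q))
```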

Simulated data

The Simulated_data folder contains 18 .Rdata files (one for each scenario and subscenario) used in Simulation Study 1 and Simulation Study 2. Each .Rdata file contains the same objects as Dowry_death_2001.Rdata (Data, carto, Q.xi), with a simulated covariate X2 added to Data. Each file also contains the following objects:

  • log.risk: a vector that contains the simulated log risks
  • simu.O: a list with 100 simulated count data sets

The R code to simulate the data is available in SimuStudy1_simulate_data.R and SimuStudy2_simulate_data.R.
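
To illustrate the structure of log.risk and simu.O, here is a minimal sketch that draws 100 sets of Poisson counts with mean E * exp(log.risk). This is the usual disease-mapping setup and is assumed here; the actual generating mechanism is the one in the scripts named above. The values of E and log.risk are invented.

```r
set.seed(123)

# Hypothetical inputs: expected counts and log risks for 5 areas.
E        <- c(10, 8, 16, 4, 12)
log.risk <- c(0.2, -0.1, 0.4, 0.0, -0.3)

# Draw 100 data sets of Poisson counts with mean E * exp(log.risk).
n.sim  <- 100
simu.O <- lapply(seq_len(n.sim), function(i) {
  rpois(length(E), lambda = E * exp(log.risk))
})

length(simu.O)   # number of simulated data sets
simu.O[[1]]      # counts of the first simulated data set
```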

R code

This repository includes the R code to implement the procedures to alleviate spatial confounding described in the paper and to reproduce its tables and figures.

  • R/Real_data_analysis folder contains the R code used in the real data analysis.

    The main file to fit the null, spatial, RSR and spatial+ models is Null_Spatial_RSR_SpatPlus_models.R. Before running the models, set the dataset argument to one of "Dowry", "Slovenia" or "Scotland" at the top of the script.

    • Figure1.R: R script to reproduce Figure 1 of the paper.
    • Covariate_model_eigenvectors.R: R script to fit the covariate model based on the eigenvectors of the spatial precision matrix to remove the spatial dependence from the covariate before fitting the spatial+ model.
    • Covariate_model_Psplines.R: R script to fit the covariate model based on P-splines to remove the spatial dependence from the covariate before fitting the spatial+ model.
    • Covariate_model_TPsplines.R: R script to fit the covariate model based on thin plate splines to remove the spatial dependence from the covariate before fitting the spatial+ model.
  • R/Simulation_Study_1 folder contains the R code used in Simulation Study 1.

    Before running the models, the arguments Scenario (1, 2 or 3) and Subscenario (cor = 80, 50 or 20) must be defined at the top of the code.

  • R/Simulation_Study_2 folder contains the R code used in Simulation Study 2.

    Before running the models, the arguments Scenario (1, 2 or 3) and Subscenario (cor = 80, 50 or 20) must be defined at the top of the code.
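
The eigenvector-based covariate model mentioned for Covariate_model_eigenvectors.R can be sketched roughly as follows: regress the covariate on a few eigenvectors of the spatial precision matrix and keep the residuals as the covariate with spatial dependence removed. This is only an illustration under stated assumptions: it uses a chain graph in place of Q.xi, an ordinary lm() fit, a toy covariate, and an arbitrary number of eigenvectors k. The repository script may differ in all of these choices.

```r
set.seed(1)

# Structure matrix of a chain graph with n areas (hypothetical stand-in for Q.xi).
n <- 30
W <- matrix(0, n, n)
W[cbind(1:(n - 1), 2:n)] <- 1
W <- W + t(W)
Q <- diag(rowSums(W)) - W

# eigen() sorts eigenvalues in decreasing order for symmetric matrices, so the
# last column is the constant eigenvector (eigenvalue 0) and the columns just
# before it are the smoothest non-constant spatial patterns.
eig <- eigen(Q, symmetric = TRUE)
k   <- 5                                # number of eigenvectors (arbitrary here)
V   <- eig$vectors[, (n - k):(n - 1)]   # k smoothest non-constant eigenvectors

# Toy covariate: smooth spatial trend plus noise, then standardized.
X <- as.numeric(scale(sin(seq_len(n) / 5) + rnorm(n, sd = 0.3)))

# Covariate model: regress X on the eigenvectors and keep the residuals.
fit   <- lm(X ~ V)
X.res <- residuals(fit)

# The residuals are orthogonal to the removed eigenvectors (up to rounding).
print(max(abs(crossprod(V, X.res))))
```

In a spatial+ fit, X.res would then replace X as the covariate in the spatial model.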

Computations were run using R 4.0.4, INLA version 21.02.23 and mgcv version 1.8-40.

Acknowledgements

This work has been supported by Project PID2020-113125RB-I00/ MCIN/ AEI/ 10.13039/501100011033.

References

Urdangarin, A., Goicoa, T. and Ugarte, M.D. (2023). Evaluating recent methods to overcome spatial confounding. Revista Matemática Complutense 36, 333-360. DOI: 10.1007/s13163-022-00449-8.
