Refresh attributes of sfc column when subsetting from data.table #4217

Open
jplecavalier opened this issue Jan 30, 2020 · 5 comments
Labels
non-atomic column e.g. list columns, S4 vector columns

Comments

@jplecavalier

I like to use the sf package and data.table together. Instead of using the sf class, which is built on top of the data.frame class, I use the data.table class with a column of class sfc which contains a vector of simple features with some metadata kept in the object's attributes.

However, when I subset some rows of my data.table, those attributes are not automatically updated, which can cause some problems. When subsetting from an sf, those attributes are automatically updated.

I have an inelegant workaround, which is to update the attributes of the sfc column manually, but I'm wondering whether this could be done under the hood.

Let's say I have the following:

library(sf)
library(data.table)

set.seed(20200130)

data <- data.table(
  id = 1:5,
  point = st_sfc(replicate(5, st_point(c(runif(1), runif(1))), simplify = FALSE))
)

data[]
##     id                        point
##  1:  1 POINT (0.09302893 0.6560987)
##  2:  2  POINT (0.4387638 0.7161379)
##  3:  3 POINT (0.8535522 0.08598417)
##  4:  4   POINT (0.927848 0.3534847)
##  5:  5 POINT (0.9615244 0.07300738)

Let's look at the attributes of the point column.

attributes(data$point)
##  $class
##  [1] "sfc_POINT" "sfc"      
##  
##  $precision
##  [1] 0
##  
##  $bbox
##        xmin       ymin       xmax       ymax 
##  0.09302893 0.07300738 0.96152443 0.71613790 
##  
##  $crs
##  Coordinate Reference System: NA
##  
##  $n_empty
##  [1] 0

If I subset some elements, the attributes won't update.

subset <- data[1:2]
attributes(subset$point)
##  $class
##  [1] "sfc_POINT" "sfc"      
##  
##  $precision
##  [1] 0
##  
##  $bbox
##        xmin       ymin       xmax       ymax 
##  0.09302893 0.07300738 0.96152443 0.71613790 
##  
##  $crs
##  Coordinate Reference System: NA
##  
##  $n_empty
##  [1] 0

However, if I use an sf object, they will silently update (you can see it easily with the bbox attribute).

set.seed(20200130)

data <- st_sf(
  id = 1:5,
  point = st_sfc(replicate(5, st_point(c(runif(1), runif(1))), simplify = FALSE))
)

subset <- data[1:2, ]

attributes(subset$point)
##  $class
##  [1] "sfc_POINT" "sfc"      
##  
##  $precision
##  [1] 0
##  
##  $bbox
##        xmin       ymin       xmax       ymax 
##  0.09302893 0.65609871 0.43876379 0.71613790 
##  
##  $crs
##  Coordinate Reference System: NA
##  
##  $n_empty
##  [1] 0
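
One possible explanation for the difference, offered as an assumption rather than something confirmed in this thread, is that sf subsets the geometry column through the `[` method for `sfc`, which rebuilds the column. A minimal sketch that calls that method directly on a column taken from a data.table (the names `dt` and `fresh` are only illustrative):

```r
library(sf)
library(data.table)

set.seed(20200130)

dt <- data.table(
  id = 1:5,
  point = st_sfc(replicate(5, st_point(c(runif(1), runif(1))), simplify = FALSE))
)

# Subsetting the sfc vector itself dispatches to sf's `[` method for sfc,
# so the bbox is recomputed from the two remaining points
fresh <- st_bbox(dt$point[1:2])

# For comparison: the bbox carried by the column after a data.table row subset
st_bbox(dt[1:2]$point)
fresh
```

If this holds, `dt$point[1:2]` gives the refreshed attributes without rebuilding the column by hand, at the cost of subsetting the geometry column separately from the rest of the table.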

I don't understand everything about how data.table works internally, but I'm wondering if there is a way you could update the attributes of any sfc column present in a data.table. An easy way to do so would be to wrap a generalization of something like this:

set.seed(20200130)

data <- data.table(
  id = 1:5,
  point = st_sfc(replicate(5, st_point(c(runif(1), runif(1))), simplify = FALSE))
)

subset <- data[1:2]
subset$point <- st_sfc(lapply(subset$point, identity))

attributes(subset$point)
##  $class
##  [1] "sfc_POINT" "sfc"      
##  
##  $precision
##  [1] 0
##  
##  $bbox
##        xmin       ymin       xmax       ymax 
##  0.09302893 0.65609871 0.43876379 0.71613790 
##  
##  $crs
##  Coordinate Reference System: NA
##  
##  $n_empty
##  [1] 0

We would then achieve the same result as when using an sf object.

Like I said, I'm not familiar with the development of data.table; I'm really just a user who doesn't like to use any data structure other than data.table. Please excuse me if I'm missing a reason why such a feature would be impossible to implement.


sessionInfo()
##  R version 3.6.1 (2019-07-05)
##  Platform: x86_64-suse-linux-gnu (64-bit)
##  Running under: SUSE Linux Enterprise Server 12 SP1
##  
##  Matrix products: default
##  BLAS:   /usr/lib64/R/lib/libRblas.so
##  LAPACK: /usr/lib64/R/lib/libRlapack.so
##  
##  locale:
##   [1] LC_CTYPE=en_US.UTF-8          LC_NUMERIC=C                  LC_TIME=en_US.UTF-8          
##   [4] LC_COLLATE=en_US.UTF-8        LC_MONETARY=en_US.UTF-8       LC_MESSAGES=en_US.UTF-8      
##   [7] LC_PAPER=en_US.UTF-8          LC_NAME=en_US.UTF-8           LC_ADDRESS=en_US.UTF-8       
##  [10] LC_TELEPHONE=en_US.UTF-8      LC_MEASUREMENT=en_US.UTF-8    LC_IDENTIFICATION=en_US.UTF-8
##  
##  attached base packages:
##  [1] stats     graphics  grDevices utils     datasets  methods   base     
##  
##  other attached packages:
##  [1] leaflet_2.0.3       ggspatial_1.0.3     OpenStreetMap_0.3.4 ggplot2_3.2.1      
##  [5] data.table_1.12.6   sf_0.8-0            magrittr_1.5        dplyr_0.8.3        
##  [9] DBI_1.0.0          
##  
##  loaded via a namespace (and not attached):
##   [1] tidyselect_0.2.5        purrr_0.3.3             rJava_0.9-11           
##   [4] lattice_0.20-38         leaflet.providers_1.9.0 colorspace_1.4-1       
##   [7] vctrs_0.2.0             htmltools_0.4.0         segter_0.0.0.9000      
##  [10] yaml_2.2.0              blob_1.2.0              rlang_0.4.2            
##  [13] e1071_1.7-3             pillar_1.4.2            later_1.0.0            
##  [16] glue_1.3.1              withr_2.1.2             sp_1.3-2               
##  [19] bit64_0.9-7             dbplyr_1.4.2            lifecycle_0.1.0        
##  [22] munsell_0.5.0           gtable_0.3.0            raster_3.0-7           
##  [25] htmlwidgets_1.5.1       codetools_0.2-16        labeling_0.3           
##  [28] fastmap_1.0.1           httpuv_1.5.2            crosstalk_1.0.0        
##  [31] class_7.3-15            Rcpp_1.0.3              xtable_1.8-4           
##  [34] KernSmooth_2.23-16      scales_1.1.0            backports_1.1.5        
##  [37] promises_1.1.0          classInt_0.4-2          jsonlite_1.6           
##  [40] mime_0.7                farver_2.0.1            bit_1.1-14             
##  [43] hms_0.5.2               digest_0.6.23           shiny_1.4.0            
##  [46] grid_3.6.1              rgdal_1.4-8             odbc_1.2.1             
##  [49] tools_3.6.1             lazyeval_0.2.2          tibble_2.1.3           
##  [52] crayon_1.3.4            pkgconfig_2.0.3         zeallot_0.1.0          
##  [55] assertthat_0.2.1        rstudioapi_0.10         R6_2.4.1               
##  [58] units_0.6-5             compiler_3.6.1         
@WerthPADOH

I'm having a similar problem with a custom class that extends "Date" vectors. My class has its own methods for [ and [[, which are used by data.frame and tibble for subsetting, but not data.table.

@jangorecki

This comment has been minimized.

@WerthPADOH

library(data.table)

abc_index <- function(x) {
  x <- as.character(x)
  first <- tolower(substr(x, 1, 1))
  index <- match(first, letters)
  attr(index, "original") <- first
  class(index) <- c("abc_index", "integer")
  index
}

`[.abc_index` <- function(x, i) {
  abc_index(attr(x, "original")[i])
}

DF <- data.frame(a = abc_index(c("p", "k", "?")))
DT <- data.table(a = abc_index(c("p", "k", "?")))
all.equal(DF[, "a"], DT[, a])
# [1] TRUE
attr(DF[2:3, "a"], "original")
# [1] "k" "?"
attr(DT[2:3, a], "original")
# [1] "p" "k" "?"

data.frame uses the class's subsetting method when selecting rows, but data.table doesn't.
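
Until data.table dispatches on a column's own `[` method, one possible workaround is to extract the whole column first and subset the resulting vector, so that R's ordinary S3 dispatch applies. A minimal sketch, restating the `abc_index` class from above so the block is self-contained (the name `sub` is only illustrative):

```r
library(data.table)

abc_index <- function(x) {
  x <- as.character(x)
  first <- tolower(substr(x, 1, 1))
  index <- match(first, letters)
  attr(index, "original") <- first
  class(index) <- c("abc_index", "integer")
  index
}

`[.abc_index` <- function(x, i) {
  abc_index(attr(x, "original")[i])
}

DT <- data.table(a = abc_index(c("p", "k", "?")))

# Extract the full column, then subset the vector:
# `[.abc_index` is dispatched here, so "original" is rebuilt
sub <- DT[["a"]][2:3]
attr(sub, "original")
# [1] "k" "?"
```

This only helps when the subset is computed outside of `[.data.table`; rows selected inside `DT[i, ]` still bypass the class method.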

@kufreu

kufreu commented Apr 17, 2021

This may be a little late and not necessarily useful, but I wrote a short function that uses your helpful workaround @jplecavalier.

library(data.table)
library(sf)
#> Linking to GEOS 3.9.0, GDAL 3.2.1, PROJ 7.2.1

set.seed(20200130)

data <- data.table(
  id = 1:5,
  point = st_sfc(replicate(5, st_point(c(runif(1), runif(1))), simplify = FALSE))
)

subset <- data[1:2]

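st_identity <- function(x) {
  # Positions of the columns that carry the "sfc" class
  idx <- x[, which(unlist(lapply(.SD, function(x) "sfc" %in% class(x))))]

  if (length(idx) == 0) stop("No geometry column is present.", call. = FALSE)

  # Remember each column's CRS so it survives the rebuild
  crs <- lapply(idx, function(y) st_crs(x[, y, with = FALSE][[1]]))

  names(crs) <- as.character(idx)

  # Rebuild each sfc column by reference so that its attributes are recomputed
  for (i in idx) x[, (i) := st_sfc(lapply(x[, i, with = FALSE][[1]], identity), crs = crs[[as.character(i)]])]
}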

attributes(data$point)
#> $precision
#> [1] 0
#> 
#> $bbox
#>       xmin       ymin       xmax       ymax 
#> 0.09302893 0.07300738 0.96152443 0.71613790 
#> 
#> $crs
#> Coordinate Reference System: NA
#> 
#> $n_empty
#> [1] 0
#> 
#> $class
#> [1] "sfc_POINT" "sfc"

st_identity(subset)

attributes(subset$point)
#> $class
#> [1] "sfc_POINT" "sfc"      
#> 
#> $precision
#> [1] 0
#> 
#> $bbox
#>       xmin       ymin       xmax       ymax 
#> 0.09302893 0.65609871 0.43876379 0.71613790 
#> 
#> $crs
#> Coordinate Reference System: NA
#> 
#> $n_empty
#> [1] 0

Created on 2021-04-17 by the reprex package (v2.0.0)

@magerton

Just filed r-spatial/sf#1660, which I think is the same issue
