Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

read_mdim() or other stars reading functions do not return the same object as saved by write_mdim() #611

Closed
e-kotov opened this issue Feb 2, 2023 · 2 comments

Comments

@e-kotov
Copy link

e-kotov commented Feb 2, 2023

Hello, stars team,

I am trying to use stars in my workflow and looking for a way to save vector data cube stars objects into an interoperable data format. I am trying to save to nc files, but when I read them back with any of stars's own functions, I cannot recreate the same structure. This issue is similar to #599 , however I am using your own Bristol example for the SDS book.

When reading with any stars::read_...() function from an nc file saved with write_mdim(), the resulting object is always different. I do not have the technical capacity to compare the resulting objects, but they are clearly different from the one that is being saved (e.g. offset and delta columns appear after reading the file, and clearly the overall stars object structure is different, as the same plotting code does not work anymore).

Would you say that the best way to store stars vector cubes, for now, is to simply dump them into R's RDS files? This is not interoperable, but at least in this case I would be able to restore exactly the same object that I had created.

library(tidyverse)
library(stars)
library(spDataLarge)


data(bristol_zones)
data(bristol_od)
bristol_tidy <- bristol_od |> 
  select(-all) |> 
  pivot_longer(3:6, names_to = "mode", values_to = "n")
od <- bristol_tidy |> pull("o") |> unique()
nod <- length(od)
mode <- bristol_tidy |> pull("mode") |> unique()
nmode = length(mode)
a = array(0L,  c(nod, nod, nmode), 
          dimnames = list(o = od, d = od, mode = mode))

a[as.matrix(bristol_tidy[c("o", "d", "mode")])] <- 
  bristol_tidy$n

order <- match(od, bristol_zones$geo_code)
zones <- st_geometry(bristol_zones)[order]

d <- st_dimensions(o = zones, d = zones, mode = mode)
odm <- st_as_stars(list(N = a), dimensions = d)
odm
#> stars object with 3 dimensions and 1 attribute
#> attribute(s):
#>    Min. 1st Qu. Median     Mean 3rd Qu. Max.
#> N     0       0      0 4.801591       0 1296
#> dimension(s):
#>      from  to refsys point
#> o       1 102 WGS 84 FALSE
#> d       1 102 WGS 84 FALSE
#> mode    1   4     NA FALSE
#>                                                             values
#> o    MULTIPOLYGON (((-2.510462...,...,MULTIPOLYGON (((-2.55007 ...
#> d    MULTIPOLYGON (((-2.510462...,...,MULTIPOLYGON (((-2.55007 ...
#> mode                                             bicycle,...,train

plot(adrop(odm[,,33]) + 1, logz = TRUE)

write_mdim(odm, filename = "test.nc")
#> Warning in CPL_write_mdim(file, driver, dimx, cdl, wkt, xy, root_group_options,
#> : GDAL Error 6: SetIndexingVariable() not implemented
#> Warning in CPL_write_mdim(file, driver, dimx, cdl, wkt, xy, root_group_options,
#> : GDAL Error 6: SetIndexingVariable() not implemented

#> Warning in CPL_write_mdim(file, driver, dimx, cdl, wkt, xy, root_group_options,
#> : GDAL Error 6: SetIndexingVariable() not implemented

#> Warning in CPL_write_mdim(file, driver, dimx, cdl, wkt, xy, root_group_options,
#> : GDAL Error 6: SetIndexingVariable() not implemented

odm_mdim <- read_mdim("test.nc")
odm_mdim
#> stars object with 3 dimensions and 1 attribute
#> attribute(s):
#>    Min. 1st Qu. Median     Mean 3rd Qu. Max.
#> N     0       0      0 4.801591       0 1296
#> dimension(s):
#>      from  to offset delta refsys point
#> o       1 102      1     1     NA    NA
#> d       1 102     NA    NA WGS 84 FALSE
#> mode    1   4     NA    NA     NA    NA
#>                                                             values
#> o                                                             NULL
#> d    MULTIPOLYGON (((-2.510462...,...,MULTIPOLYGON (((-2.55007 ...
#> mode                                             bicycle,...,train
plot(adrop(odm_loaded[,,33]) + 1, logz = TRUE)
#> Error in adrop(odm_loaded[, , 33]): object 'odm_loaded' not found


odm_stars <- read_stars("test.nc")
#> Warning in CPL_get_metadata(file, NA_character_, options): GDAL Message 1:
#> dimension #2 (o) is not a Longitude/X dimension.
#> Warning in CPL_get_metadata(file, NA_character_, options): GDAL Message 1:
#> dimension #1 (d) is not a Latitude/Y dimension.
#> Warning in CPL_get_metadata(file, NA_character_, options): GDAL Message 1:
#> dimension #0 (mode) is not a Time or Vertical dimension.
#> Warning in CPL_get_metadata(file, domain_item, options): GDAL Message 1:
#> dimension #2 (o) is not a Longitude/X dimension.
#> Warning in CPL_get_metadata(file, domain_item, options): GDAL Message 1:
#> dimension #1 (d) is not a Latitude/Y dimension.
#> Warning in CPL_get_metadata(file, domain_item, options): GDAL Message 1:
#> dimension #0 (mode) is not a Time or Vertical dimension.
#> Warning in CPL_read_gdal(as.character(x), as.character(options),
#> as.character(driver), : GDAL Message 1: The dataset has several variables that
#> could be identified as vector fields, but not all share the same primary
#> dimension. Consequently they will be ignored.

#> Warning in CPL_read_gdal(as.character(x), as.character(options),
#> as.character(driver), : GDAL Message 1: The dataset has several variables that
#> could be identified as vector fields, but not all share the same primary
#> dimension. Consequently they will be ignored.
#> Warning in read_stars("test.nc"): ignoring irregular (rectilinear) x and/or y
#> coordinates!
#> Warning in CPL_get_metadata(file, NA_character_, options): GDAL Message 1:
#> dimension #2 (o) is not a Longitude/X dimension.
#> Warning in CPL_get_metadata(file, NA_character_, options): GDAL Message 1:
#> dimension #1 (d) is not a Latitude/Y dimension.
#> Warning in CPL_get_metadata(file, NA_character_, options): GDAL Message 1:
#> dimension #0 (mode) is not a Time or Vertical dimension.
#> Warning in CPL_get_metadata(file, domain_item, options): GDAL Message 1:
#> dimension #2 (o) is not a Longitude/X dimension.
#> Warning in CPL_get_metadata(file, domain_item, options): GDAL Message 1:
#> dimension #1 (d) is not a Latitude/Y dimension.
#> Warning in CPL_get_metadata(file, domain_item, options): GDAL Message 1:
#> dimension #0 (mode) is not a Time or Vertical dimension.
#> Warning in CPL_read_gdal(as.character(x), as.character(options),
#> as.character(driver), : GDAL Message 1: The dataset has several variables that
#> could be identified as vector fields, but not all share the same primary
#> dimension. Consequently they will be ignored.

#> Warning in CPL_read_gdal(as.character(x), as.character(options),
#> as.character(driver), : GDAL Message 1: The dataset has several variables that
#> could be identified as vector fields, but not all share the same primary
#> dimension. Consequently they will be ignored.
#> Warning in read_stars(x, ..., sub = sub, proxy = TRUE, quiet = TRUE): ignoring
#> irregular (rectilinear) x and/or y coordinates!
#> Warning in CPL_read_gdal(as.character(x), as.character(options),
#> as.character(driver), : GDAL Message 1: The dataset has several variables that
#> could be identified as vector fields, but not all share the same primary
#> dimension. Consequently they will be ignored.
#> Warning in CPL_read_gdal(as.character(x), as.character(options),
#> as.character(driver), : GDAL Message 1: dimension #2 (o) is not a Longitude/X
#> dimension.
#> Warning in CPL_read_gdal(as.character(x), as.character(options),
#> as.character(driver), : GDAL Message 1: dimension #1 (d) is not a Latitude/Y
#> dimension.
#> Warning in CPL_read_gdal(as.character(x), as.character(options),
#> as.character(driver), : GDAL Message 1: dimension #0 (mode) is not a Time or
#> Vertical dimension.
#> Warning in parse_netcdf_meta(meta_data, get_names(x)): NAs introduced by
#> coercion
#> Warning in CPL_read_gdal(as.character(x), as.character(options),
#> as.character(driver), : GDAL Message 1: The dataset has several variables that
#> could be identified as vector fields, but not all share the same primary
#> dimension. Consequently they will be ignored.
#> Warning in CPL_read_gdal(as.character(x), as.character(options),
#> as.character(driver), : GDAL Message 1: dimension #2 (o) is not a Longitude/X
#> dimension.
#> Warning in CPL_read_gdal(as.character(x), as.character(options),
#> as.character(driver), : GDAL Message 1: dimension #1 (d) is not a Latitude/Y
#> dimension.
#> Warning in CPL_read_gdal(as.character(x), as.character(options),
#> as.character(driver), : GDAL Message 1: dimension #0 (mode) is not a Time or
#> Vertical dimension.
#> Warning in parse_netcdf_meta(meta_data, get_names(x)): NAs introduced by
#> coercion
odm_stars
#> stars object with 3 dimensions and 1 attribute
#> attribute(s):
#>          Min. 1st Qu. Median     Mean 3rd Qu. Max.
#> test.nc     0       0      0 4.801591       0 1296
#> dimension(s):
#>      from  to offset delta refsys x/y
#> x       1 102      0     1 WGS 84 [x]
#> y       1 102    102    -1 WGS 84 [y]
#> mode    1   4     NA    NA     NA
plot(adrop(odm_stars[,,33]) + 1, logz = TRUE)
#> Error in st_as_stars.list(setNames(list(x[, , i]), nms[1]), dimensions = d[dxy]): !is.null(dx) is not TRUE

odm_ncdf <- read_ncdf("test.nc")
#> no 'var' specified, using N
#> other available variables:
#>  crs, o, d, mode, node, x, y, node_count, part_node_count, interior_ring, geometry
#> Will return stars object with 41616 cells.
#> Warning in CPL_crs_from_input(x): GDAL Error 1: PROJ: proj_create: Error 1027
#> (Invalid value for an argument): longlat: Invalid value for units
#> Warning in value[[3L]](cond): failed to create crs based on grid mapping
#> and coordinate variable units. Will return NULL crs.
#> Original error: 
#> Error in st_crs.character(base_gm): invalid crs: +proj=longlat +a=6378137 +f=0.00335281066474748 +pm=0 +no_defs +units=
odm_ncdf
#> stars object with 3 dimensions and 1 attribute
#> attribute(s):
#>    Min. 1st Qu. Median     Mean 3rd Qu. Max.
#> N     0       0      0 4.801591       0 1296
#> dimension(s):
#>      from  to offset delta                values x/y
#> o       1 102    0.5     1                  NULL [x]
#> d       1 102    0.5     1                  NULL [y]
#> mode    1   4     NA    NA [4] bicycle,...,train
plot(adrop(odm_ncdf[,,33]) + 1, logz = TRUE)
#> Error in st_as_stars.list(setNames(list(x[, , i]), nms[1]), dimensions = d[dxy]): !is.null(dx) is not TRUE

sessionInfo()
#> R version 4.2.1 (2022-06-23 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19044)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=English_United Kingdom.utf8 
#> [2] LC_CTYPE=English_United Kingdom.utf8   
#> [3] LC_MONETARY=English_United Kingdom.utf8
#> [4] LC_NUMERIC=C                           
#> [5] LC_TIME=English_United Kingdom.utf8    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#>  [1] spDataLarge_2.0.9 stars_0.6-0       sf_1.0-9          abind_1.4-5      
#>  [5] forcats_0.5.2     stringr_1.5.0     dplyr_1.0.10      purrr_1.0.1      
#>  [9] readr_2.1.3       tidyr_1.3.0       tibble_3.1.8      ggplot2_3.4.0    
#> [13] tidyverse_1.3.2  
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.10         lubridate_1.9.1     class_7.3-21       
#>  [4] assertthat_0.2.1    digest_0.6.31       utf8_1.2.2         
#>  [7] R6_2.5.1            cellranger_1.1.0    backports_1.4.1    
#> [10] reprex_2.0.2        evaluate_0.20       e1071_1.7-12       
#> [13] httr_1.4.4          pillar_1.8.1        rlang_1.0.6        
#> [16] googlesheets4_1.0.1 readxl_1.4.1        ncmeta_0.3.5       
#> [19] rstudioapi_0.14     rmarkdown_2.20      googledrive_2.0.0  
#> [22] munsell_0.5.0       proxy_0.4-27        broom_1.0.3        
#> [25] compiler_4.2.1      modelr_0.1.10       xfun_0.36          
#> [28] pkgconfig_2.0.3     htmltools_0.5.4     tidyselect_1.2.0   
#> [31] fansi_1.0.4         crayon_1.5.2        tzdb_0.3.0         
#> [34] dbplyr_2.3.0        withr_2.5.0         grid_4.2.1         
#> [37] lwgeom_0.2-11       jsonlite_1.8.4      gtable_0.3.1       
#> [40] lifecycle_1.0.3     DBI_1.1.3           magrittr_2.0.3     
#> [43] units_0.8-1         scales_1.2.1        KernSmooth_2.23-20 
#> [46] cli_3.6.0           stringi_1.7.12      fs_1.6.0           
#> [49] xml2_1.3.3          ellipsis_0.3.2      generics_0.1.3     
#> [52] vctrs_0.5.2         tools_4.2.1         glue_1.6.2         
#> [55] RNetCDF_2.6-2       hms_1.1.2           parallel_4.2.1     
#> [58] fastmap_1.1.0       yaml_2.3.7          timechange_0.2.0   
#> [61] colorspace_2.1-0    gargle_1.2.1        classInt_0.4-8     
#> [64] rvest_1.0.3         knitr_1.42          haven_2.5.1

Created on 2023-02-02 with reprex v2.0.2

@edzer
Copy link
Member

edzer commented Feb 2, 2023

Great question. The data cube created there has two dimensions associated with vector geometries (origin, destination). NetCDF data cubes with geometries can have maximally one dimension associated with vector data cubes when following the CF conventions. Of course you can do anything with NetCDF files when not following these conventions, but then you again break interoperability.

@e-kotov
Copy link
Author

e-kotov commented Feb 2, 2023

Thank you for the quick response! So RDS it then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants