Using targets with geoarrow? #1
Vector
Using {geoarrow} with {targets} should be pretty painless now with the latest geoarrow/geoarrow-r. You'll just need to ensure {geoarrow} is loaded prior to reading / writing. Minimal reprex with any old vector geometry data:
library(targets)
tar_script({
  requireNamespace("geoarrow")
  list(tar_target(sa4s, as.data.frame(strayr::read_absmap("sa42021")),
                  format = "parquet"))
})
requireNamespace("geoarrow")
#> Loading required namespace: geoarrow
tar_make()
#> Loading required namespace: geoarrow
#> > dispatched target sa4s
#> trying URL 'https://github.com/wfmackey/absmapsdata/blob/master/data/absmapsdata_file_list.rda?raw=true'
#> Content type 'application/octet-stream' length 407 bytes
#> ==================================================
#> downloaded 407 bytes
#>
#> trying URL 'https://github.com/wfmackey/absmapsdata/raw/master/data/sa42021.rda'
#> Content type 'application/octet-stream' length 3044178 bytes (2.9 MB)
#> ==================================================
#> downloaded 2.9 MB
#>
#> o completed target sa4s [3.2 seconds]
#> > ended pipeline [4.02 seconds]
str(tar_read(sa4s))
#> tibble [108 × 10] (S3: tbl_df/tbl/data.frame)
#> $ sa4_code_2021 : chr [1:108] "101" "102" "103" "104" ...
#> $ sa4_name_2021 : chr [1:108] "Capital Region" "Central Coast" "Central West" "Coffs Harbour - Grafton" ...
#> $ gcc_code_2021 : chr [1:108] "1RNSW" "1GSYD" "1RNSW" "1RNSW" ...
#> $ gcc_name_2021 : chr [1:108] "Rest of NSW" "Greater Sydney" "Rest of NSW" "Rest of NSW" ...
#> $ state_code_2021: chr [1:108] "1" "1" "1" "1" ...
#> $ state_name_2021: chr [1:108] "New South Wales" "New South Wales" "New South Wales" "New South Wales" ...
#> $ areasqkm_2021 : num [1:108] 51896 1681 70297 13230 339356 ...
#> $ cent_lat : num [1:108] -35.6 -33.3 -33.2 -29.8 -31 ...
#> $ cent_long : num [1:108] 149 151 148 153 145 ...
#> $ geometry : geoarrow_vctr[1:108] <MULTIPOLYGON (((150.311 -35.666, 150.313 -35.668, 15
#> - attr(*, "sf_column")= chr "geometry"
#> - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA
#> ..- attr(*, "names")= chr [1:9] "sa4_code_2021" "sa4_name_2021" "gcc_code_2021" "gcc_name_2021" ...
Created on 2024-02-21 with reprex v2.0.2
Raster
I don't use rasters much, so I don't have much input there.
Awesome, thanks so much @anthonynorth! That looks like it will work great for me. Cheers!
Oh, looks like there's no friendly conversion between geoarrow and sfc or other formats yet. It probably makes sense to wrap the Something like this (untested), using wkb as the output format type. I didn't check if a geoarrow/geoparquet tar format exists!
library(targets)
tar_script({
  requireNamespace("geoarrow")
  tar_format_geoparquet = tar_format(
    # read: load the parquet file, then convert geometry columns to WKB
    read = function(path) {
      arrow::read_parquet(path) |>
        dplyr::mutate(dplyr::across(wk::is_handleable, wk::as_wkb))
    },
    # write: store the object as parquet
    write = function(object, path) {
      arrow::write_parquet(object, path)
    },
    # marshal / unmarshal: used when the object is sent to a parallel worker
    marshal = function(object) {
      arrow::as_arrow_table(object)
    },
    unmarshal = function(object) {
      as.data.frame(object) |>
        dplyr::mutate(dplyr::across(wk::is_handleable, wk::as_wkb))
    }
  )
  list(tar_target(sa4s, as.data.frame(strayr::read_absmap("sa42021")),
                  format = tar_format_geoparquet))
})
tar_make()
#> Loading required namespace: geoarrow
#> > dispatched target sa4s
#> trying URL 'https://github.com/wfmackey/absmapsdata/blob/master/data/absmapsdata_file_list.rda?raw=true'
#> Content type 'application/octet-stream' length 407 bytes
#> ==================================================
#> downloaded 407 bytes
#>
#> trying URL 'https://github.com/wfmackey/absmapsdata/raw/master/data/sa42021.rda'
#> Content type 'application/octet-stream' length 3044178 bytes (2.9 MB)
#> ==================================================
#> downloaded 2.9 MB
#>
#> o completed target sa4s [2.42 seconds]
#> > ended pipeline [3.41 seconds]
requireNamespace("geoarrow")
#> Loading required namespace: geoarrow
str(tar_read(sa4s))
#> tibble [108 × 10] (S3: tbl_df/tbl/data.frame)
#> $ sa4_code_2021 : chr [1:108] "101" "102" "103" "104" ...
#> $ sa4_name_2021 : chr [1:108] "Capital Region" "Central Coast" "Central West" "Coffs Harbour - Grafton" ...
#> $ gcc_code_2021 : chr [1:108] "1RNSW" "1GSYD" "1RNSW" "1RNSW" ...
#> $ gcc_name_2021 : chr [1:108] "Rest of NSW" "Greater Sydney" "Rest of NSW" "Rest of NSW" ...
#> $ state_code_2021: chr [1:108] "1" "1" "1" "1" ...
#> $ state_name_2021: chr [1:108] "New South Wales" "New South Wales" "New South Wales" "New South Wales" ...
#> $ areasqkm_2021 : num [1:108] 51896 1681 70297 13230 339356 ...
#> $ cent_lat : num [1:108] -35.6 -33.3 -33.2 -29.8 -31 ...
#> $ cent_long : num [1:108] 149 151 148 153 145 ...
#> $ geometry : wk_wkb[1:108] <MULTIPOLYGON (((150.3113 -35.66587, 150.3126 -35.66813, 150
#> - attr(*, "sf_column")= chr "geometry"
#> - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA
#> ..- attr(*, "names")= chr [1:9] "sa4_code_2021" "sa4_name_2021" "gcc_code_2021" "gcc_name_2021" ...
Created on 2024-02-21 with reprex v2.0.2
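If you need an sf object downstream, a wkb column like the one above can probably be converted with `sf::st_as_sfc()`. A minimal sketch, assuming {wk} and {sf} are installed; `sa4s_like` is a made-up stand-in for `tar_read(sa4s)` and the two points are invented, not taken from the real data:

```r
# Hypothetical stand-in for tar_read(sa4s): a data frame with a wk_wkb column
sa4s_like <- data.frame(name = c("a", "b"))
sa4s_like$geometry <- wk::as_wkb(
  wk::wkt(c("POINT (150.3 -35.6)", "POINT (151.2 -33.3)"))
)

# Convert the WKB column to sfc, then promote the data frame to sf
sa4s_like$geometry <- sf::st_as_sfc(sa4s_like$geometry)
sa4s_sf <- sf::st_as_sf(sa4s_like)

class(sa4s_sf)
```

The same conversion could also live inside the `read` function of the custom format if every consumer wants sf, at the cost of tying the format to {sf}.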
Thanks again for your help with this @anthonynorth! I'm currently working on a geotargets extension package: https://github.com/njtierney/geotargets which should hopefully implement these features. I've linked this issue - looking forward to working on this
Heya @anthonynorth, @MilesMcBain mentioned that you would use geoarrow with targets to help store the data, I was just wondering if you have any pointers/thoughts on storing rasters and shapefiles in targets?