Closed
Description
OK, I have reproduced a similar issue. In the following code, I create an sf object and write it to disk as a partitioned Parquet dataset. I then call open_dataset() on the files.
If I collect() the dataset, I get back an sf object with no problem.
But if I filter() the dataset first and then collect(), I get an error.
```r
library(sf)
library(arrow)
library(dplyr)

n <- 10000
fake <- tibble(
  ID=seq(n),
  Date=sample(seq(as.Date('2019-01-01'), as.Date('2021-04-01'), by=1), size=n, replace=TRUE),
  x=runif(n=n, min=-170, max=170),
  y=runif(n=n, min=-60, max=70),
  text1=sample(x=state.name, size=n, replace=TRUE),
  text2=sample(x=state.name, size=n, replace=TRUE),
  text3=sample(x=state.division, size=n, replace=TRUE),
  text4=sample(x=state.region, size=n, replace=TRUE),
  text5=sample(x=state.abb, size=n, replace=TRUE),
  num1=sample(x=state.center$x, size=n, replace=TRUE),
  num2=sample(x=state.center$y, size=n, replace=TRUE),
  num3=sample(x=state.area, size=n, replace=TRUE),
  Rand1=rnorm(n=n),
  Rand2=rnorm(n=n, mean=100, sd=3),
  Rand3=rbinom(n=n, size=10, prob=0.4)
)

# make it into an sf object
spat <- fake %>%
  st_as_sf(coords=c('x', 'y'), remove=FALSE, crs = 4326)
class(spat)
class(spat$geometry)

# create new columns for partitioning and write to disk
spat %>%
  mutate(Year=lubridate::year(Date), Month=lubridate::month(Date)) %>%
  group_by(Year, Month) %>%
  write_dataset('data/splits/', format='parquet')

spat_in <- open_dataset('data/splits/')
class(spat_in)

# after collect() it's an sf, as expected
spat_in %>% collect() %>% class()
spat_in %>% collect() %>% pull(geometry) %>% class()

# it even plots
leaflet::leaflet() %>%
  leaflet::addTiles() %>%
  leafgl::addGlPoints(data=spat_in %>% collect())

# but if we filter first
spat_in %>%
  filter(Year == 2020 & Month == 2) %>%
  collect()
```
This fails with the following error:

```
Error in st_geometry.sf(x) :
  attr(obj, "sf_column") does not point to a geometry column.
Did you rename it, without setting st_geometry(obj) <- "newname"?
In addition: Warning message:
Invalid metadata$r
```

Reporter: Jonathan Keane / @jonkeane
Assignee: Jonathan Keane / @jonkeane
Note: This issue was originally created as ARROW-12542. Please see the migration documentation for further details.