[R] SF columns in datasets with filters #28305

@asfimport

Description

First reported at https://issues.apache.org/jira/browse/ARROW-10386?focusedCommentId=17331668&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17331668

OK, I have recreated a similar issue. In the following code, I create an sf object, write it as a dataset to Parquet files, and then call open_dataset() on those files.

If I collect() the dataset, I get back an sf object with no problem.

But if I filter() the dataset first and then collect(), I get an error.

library(sf)
library(arrow)
library(dplyr)

n <- 10000

fake <- tibble(
    ID=seq(n),
    Date=sample(seq(as.Date('2019-01-01'), as.Date('2021-04-01'), by=1), size=n, replace=TRUE),
    x=runif(n=n, min=-170, max=170),
    y=runif(n=n, min=-60, max=70),
    text1=sample(x=state.name, size=n, replace=TRUE),
    text2=sample(x=state.name, size=n, replace=TRUE),
    text3=sample(x=state.division, size=n, replace=TRUE),
    text4=sample(x=state.region, size=n, replace=TRUE),
    text5=sample(x=state.abb, size=n, replace=TRUE),
    num1=sample(x=state.center$x, size=n, replace=TRUE),
    num2=sample(x=state.center$y, size=n, replace=TRUE),
    num3=sample(x=state.area, size=n, replace=TRUE),
    Rand1=rnorm(n=n),
    Rand2=rnorm(n=n, mean=100, sd=3),
    Rand3=rbinom(n=n, size=10, prob=0.4)
)

# make it into an sf object
spat <- fake %>% 
    st_as_sf(coords=c('x', 'y'), remove=FALSE, crs = 4326)

class(spat)
class(spat$geometry)

# create new columns for partitioning and write to disk
spat %>% 
    mutate(Year=lubridate::year(Date), Month=lubridate::month(Date)) %>% 
    group_by(Year, Month) %>% 
    write_dataset('data/splits/', format='parquet')

spat_in <- open_dataset('data/splits/')

class(spat_in)

# it's an sf as expected
spat_in %>% collect() %>% class()
spat_in %>% collect() %>% pull(geometry) %>% class()

# it even plots
leaflet::leaflet() %>% 
    leaflet::addTiles() %>% 
    leafgl::addGlPoints(data=spat_in %>% collect())

# but if we filter first
spat_in %>% 
    filter(Year == 2020 & Month == 2) %>% 
    collect()

# we get this error:
#> Error in st_geometry.sf(x) : 
#>   attr(obj, "sf_column") does not point to a geometry column.
#> Did you rename it, without setting st_geometry(obj) <- "newname"?
#> In addition: Warning message:
#> Invalid metadata$r 

Reporter: Jonathan Keane / @jonkeane
Assignee: Jonathan Keane / @jonkeane

Note: This issue was originally created as ARROW-12542. Please see the migration documentation for further details.
