Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rasterizing a factor field should yield a factor raster with appropriate RAT table (and related issues) #26

Open
JoshOBrien opened this issue May 13, 2019 · 1 comment

Comments

@JoshOBrien
Copy link

JoshOBrien commented May 13, 2019

At present fasterize() uses the input raster as a template, modifying many components of it @data slot (along with other slots) to form the raster that it returns. It does not, however, do anything to make sure that the @data@factor and @data@attributes slots -- the ones responsible for marking a raster as a factor and recording its lvels in a raster attribute table (RAT) -- are appropriately modified based on the type of polygon field used to rasterize them.

This leads to several undesirable and potentially surprising behaviors. For example:

  • if passed a numeric template raster, the output raster will always be numeric whether the field used to rasterize it is numeric or factor.

  • if passed a factor template raster, the function will always be a factor raster with the same RAT as the template raster, even if the field used to rasterize it is numeric.

  • if passed a factor template raster and a factor field, fasterize() will return a factor raster (as the user might expect) but one with the RAT of the template raster -- not the rasterizing field.

I am copying below a wrapper function that I have used for a while and that manages all the RAT information that fasterize() itself at present neglects to deal with. It ensures that:

  • numeric fields always return numeric rasters

  • factor fields always return factor rasters with RATs that include all levels of the field on which they were based

  • character fields return factor rasters with RATs that include as levels all unique character strings in the character field column.

(That last addresses Issue #24. I find that feature quite useful, but it is not as important as the first two bulleted items above.)

The longish listing below first demonstrates some of the undesirable behaviors of the current version of fasterize() and then demonstrates one way to address them, using a wrapper function I here call fasterize2() and which could potentially be used as a drop in replacement for the curent fasterize().

library(fasterize)
library(sf)
library(raster)
library(rasterVis)

##------------------------------------------------------------------------------
##  Example data
##------------------------------------------------------------------------------

##-----------##
##  Rasters  ##
##-----------##
set.seed(1)
## Numeric raster with values in 1:4
r_num <- raster(ncol = 20, nrow = 20)
r_num[] <- sample(1:4, size = ncell(r_num), replace = TRUE)
extent(r_num) <- extent(c(0, 20, 0, 20))
## Factor raster with 4 levels for landcover variable
r_fact <- as.factor(r_num)
rat <- levels(r_fact)[[1]]
rat[["landcover"]] <- c("oak","pine", "scrub", "grass")
levels(r_fact) <- rat

##----------##
## Polygons ##
##----------##
a <- list(rbind(c(2,8), c(5,14), c(8,8), c(2,8)))
b <- list(rbind(c(5,14), c(8,8), c(11,14), c(5,14)))
c <- list(rbind(c(11,14), c(14,14.5), c(14,13.5), c(11,14)))
pp <- st_sf(char = paste0("c_", 1:3),
            fact = factor(paste0("f_", 1:3)),
            num = 1:3,
            geometry = st_sfc(lapply(list(a, b, c), st_polygon)),
            crs = 4326)

## Plot
levelplot(r_fact, margin = FALSE,
          col.regions = rev(terrain.colors(4))) +
    layer(sp.polygons(as_Spatial(pp),
                      fill = blues9[c(3,6,9)],
                      col = NA))



##------------------------------------------------------------------------------
##  Several situations in which fasterize() performs less than ideally
##------------------------------------------------------------------------------

## (1) Numeric template raster + factor polygon attribute ------>
##     numeric raster
X <- fasterize(pp, r_num, field = "fact")
is.factor(X)

## (2) Factor template raster + numeric polygon attribute ------>
##     factor raster
X <- fasterize(pp, r_fact, field = "num")
is.factor(X)

## (3) Factor template raster + factor polygon attribute ------>
##     factor raster inheriting RAT from template raster, NOT polygon
##     attribute
X <- fasterize(pp, r_fact, field = "fact")
levels(r_fact)[[1]]
levels(X)[[1]]
pp$fact


##------------------------------------------------------------------------------
## Modified version of fasterize(), which outputs:
##  - factor rasters when passed character or factor polygon atttributes
##  - numeric rasters when passed numeric polygon attributes
##  - rasters whose RAT is based on polygon attribute, not input raster's RAT
##------------------------------------------------------------------------------

fasterize2 <- function (sf,
                        raster,
                        field = NULL,
                        fun = "last",
                        background = NA_real_,
                        by = NULL) {
  if (is.null(field)) {
        field <- names(sf)[1]
  }
  field_class <- class(sf[[field]])
  ## Convert character field to factor
  if (field_class == "character") {
    val <- sf[[field]]
    fac <- factor(val, levels = sort(unique(val)))
    sf[[field]] <- fac
  }
  ## Rasterize polygons
  out_raster <- .Call("_fasterize_fasterize", PACKAGE = "fasterize",
                      sf, raster, field, fun, background, by)
  ## Ensure that any RAT attached to output raster is derived from
  ## input char or factor field
  if(field_class %in% c("character", "factor")) {
    lev <- levels(sf[[field]])
    RAT <- data.frame(ID = seq_along(lev), VALUE = lev)
    names(RAT)[2] <- field
    ## Silence warning emitted when overwriting a RAT from input
    ## raster
    suppressWarnings(levels(out_raster) <- RAT)
  } else {
    out_raster@data@isfactor <- FALSE
    out_raster@data@attributes <- list()
  }
  out_raster
}


##------------------------------------------------------------------------------
##  Tests
##------------------------------------------------------------------------------

## Test fasterize2() with NUMERIC template raster
o_char1 <- fasterize2(pp, r_num, field = "char")
o_fact1 <- fasterize2(pp, r_num, field = "fact")
o_num1 <- fasterize2(pp, r_num, field = "num")
## Check results
is.factor(o_char1)
is.factor(o_fact1)
is.factor(o_num1)
levels(o_char1)
levels(o_fact1)
levels(o_num1)
levelplot(o_char1, col.regions = blues9[c(3,6,9)])
levelplot(o_fact1, col.regions = blues9[c(3,6,9)])
levelplot(o_num1, margin = FALSE)


## Test fasterize2() with FACTOR template raster
o_char2 <- fasterize2(pp, r_fact, field = "char")
o_fact2 <- fasterize2(pp, r_fact, field = "fact")
o_num2 <- fasterize2(pp, r_fact, field = "num")
## Check results by comparing with output of NUMERIC template raster
identical(o_char2, o_char1)
identical(o_fact2, o_fact1)
identical(o_num2, o_num1)
JoshOBrien added a commit to JoshOBrien/rasterDT that referenced this issue Jun 17, 2019
As described in an Issue filed at the **fasterize** package's GitHub
repository (ecohealthalliance/fasterize#26),
`fasterize` returns rasters that match the class of their template
raster, not of the field that the user selected to be burned into
them. In addition, when passed a character or a factor field, it does
not -- as would often be desireable -- return a factor-valued raster.

`fasterizeDT()` addresses both of these issues, consistently returning
a raster whose class (and, if a factor, levels) matches that of the
user-selected field in the input polygon object.

In addition, `fasterizeDT()` accepts `sp::SpatialPolygonsDataFrame()`
objects, converting them to `sf::sf()` objects before passing them on
to `fasterize::fasterize()`.
@JoshOBrien
Copy link
Author

JoshOBrien commented Jul 21, 2019

@noamross Just pinging you once to ask whether you might welcome a PR incorporating these tweaks, or whether you've had a look at the above and are not so inclined.

Because the functionality added by the tweaks is so useful to me, I've included them in a function fasterizeDT in my own rasterDT package, but would of course rather make these changes upstream, if you think that makes sense.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant