Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Encapsulated hash tables #967

Closed
2 tasks done
wlandau opened this issue Aug 1, 2019 · 2 comments
Closed
2 tasks done

Encapsulated hash tables #967

wlandau opened this issue Aug 1, 2019 · 2 comments

Comments

@wlandau
Copy link
Member

wlandau commented Aug 1, 2019

Prework

Description

The hash table deserves proper encapsulation. We could wrap the functions in hash_tables.R into a reference class.

@wlandau
Copy link
Member Author

wlandau commented Aug 2, 2019

Thinking about using reference classes for this. Need to figure out if it will slow us down or take up more memory.

@wlandau
Copy link
Member Author

wlandau commented Aug 10, 2019

These are our best options. Neither looks good.

  1. S3. Doesn't really add convenience. To have a use for S3 methods and generics, there need to be alternative types floating around, which there aren't here.
  2. R5. Reference classes are inefficient relative to the plain environments drake is already using. They take too long to create, and they take up too much space in memory (see the reprex below). This matters because drake creates a lot of hash tables for code analysis.

However, we can still encapsulate a bunch of hash tables in an encoder in the decorated cache, re #968.

new_ht <- function() {
  envir <- new.env(hash = TRUE, parent = emptyenv())
  htref$new(ht = envir)
}

htref <- methods::setRefClass(
  Class = "htref",
  fields = c("ht"),
  # The whole point of a reference class in this case
  # is to contain all the methods we will use for the hash table.
  methods = list(
    ht_reset <- function(x = NULL, hash = TRUE) {
      out <- new.env(hash = hash, parent = emptyenv())
      if (!is.null(x)) {
        ht_set(out, x)
      }
      out
    },
    
    ht_new_from_list <- function(x, hash = (length(x) > 100)) {
      list2env(x, hash = hash, parent = emptyenv())
    },
    
    ht_set <- function(ht, x, value = TRUE) {
      lapply(
        X = x,
        FUN = assign,
        value = value,
        envir = ht,
        inherits = FALSE
      )
      invisible()
    },
    
    ht_get <- function(ht, x) {
      get(x = x, envir = ht, inherits = FALSE)
    },
    
    ht_del <- function(ht, x) {
      remove(list = x, envir = ht, inherits = FALSE)
    },
    
    ht_list <- function(ht) {
      names(ht)
    },
    
    ht_clear <- function(ht) {
      rm(list = names(ht), envir = ht)
    },
    
    ht_clone <- function(ht) {
      list2env(as.list(ht), hash = TRUE, parent = emptyenv())
    },
    
    ht_filter <- function(ht, x) {
      index <- vapply(
        X = x,
        FUN = ht_exists,
        FUN.VALUE = logical(1),
        ht = ht
      )
      x[index]
    },
    
    ht_exists <- function(ht, x) {
      exists(x, envir = ht, inherits = FALSE)
    },
    
    # Merge y into x
    ht_merge <- function(x, y) {
      ht_set(x, ht_list(y))
    },
    
    # hash-table-based memoization for characters
    ht_memo <- function(ht, x, fun, ...) {
      if (is.null(ht)) {
        return(lapply(X = x, FUN = fun, ...))
      }
      vapply(
        X = x,
        FUN = ht_memo_single,
        FUN.VALUE = character(1),
        USE.NAMES = FALSE,
        ht = ht,
        fun = fun,
        ...
      )
    },
    
    # x must be a character scalar.
    ht_memo_single <- function(ht, x, fun, ...) {
      if (ht_exists(ht = ht, x = x)) {
        ht_get(ht = ht, x = x)
      } else {
        value <- fun(x, ...)
        assign(x = x, value = value, envir = ht, inherits = FALSE)
        value
      }
    }
  )
)
 
library(microbenchmark)
microbenchmark(
  no_class = drake:::ht_new(),
  refclass = new_ht()
)
#> Unit: microseconds
#>      expr    min      lq     mean  median      uq      max neval
#>  no_class  3.100  3.6570 884.6323  4.3185  4.7760 88038.40   100
#>  refclass 55.726 57.9325 327.1402 59.2050 61.7605 25633.85   100

no_class <- drake:::ht_new()
refclass <- new_ht()

library(pryr)
#> Registered S3 method overwritten by 'pryr':
#>   method      from
#>   print.bytes Rcpp
object_size(no_class)
#> 336 B
object_size(refclass)
#> 567 kB

Created on 2019-08-09 by the reprex package (v0.3.0)

@wlandau wlandau closed this as completed Aug 10, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant