Consider replacing "A" (active infections) and "B" (secondary infections) with one large fixed-size matrix, and tracking indices? #5

ashlinrichardson · 2020-08-12T07:10:56Z

Hi Henry, for discussion..

Hypothetically, how about this:
The data you're currently storing in data frames "A" and "B" all goes into one big fixed-size matrix M (m x n). That seems fair, since you have a population ceiling anyway (either the human population or computer memory).

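a matrix, not a data frame, as you said
possibly 32-bit floats instead of 64-bit, to save memory and maybe time (the 4x-ish speedup is only a guess, and base R numerics are always 64-bit doubles, so this would need something beyond a plain numeric matrix)
modify in place as needed, but never copy, move or delete any rows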

Henry, just as you said:
we do have to keep track of where the active infections, the secondary infections and the free matrix rows are in M.
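A minimal sketch of the preallocated matrix described above. The ceiling `max_cases` and the three column names are hypothetical, chosen only for illustration; the real state has more columns.

```r
# Hypothetical ceiling and a reduced column set, for illustration only
max_cases <- 1000L
col_names <- c("case_id", "status", "days_infected")

# Allocate M once; it never grows or shrinks afterwards
M <- matrix(0, nrow = max_cases, ncol = length(col_names),
            dimnames = list(NULL, col_names))

# Write a new case into row 1, then update one field of that row
M[1, ] <- c(1, 1, 0)
M[1, "days_infected"] <- M[1, "days_infected"] + 1
```

One caveat: R uses copy-on-modify semantics, so an assignment like this can still copy M if the matrix is referenced elsewhere, for example when it is passed into a function and modified there. Whether updates really happen in place would need to be checked against how the simulation passes its state around.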

What if we use vectors to store three categories of row indices into M:
A: active
B: secondary
C: free space
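The three index sets could be sketched as plain integer vectors. Everything here is an assumption made for illustration: the names `active`, `secondary`, `free` and the helper `take_rows()` are hypothetical, and the matrix is cut down to two columns.

```r
max_cases <- 10L
M <- matrix(0, nrow = max_cases, ncol = 2,
            dimnames = list(NULL, c("case_id", "status")))

active    <- integer(0)          # A: rows holding active infections
secondary <- integer(0)          # B: rows holding secondary infections
free      <- seq_len(max_cases)  # C: rows not currently in use

# Hypothetical helper: hand out the first k free rows and return
# both the rows taken and the remaining free pool
take_rows <- function(free, k) {
  stopifnot(k <= length(free))   # otherwise the ceiling has been reached
  list(rows = free[seq_len(k)],
       free = free[seq_len(length(free) - k) + k])
}

# Seed three active cases
res <- take_rows(free, 3L)
active <- res$rows
free <- res$free
M[active, "case_id"] <- 1:3
M[active, "status"] <- 1

# Add two secondary infections
res <- take_rows(free, 2L)
secondary <- res$rows
free <- res$free
M[secondary, "case_id"] <- 4:5

# The case in row 2 is finished: clear the row and return it to the pool
done <- 2L
M[done, ] <- 0
active <- setdiff(active, done)
free <- c(free, done)
```

No row of M is moved or deleted at any point; only the three index vectors change, and together they always account for every row.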

(To merge A and B in C++/Java, you'd use a linked list and attach the tail pointer of A to the head of B (and vice versa if it's doubly linked), so the merge operation is O(1).
Hopefully R also lets you "stick" lists together (fingers crossed), but even if the high-level language insists on appending B to A element-wise under the hood, the merge still costs only O(|B|) index moves,
which is much better than the O(|B| * n) of moving all the row data around, where n is the number of columns of M. So just moving indices around is an effective pyramid scheme.)
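A rough sketch of that merge in R, with hypothetical sizes. Note that `c(active, secondary)` allocates a new vector, so in R the merge costs on the order of |A| + |B| index copies rather than |B|, but it still never touches the row data in M.

```r
n_cols <- 14L                      # number of columns per case (assumed)
M <- matrix(0, nrow = 1000L, ncol = n_cols)
active    <- 1:300                 # A: hypothetical active rows
secondary <- 301:500               # B: hypothetical secondary rows
M_before  <- M                     # kept only to show M is untouched

# Merge B into A: only row indices are copied
moved     <- secondary
active    <- c(active, moved)
secondary <- integer(0)

# Values that would move under each scheme, for the B part alone
index_moves <- length(moved)            # one index per row
row_moves   <- length(moved) * n_cols   # every column of every row
```

With these numbers the index-only merge moves 200 values for B where copying the rows would move 2800, which is the factor of n from the comparison above.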

The motivation for the above:

no explicit counters
no filling or copying of whole vectors (moving one index per row instead of n values, so a likely saving of about (n-1)/n of the data movement)
explicit tracking of locations, which removes any ambiguity about what the high-level language is doing with the data

There could well be many other (better) approaches, though.

Thoughts? Thanks for posting such cool and fun code! I'm enjoying learning from the great style, and learning lots about R from it too.
cheers
A

ashlinrichardson · 2020-08-12T07:16:16Z

P.S. I do like the idea of not using a data frame, and wonder if it would look like this. (It may be crazy. I always enjoy removing "features" of programming languages if I can; syntactic "sugar" is overrated, and everybody knows that simple sugars promote uncontrolled cell division, haha.)

create_state_df <- function(n_cases, sim_params, sim_status,
                            initialize=FALSE, import=FALSE,
                            primary_state_df=NULL, primary_case_ids=NULL){
  # List of columns in the state matrix
  col_names <- c("case_id", "status", "is_traced", "is_trace_app_user", "is_trace_app_comply",
                 "is_traced_by_app", "is_symptomatic", "days_infected", "incubation_length",
                 "isolation_delay",  "infection_length", "contact_rate", "n_sec_infects",
                 "generation_intervals")
  n_cols <- length(col_names)

  # Map each column name to its column index, e.g. ix[['status']] is 2
  ix <- as.list(setNames(seq_len(n_cols), col_names))

  # Create the state as a zero-filled numeric matrix instead of a data frame
  state_df <- matrix(0., nrow=n_cases, ncol=n_cols)

  # If no cases are being added, return the empty matrix
  # This should only happen when initializing
  if(n_cases == 0){
    return(state_df)
  }

  # Fill in start values
  if(initialize){
    state_df[,ix[['case_id']]] <- 1:n_cases # special case for starting out
  } else{
    state_df[,ix[['case_id']]] <- sim_status$last_case_id + 1:n_cases
  }
.....
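As a possible alternative to the separate `ix` lookup list, a matrix can carry its column names itself through `dimnames`, which keeps name-based access without a data frame. This is only a sketch with a reduced, partly hypothetical column set, not the function from the repository.

```r
col_names <- c("case_id", "status", "is_traced")

# Column names live on the matrix, so no separate lookup list is needed
state <- matrix(0, nrow = 3, ncol = length(col_names),
                dimnames = list(NULL, col_names))

state[, "case_id"] <- 1:3
state[2, "is_traced"] <- TRUE      # logicals are stored as 1 / 0

# Select the ids of traced cases by column name
traced_ids <- state[state[, "is_traced"] == 1, "case_id"]
```

A matrix holds a single type, so every column becomes numeric here. Any column that is text or a factor in the data frame version would need a numeric code, and that mapping is an assumption that would have to be checked against the actual state columns.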

henry-ngo · 2020-08-12T15:10:25Z

Thanks for the insights! Data frames were helpful for getting the code completed quickly while things were changing a lot (and I didn't want to fiddle with tricky indexing issues). Now that the simulation base is more stable, it's a good time to think about the matrix approach and how R handles things, to get the best speed.
