..Hypothetically, how about:
All the data you're currently storing in data frames "A" and "B" goes into one big fixed-size matrix M (m x n), which is fair since you have a population ceiling anyway (either the human population or computer memory)
matrix not dataframe, as you said
likely can store 32-bit floats instead of 64-bit to increase speed! This can speed things up ~4x or more.. (note: base R numerics are 64-bit doubles, so 32-bit storage would likely need a package such as float)
modify in-place as needed but never copy, move or delete any rows:
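A minimal sketch of the preallocate-and-modify-in-place idea; all sizes and column meanings below are hypothetical, just to illustrate the pattern:

```r
# Preallocate a fixed-size state matrix M up front; the sizes here are
# made up for illustration.
m <- 10L  # population ceiling (rows of M)
n <- 3L   # number of state variables (columns of M)
M <- matrix(0, nrow = m, ncol = n)

# "Activate" a case by writing into a free row in place.
# (R updates M in place as long as only one reference to it exists.)
new_row <- 1L
M[new_row, ] <- c(1, 0, 5)  # e.g. case_id, days_infected, incubation_length

# Later updates also modify the row in place; no rows are copied,
# moved, or deleted.
M[new_row, 2] <- M[new_row, 2] + 1  # advance days_infected
```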
Henry just like you said:
do have to keep track of where the active and secondary infections (or free matrix rows) are in M..
What if: Use vector types to store the three categories of indexes to rows of M:
A: active
B: secondary
C: free space
(to merge A and B in C++/Java you'd use a linked list and attach A's tail pointer to B's head (and B's head back to A's tail if doubly linked), so the merge operation is O(1)..
.. hopefully R also allows you to "stick" lists together (fingers crossed), but even if the high-level language insists on appending B to A element-wise under the hood, that's still only O(||B||)..
.... which is much better than the O(||B|| * n) cost of moving all the data around (so just moving indices around is an effective pyramid scheme)
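In R there are no pointer-based linked lists, but the index bookkeeping above can be sketched with plain integer vectors (all names and starting values here are hypothetical); `c(A, B)` appends B's indices to A's, so only ||B|| integers move, never the n columns of data:

```r
# Row indices of M for each category (hypothetical starting state).
active    <- c(1L, 4L)      # A: active infections
secondary <- c(2L, 7L)      # B: secondary infections
free_rows <- c(3L, 5L, 6L)  # C: free space in M

# Promote all secondary infections to active: move indices, not data.
active    <- c(active, secondary)  # ||B|| integer copies, not ||B|| * n
secondary <- integer(0)

# Retire a case: return its row index to the free list; the row itself
# stays where it is in M.
done      <- active[1]
active    <- active[-1]
free_rows <- c(free_rows, done)
```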
The motivation for the above:
no explicit counters
no filling / copying of whole rows (likely saving a fraction (n-1)/n of the copy time, since one index moves instead of n values)
explicitly tracking locations to remove any possible ambiguity re: what the high-level language is doing with the data
although there could be many other (better) approaches..
..Thoughts? Thanks for posting a super cool and fun code! Enjoying learning from the great style and learning lots about R from it too
cheers
A
P.S. I do like the idea of not using a data frame and wonder if it would look like the sketch below (it may be crazy; I always enjoy removing "features" of programming languages when I can, since syntactic "sugar" is overrated; everybody knows it seems that simple sugars promote uncontrolled cell division haha)
create_state_df <- function(n_cases, sim_params, sim_status,
                            initialize=FALSE, import=FALSE,
                            primary_state_df=NULL, primary_case_ids=NULL){
  # List of columns in state df
  col_names <- c("case_id", "status", "is_traced", "is_trace_app_user", "is_trace_app_comply",
                 "is_traced_by_app", "is_symptomatic", "days_infected", "incubation_length",
                 "isolation_delay", "infection_length", "contact_rate", "n_sec_infects",
                 "generation_intervals")
  n_cols <- length(col_names)
  ix <- list()
  for (i in 1:n_cols){
    ix[[col_names[i]]] <- i
  }
  # Create state matrix (in place of the data frame)
  state_df <- matrix(0., nrow=n_cases, ncol=n_cols)
  # If no cases being added, then return empty matrix
  # This should only happen when initializing
  if (n_cases == 0){
    return(state_df)
  }
  # Fill in start values
  if (initialize){
    state_df[, ix[['case_id']]] <- 1:n_cases  # special case for starting out
  }
  else{
    state_df[, ix[['case_id']]] <- sim_status$last_case_id + 1:n_cases
  }
.....
Thanks for the insights! Data frames were helpful in developing the code and getting it completed quickly while things were changing a lot (and I didn't want to fiddle with tricky indexing issues). Now that the simulation base is more stable, it's a good time to think about the matrix approach and how R handles things to get the best speed.