Age-disaggregated outputs can be slow #275

Open
pwinskill opened this issue Dec 14, 2023 · 3 comments

@pwinskill
Member

Requesting clinical or severe malaria incidence disaggregated by age can slow the model down considerably. From a conversation with @giovannic, this is likely due to how the age-disaggregated outputs are estimated rather than the overhead of rendering more output.

An example comparing a single age band with 100 age bands:

library(malariasimulation)

# Parameters that specify output with a single age group
p_single_age <- get_parameters(
  overrides = list(
    human_population = 1000
  )
) |>
  set_equilibrium(
    init_EIR = 5
  )
p_single_age$clinical_incidence_rendering_min_ages <- 0
p_single_age$clinical_incidence_rendering_max_ages <- 100 * 365

# Parameters that specify output with multiple age groups
p_multiple_age <- p_single_age
p_multiple_age$clinical_incidence_rendering_min_ages <- 0:99 * 365
p_multiple_age$clinical_incidence_rendering_max_ages <- 1:100 * 365

system.time({
  s_single_age <- run_simulation(
    timesteps = 365 * 2,
    parameters = p_single_age
  )
})

system.time({
  s_multiple_age <- run_simulation(
    timesteps = 365 * 2,
    parameters = p_multiple_age
  )
})

@giovannic
Member

Yes. The current implementation calls birth$get_index_of(a = timestep - upper, b = timestep - lower) for each age group each timestep.

I'd guess a C++ implementation that computes all of the stats from all the births and age intervals in a single call would save ~99 Rcpp calls per iteration.
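A rough base-R sketch of that idea, not the malariasimulation code: instead of one query per age group, each person's age is binned once with `findInterval()` and the bins are counted with `tabulate()`. Everything here is a hypothetical stand-in: `birth` is a plain numeric vector rather than the model's variable, `detected` is a made-up status flag, and the helper names are invented for illustration. The single-pass version also assumes the age groups are sorted, contiguous and non-overlapping.

```r
# Hypothetical stand-ins: `birth` is a plain numeric vector of birth timesteps,
# not the variable object used inside the model.
set.seed(42)
timestep <- 200 * 365
birth <- timestep - sample.int(100 * 365, size = 1000, replace = TRUE) + 1
detected <- runif(1000) < 0.3          # made-up per-person status flag

lower <- 0:99 * 365                    # inclusive lower age bound (days)
upper <- 1:100 * 365 - 1               # inclusive upper age bound (days)

# One scan of the population per age group (mirrors the per-group query).
count_by_group_loop <- function(birth, detected, timestep, lower, upper) {
  age <- timestep - birth
  vapply(
    seq_along(lower),
    function(i) sum(detected & age >= lower[i] & age <= upper[i]),
    integer(1)
  )
}

# Single pass: assign each person to a bin, then count the bins.
# Assumes groups are sorted, contiguous and non-overlapping.
count_by_group_binned <- function(birth, detected, timestep, lower, upper) {
  age <- timestep - birth
  bin <- findInterval(age, lower)      # 0 when age is below lower[1]
  in_range <- bin >= 1 & age <= upper[length(upper)]
  tabulate(bin[in_range & detected], nbins = length(lower))
}

loop_counts <- count_by_group_loop(birth, detected, timestep, lower, upper)
binned_counts <- count_by_group_binned(birth, detected, timestep, lower, upper)
stopifnot(all(loop_counts == binned_counts))
```

Both helpers return the same counts here; wrapping each call in `system.time()` is one way to check whether the single-pass version is actually faster on a given population size.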

@giovannic
Member

I found an R implementation from a personal project which is about 6x faster: feat/grid_render.

However, it makes assumptions about the age groups which will be output (year-wide age groups from 0 to 100). Some kinks also need to be resolved: it doesn't output the same values as the existing method for some reason 🤔
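To illustrate why fixed year-wide groups make this cheap, here is a minimal sketch under that assumption; it is not the code on feat/grid_render. With one-year bands, the band index is just an integer division of age in days by 365, so no interval lookup is needed. The helper `grid_counts` and its arguments are hypothetical.

```r
# Hypothetical helper: counts flagged people per one-year age band (0..99 years).
grid_counts <- function(age_days, flag, n_years = 100) {
  year <- age_days %/% 365 + 1          # 1-based band index
  keep <- flag & year >= 1 & year <= n_years
  tabulate(year[keep], nbins = n_years)
}

# Ages in days: two people in band 1, one in band 2, one (unflagged) in band 3,
# one in band 100, and one just past the last band (dropped).
age_days <- c(0, 364, 365, 730, 36499, 36500)
flag <- c(TRUE, TRUE, TRUE, FALSE, TRUE, TRUE)

counts <- grid_counts(age_days, flag)
counts[c(1, 2, 3, 100)]
```

The trade-off is the one noted above: this only works when the requested age groups are exactly the year-wide bands.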

Test, outputs and plots are attached below:

library(malariasimulation)

# Parameters that specify output with a single age group
p_single_age <- get_parameters(
  overrides = list(
    human_population = 1000
  )
) |>
  set_equilibrium(
    init_EIR = 5
  )
p_single_age$prevalence_rendering_min_ages <- 0
p_single_age$prevalence_rendering_max_ages <- 100 * 365

# Parameters that specify output with multiple age groups
p_multiple_age <- p_single_age
p_multiple_age$prevalence_rendering_min_ages <- 0:99 * 365
p_multiple_age$prevalence_rendering_max_ages <- 1:100 * 365 - 1

p_grid <- p_single_age
p_grid$render_grid <- TRUE

set.seed(1)

system.time({
  s_single_age <- run_simulation(
    timesteps = 365 * 2,
    parameters = p_single_age
  )
})

set.seed(1)

system.time({
  s_grid <- run_simulation(
    timesteps = 365 * 2,
    parameters = p_grid
  )
})

jpeg('grid.jpg')
for (i in c(1, 50, 100)) {
  if (i == 1) {
    plot(s_grid$grid$timestep, s_grid$grid$n_detect[,i], type = 'l')
  } else {
    lines(s_grid$grid$timestep, s_grid$grid$n_detect[,i])
  }
}
dev.off()

set.seed(1)

system.time({
  s_multiple_age <- run_simulation(
    timesteps = 365 * 2,
    parameters = p_multiple_age
  )
})

jpeg('multiple.jpg')
for (i in c(1, 50, 100)) {
  if (i == 1) {
    plot(
      as.numeric(s_multiple_age$timestep),
      as.numeric(s_multiple_age[[paste0(
        'n_detect_',
        p_multiple_age$prevalence_rendering_min_ages[i],
        '_',
        p_multiple_age$prevalence_rendering_max_ages[i]
      )]]),
      type = 'l'
    )
  } else {
    lines(
      as.numeric(s_multiple_age$timestep),
      as.numeric(s_multiple_age[[paste0(
        'n_detect_',
        p_multiple_age$prevalence_rendering_min_ages[i],
        '_',
        p_multiple_age$prevalence_rendering_max_ages[i]
      )]])
    )
  }
}
dev.off()

outputs:

root@c40d0e961b76:/opt# Rscript test.R
   user  system elapsed              # single
  4.994   0.000   4.995
   user  system elapsed              # grid
  6.935   0.000   6.910
   user  system elapsed              # multiple
 46.173   0.000  46.169
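For reference, the relative costs implied by these elapsed times can be worked out directly. The numbers are copied from the output above; the variable names are only for illustration.

```r
# Elapsed seconds copied from the Rscript output above.
elapsed <- c(single = 4.995, grid = 6.910, multiple = 46.169)

# Cost of each run relative to the single-age-group run.
relative_to_single <- round(elapsed / elapsed[["single"]], 2)

# Speed-up of the grid render over the existing multiple-age-group render.
grid_speedup <- round(elapsed[["multiple"]] / elapsed[["grid"]], 2)

relative_to_single
grid_speedup
```

On this single run the grid render is between 6x and 7x faster than the per-group render, and roughly 1.4x slower than rendering one age group.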

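Multiple:
[plot: multiple.jpg, n_detect over time for age groups 1, 50 and 100]

Grid:
[plot: grid.jpg, n_detect over time for age groups 1, 50 and 100]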

@giovannic
Member

The mismatch between the two outputs was an RNG issue.

[plot: multiple]
[plot: grid]
