runSimulations vs runSimulationBatches #615

Closed
abdullahhamadeh opened this issue Sep 13, 2021 · 12 comments

@abdullahhamadeh
Member

sim-batch-test.zip

I ran a script (attached) to compare 1) sequential calls to runSimulation, 2) runSimulations, and 3) runSimulationBatches.

I found that 1 and 3 yield very similar simulation times, but there are obvious improvements with 2:

`[1] "Pure sequential run:"
12.92 sec elapsed

[1] "Parallelized using runSimulations:"
5.41 sec elapsed

[1] "Using runSimulationBatches:"
12.65 sec elapsed`

Under what scenarios does runSimulationBatches speed up the process?

@msevestre
Member

Interesting. I would have expected 2 and 3 to take more or less the same time in your scenario.
@PavelBal @abdelr Any idea why? Aren't we initializing the simulation also in parallel?

At any rate, 3 really shines when used in a loop or in an algorithm such as PI (parameter identification), where you have to perform multiple runs.

For instance, say you do something like this:

simBatch1$addRunValues(parameterValues = 1)
simBatch2$addRunValues(parameterValues = 0.1)
simBatch3$addRunValues(parameterValues = 0.01) # note: here your script was setting simBatch2 twice
tictoc::tic()
res.bat <- runSimulationBatches(simulationBatches = list(simBatch1,simBatch2,simBatch3))
tictoc::toc()

simBatch1$addRunValues(parameterValues = 1)
simBatch2$addRunValues(parameterValues = 0.1)
simBatch3$addRunValues(parameterValues = 0.01)
tictoc::tic()
res.bat <- runSimulationBatches(simulationBatches = list(simBatch1,simBatch2,simBatch3))
tictoc::toc()

xx

tictoc::tic()
res.bat <- runSimulationBatches(simulationBatches = list(simBatch1,simBatch2,simBatch3))
tictoc::toc()

I would expect every call after the first to be much faster, even faster than the parallel run.
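A minimal sketch of the loop use case described above. It assumes the Aciclovir example model bundled with the ospsuite package and reuses the dose path that appears later in this thread. The candidate values and the `elapsed` vector are illustrative only, not taken from the attached script.

```r
library(ospsuite)

# Assumption: the example model bundled with the ospsuite package
simFilePath <- system.file("extdata", "Aciclovir.pkml", package = "ospsuite")
sim <- loadSimulation(simFilePath)

# Parameter to vary, declared once when the batch is created
dosePath <- "Applications|IV 250mg 10min|Application_1|ProtocolSchemaItem|Dose"
simBatch <- createSimulationBatch(simulation = sim, parametersOrPaths = dosePath)

# Illustrative values, e.g. what an optimizer might propose step by step
candidateDoses <- c(0.00025, 0.0001, 0.00005)
elapsed <- numeric(length(candidateDoses))

for (i in seq_along(candidateDoses)) {
  # Queue one set of values, then run the same batch object again
  simBatch$addRunValues(parameterValues = candidateDoses[[i]])
  elapsed[[i]] <- system.time(
    results <- runSimulationBatches(simulationBatches = simBatch)
  )[["elapsed"]]
}

# One timing per iteration; compare the first entry against the rest
print(elapsed)
```

If the batch behaves as described in this thread, the first entry of `elapsed` should carry the initialization cost and the remaining entries should be much smaller.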

@abdullahhamadeh
Member Author

Thanks @msevestre

What is returned if one of the simulations run in runSimulations fails? Do we still get a list of length = number of simulations that were input? Or is it a shorter list?

@msevestre
Member

Also a very good question. It is easy to check by setting a parameter, such as a solver parameter, to something off. I want to understand why #3 is not faster, so I am going to check it out and report back.
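The thread does not state what runSimulations returns when one run fails, so the following is only a sketch of how such a check could be written. `summarizeResults` is a hypothetical helper, not part of ospsuite, and the input and result lists are stand-in data rather than real simulation objects.

```r
# Hypothetical helper, not part of ospsuite: compare inputs against results
summarizeResults <- function(inputs, results) {
  list(
    # TRUE when there is one result slot per input
    sameLength = length(inputs) == length(results),
    # Positions in the result list that hold NULL instead of a result object
    nullPositions = which(vapply(results, is.null, logical(1)))
  )
}

# Stand-in data: three inputs, with the second result missing
inputs <- list("sim1", "sim2", "sim3")
results <- list("res1", NULL, "res3")

check <- summarizeResults(inputs, results)
print(check)
```

With real data, `inputs` would be the list passed to runSimulations and `results` its return value. Whether a failed run shows up as a shorter list, a NULL entry, or an error is exactly what this check would reveal.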

@PavelBal
Member

PavelBal commented Sep 13, 2021 via email

@msevestre
Member

@abdullahhamadeh If I run the scenario 3 three times in a row:

[1] "Using runSimulationBatches:"
7.89 sec elapsed
0.07 sec elapsed
0.06 sec elapsed
0.08 sec elapsed

You can see how the simulation overhead for initialization is gone. It's almost too fast for my liking, to be honest...

@PavelBal Did you check that it actually works as expected...?

@msevestre
Member

msevestre commented Sep 13, 2021

@abdelr I am not quite sure why the call takes so long the first time. It looks like the batches are initialized sequentially. Not a big deal, since it happens only once, but it would be interesting to understand why.

@msevestre
Member

@abdelr Found the issue. Creating a report for this.

@PavelBal
Member

Actually, the way to use the batch would be to create only one batch and add three value sets:

simBatch1 <- createSimulationBatch(simulation = sim1,parametersOrPaths = tree$Raltegravir$Lipophilicity$path)
simBatch1$addRunValues(parameterValues = 1)
simBatch1$addRunValues(parameterValues = 0.1)
simBatch1$addRunValues(parameterValues = 0.01)

results <- runSimulationBatches(simulationBatches = simBatch1)

@PavelBal
Member

PavelBal commented Sep 14, 2021

I also did some profiling and the results are... well, confusing :D

Load one simulation, simulate three times with different doses in a loop (Sequential):

sim <- loadSimulation("inst/extdata/Aciclovir.pkml")
param <- getAllParametersMatching("Applications|**|Dose", sim)
doseVals <- c(0.00025, 0.0001, 0.00005)

results <- vector("list", 3)
# Sequential, in a loop
system.time(
  for (i in seq_along(doseVals)) {
    setParameterValues(param, doseVals[[i]])
    results[[i]] <- runSimulations(sim)
  }
)

2.94 sec

Load the simulation three times, and run in parallel using runSimulations()

# Parallel with runSimulations
simulations <- lapply(doseVals, \(x) {
  sim <- loadSimulation("inst/extdata/Aciclovir.pkml")
  setParameterValuesByPath("Applications|IV 250mg 10min|Application_1|ProtocolSchemaItem|Dose", values = x, simulation = sim)
  return(sim)
})

system.time(
  results <- runSimulations(simulations)
)

0.96 sec - amazing speed-up of simulation time, perfect parallelization. However, the comparison is not 100% fair as we do not count the time for loading simulations.

Simulation batch with three runs:

#Simulation batch
simulationBatch <- createSimulationBatch(sim, parametersOrPaths = param)
for (i in doseVals){
  simulationBatch$addRunValues(i)
}

system.time(
results <- runSimulationBatches(simulationBatch)
)

2.48 sec - no benefit as compared with the sequential run!!!!

Second run of the batch, dose values doubled to make sure the simulations are actually re-run with new values:

for (i in doseVals*2){
  simulationBatch$addRunValues(i)
}

system.time(
results <- runSimulationBatches(simulationBatch)
)

0.06 sec !!!! HOW? Seriously, HOW? And yes it produces the correct results, I plotted them.

@abdelr
Member

abdelr commented Sep 14, 2021

> 0.06 sec !!!! HOW? Seriously, HOW? And yes it produces the correct results, I plotted them.

The batch solution prepares everything at the beginning. It currently does not prepare in parallel, but Michael has already solved that. From there on, every call just simulates and does not initialize the simulations, since they are already initialized. The first snippet actually initializes the simulations when you run them. This is why runSimulationBatches performs faster the second time, even faster than runSimulations on the first snippet. With the PR that fixes the initialization not being done in parallel, I would expect the first call to perform as well as runSimulations and later calls to still be very fast.
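The "initialize once, then only simulate" behaviour described here can be illustrated with a toy model in base R. This is not ospsuite code: `makeToyBatch` is a hypothetical stand-in in which `Sys.sleep` plays the role of the one-time initialization and a trivial multiplication plays the role of the solver run.

```r
# Toy stand-in for a batch: pays a setup cost on the first call only
makeToyBatch <- function(initSeconds = 0.2) {
  initialized <- FALSE
  function(parameterValue) {
    if (!initialized) {
      Sys.sleep(initSeconds)  # one-time "initialization"
      initialized <<- TRUE
    }
    parameterValue * 2        # cheap "simulation" with the new value
  }
}

runToyBatch <- makeToyBatch()

# The first call includes the setup cost, the second one does not
firstElapsed <- system.time(firstResult <- runToyBatch(1))[["elapsed"]]
secondElapsed <- system.time(secondResult <- runToyBatch(2))[["elapsed"]]

print(c(first = firstElapsed, second = secondElapsed))
```

The timing pattern mirrors the one reported above: a slow first call followed by near-instant calls that still use the new parameter values.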

@msevestre
Member

msevestre commented Sep 14, 2021

@PavelBal
The code from Abdullah was loading all simulations before the tic/toc:

# Pure sequential run
print("Pure sequential run:")
tictoc::tic()
res1 <- runSimulation(simulation = sim1)
res2 <- runSimulation(simulation = sim2)
res3 <- runSimulation(simulation = sim3)
tictoc::toc()


# Parallelized using runSimulations
print("Parallelized using runSimulations:")
tictoc::tic()
res.par <- runSimulations(simulations = list(sim1, sim2, sim3))
tictoc::toc()

So in this case, you can see the pure benefit of running in parallel.
With the simulation you used, which is clearly very small (2.48 sec for 3 sequential init + run), the benefits will be negligible.

As to the time for the batch: I am also very surprised by the gain. I implemented the Batch component myself and I know that we are optimizing a lot. Yet running the simulation in 0.06 sec seems very, very surprising. We do not get this kind of speed for the population simulation, which uses the exact same concept. It could be that we lose some time there with the UI thread always trying to refresh the progress bar etc., but still...

@PavelBal If you say that the outputs are correct....then maybe we have just a fantastic running engine in R... but I am still suspicious

msevestre added a commit that referenced this issue Sep 14, 2021