Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance: avoid creation of new Quantity-objects in getOutputValues #462

Closed
PavelBal opened this issue Dec 22, 2020 · 6 comments
Closed

Comments

@PavelBal
Copy link
Member

In one particular scenario I have identified a bottleneck in the performance of retrieving the output values of simulated results.

I am simulating a steady-state of the model. For this, I first retrieve all quantities that are described by an ODE - first all molecules, and then all parameters and sorting the parameters that are state variable:

  quantities <- getAllMoleculesMatching("Organism|**", container = simulation)
  # Get all state variable parameters
  allParams <- getAllParametersMatching("**", container = simulation)
... extract param$isStateVariable ...

This already takes almost 4 minutes, and the majority of time is spent on creating the quantities, in particular in creating the quantities' formula objects:

OSPSuite-R/R/quantity.R

Lines 93 to 97 in 44d8471

formula <- private$wrapExtensionMethod(QUANTITY_EXTENSIONS, "GetFormula")
private$.formula <- Formula$new(formula)
if (self$isTable) {
private$.formula <- TableFormula$new(formula)
}

Once identified all quantities (lets call them allQuantities), these are used as outputs in the simulation. Simulating and getting the results consumes 53 seconds, of which only 20 seconds are for the simulation itself - the other 30 seconds are spent on getting the results. Once again, much time is spent on creating Quantity-objects, including here:

for (path in paths) {
quantity <- getQuantity(path, simulationResults$simulation, stopIfNotFound = stopIfNotFound)
metaData[[path]] <- list(unit = quantity$unit, dimension = quantity$dimension)
values[[path]] <- simulationResults$getValuesByPath(path, individualIds, stopIfNotFound)
}

However, creating Quantity-objects is not necessary if a list of quantities is provided instead of paths. By re-writing the method I could improve the time from 53 to 40 seconds - which is not negligible if the scenario is performed multiple times (e.g. in a parameter identification).

I will submit a PR.

@msevestre
Copy link
Member

This already takes almost 4 minutes

That I find very surprising. How large is your model?
Also getAllParametersMatching(**) looks like you want all parameters. Don't we have a more optimized function for this?

@PavelBal
Copy link
Member Author

That I find very surprising. How large is your model?

Yes this is crazy. Well, the model is quite large - the exported pkml is 50mb...

getAllMoleculesMatching("Organism|**", container = simulation) takes 20 seconds, with quite some time spent on creating/accessing the formulas (returns 3800 elements):

image

And getAllParametersMatching(**) is ridiculously slow (returns 37500 elements) with 3.6 minutes!.

Retrieving only state variable parameters would be a great improvement, but I did not find how to do it directly.

image

@msevestre
Copy link
Member

40k parameters lol
What I mean is getAllParameterMatching(**) will try to match every single path by regex. I think it would be faster to load them all and then filter

How do I read this perf graph? I see the number of time a method is called but I don't see the overall time

@PavelBal
Copy link
Member Author

What I mean is getAllParameterMatching(**) will try to match every single path by regex. I think it would be faster to load them all and then filter

But matching is performed in .net, right?

getAllEntitiesMatching <- function(paths, container, entityType, method = NULL) {
# Test for correct inputs
validateIsOfType(container, c(Simulation, Container, Molecule))
validateIsString(paths)
validateIsString(method, nullAllowed = TRUE)
className <- entityType$classname
if (length(which(names(AllMatchingMethod) == className)) == 0) {
stop(messages$errorWrongType("entityType", className, names(AllMatchingMethod)))
}
task <- getContainerTask()
method <- method %||% AllMatchingMethod[[className]]
findEntitiesByPath <- function(path) {
toObjectType(rClr::clrCall(task, method, container$ref, enc2utf8(path)), entityType)
}
return(unify(findEntitiesByPath, paths))

This is fast, definitely not the bottle-neck.

How do I read this perf graph? I see the number of time a method is called but I don't see the overall time

Those are the ms spent on each line. What I try to show here that a great amount of time in spent on formula-related tasks. For each .net-quantity, an r-Quantity is created and "GetFormula" and "self$isTable" (which calls self$formula$isTable) takes about 45 seconds each (in total).

@msevestre
Copy link
Member

self$isTable is 45s ?? WHAT? We are accessing a boolean variable here

@msevestre
Copy link
Member

There are cleary some stuff to explore. self$isTable just calls self$formula$isTable. This could be cached. However it seems to be only called once anyway. Maybe the call in .NET takes time (I don't understand how)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants