Avoid use of apply in computing infection status in schisto module #683
The profiling described in #286 (comment) suggested that the use of the `DataFrame.apply` method in lines 550 to 553 of `TLOmodel/src/tlo/methods/schisto.py` (at 7669163), in the `SchistoSpecies.update_infectious_status_and_symptoms` method, may be a bottleneck, due to the resulting Python iteration over each row in the (filtered) population dataframe.

This PR replaces the `apply` call by creating a `Series` object directly and using boolean indexing to implement the corresponding logic while avoiding the row-by-row iteration.

A basic microbenchmark, applying the new boolean-indexing-based `_get_infection_status` function to a subset of a randomly generated population dataframe with 50000 rows and comparing against the previous `apply`-based approach, suggests the new approach should be significantly quicker:

Microbenchmark code
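
For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of the general technique of replacing a row-wise `apply` with a pre-filled `Series` and boolean indexing. It is not the code from this PR or its microbenchmark: the column names (`age_years`, `worm_burden`), the status labels, the age cut-off and the thresholds are all hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# All names and numbers below are hypothetical, for illustration only.
CHILD_AGE_LIMIT = 15       # assumed cut-off between age groups
HIGH_CUTOFF_CHILD = 10     # assumed "high" worm burden threshold for children
HIGH_CUTOFF_ADULT = 20     # assumed "high" worm burden threshold for adults
LOW_CUTOFF = 1             # assumed "low" worm burden threshold


def status_rowwise(row):
    """Row-by-row version, intended for use with DataFrame.apply(axis=1)."""
    is_child = row["age_years"] < CHILD_AGE_LIMIT
    high_cutoff = HIGH_CUTOFF_CHILD if is_child else HIGH_CUTOFF_ADULT
    if row["worm_burden"] >= high_cutoff:
        return "High-infection"
    if row["worm_burden"] >= LOW_CUTOFF:
        return "Low-infection"
    return "Non-infected"


def status_vectorised(df):
    """Same logic using a pre-filled Series and boolean masks."""
    # Start with the default label for every row.
    status = pd.Series("Non-infected", index=df.index, dtype="object")
    # Per-row "high" threshold, computed for the whole column at once.
    high_cutoff = np.where(
        df["age_years"] < CHILD_AGE_LIMIT, HIGH_CUTOFF_CHILD, HIGH_CUTOFF_ADULT
    )
    is_high = df["worm_burden"] >= high_cutoff
    is_low = (df["worm_burden"] >= LOW_CUTOFF) & ~is_high
    # Overwrite the default label only where each mask is True.
    status.loc[is_low] = "Low-infection"
    status.loc[is_high] = "High-infection"
    return status


# Randomly generated stand-in for a population dataframe.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {
        "age_years": rng.integers(0, 90, size=1000),
        "worm_burden": rng.integers(0, 40, size=1000),
    }
)

rowwise = df.apply(status_rowwise, axis=1)
vectorised = status_vectorised(df)

# Both approaches should assign the same label to every row.
assert (rowwise == vectorised).all()
```

The vectorised version evaluates each comparison once over whole columns, so the per-row Python function call that `apply(axis=1)` incurs is avoided; how large the resulting speed-up is will depend on the dataframe size and the actual logic.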