-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow 'batched' changes to multiple symptoms at once #409
Conversation
|
||
for symptom in symptom_list_severe: | ||
# Let u ~ Uniform(0, 1) and p ~ Uniform(prop_lower, prop_upper), | ||
# then the probability of the event (u < p) is (prop_lower + prop_upper) / 2 | ||
# That is the probability of b == True in the following code snippet | ||
# b = rng.uniform() < rng.uniform(low=prop_lower, high=prop_upper) | ||
# and this one | ||
# b = rng.uniform() < (prop_lower + prop_upper) / 2 | ||
# are equivalent. | ||
persons_gaining_symptom = severe_index[ | ||
rng.uniform(size=len(severe_index)) | ||
< ( | ||
range_symp.at[symptom, "prop_lower"] | ||
+ range_symp.at[symptom, "prop_upper"] | ||
) / 2 | ||
] | ||
# schedule symptom onset | ||
self.sim.modules["SymptomManager"].change_symptom( | ||
person_id=persons_gaining_symptom, | ||
symptom_string=symptom, | ||
add_or_remove="+", | ||
disease_module=self, | ||
duration_in_days=None, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously the decision of whether a person gained a symptom or not was made based on whether a random variable drawn from Uniform(0, 1)
was less than a second independent random variable drawn from Uniform(a, b)
where 0 < a < b < 1
with a
the lower bound for a range of a probability and b
the upper bound. The marginal probability of the gaining symptom event under this two stage process is equal to (a + b) / 2
so I have switched to just drawing a single set of Uniform(0, 1)
variables here. In probability the behaviour should be identical to previously, but as the sequence of random variates drawn from the pseudo-random number generator will differ the exact decisions made for a given starting seed will differ so for example the final population dataframe checksums for equivalent runs will not match before and after the changes in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tagging @tdm32 here just to confirm if this is the logic intended originally.
Profiling results suggest these changes have actually slowed things down 😬. Total time for a 5 year profiled run of |
I finally managed to track down what was causing the performance regression. As well as extending Specifically I changed person_id = df.index[df.is_alive & (df.index.isin(person_id))] to persons = df.loc[person_id]
person_id = persons[persons.is_alive].index in I also changed df = self.sim.population.props
group_indices = {
'children': df.index[df.is_alive & (df.age_years < 15)],
'adults': df.index[df.is_alive & (df.age_years >= 15)]
}
# For each generic symptom, impose it on a random sample of persons who do not have that symptom currently:
for symp in sorted(self.module.generic_symptoms):
do_not_have_symptom = self.module.who_not_have(symptom_string=symp)
for group in ['children', 'adults']:
...
persons_eligible_to_get_symptom = group_indices[group][
group_indices[group].isin(do_not_have_symptom)
]
... to df = self.sim.population.props
group_selector = {'children': df.age_years < 15, 'adults': df.age_years >= 15}
# For each generic symptom, impose it on a random sample of persons who do not have that symptom currently:
for symp in sorted(self.module.generic_symptoms):
do_not_have_symptom = self.module.who_not_have(symptom_string=symp)
for group in ['children', 'adults']:
...
eligible_to_get_symptom = group_selector[group] & do_not_have_symptom
persons_eligible_to_get_symptom = df.index[eligible_to_get_symptom]
... in Prior to the changes in this PR (ed25ab1) a 5 year profiled run of After the updates to revert the changes causing the slowdown detailed above (33ad2a5), a 5 year profiled run of I've also added some further tests to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great to me! Thanks @matt-graham
Please could you also update the brief documentation we have on SymptomManager to explain the new functionality?
Thanks @tbhallett. Yes I will update the wiki with a description of the new functionality. |
Access module more cleanly in measles event Remove redundant assertion in change_symptoms
Now added some documentation at https://github.com/UCL/TLOmodel/wiki/Symptoms-and-the-SymptomManager (also changed the name of the page to reflect its now more general purpose) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. Love the overhauled BitsetHandler.
Based on the analysis of profiling runs in #286 (comment) it appears that there are several points in the code where multiple symptoms are being updated in a loop as
SymptomManager.change_symptom
allows specifiying on a single symptom to update, and that these operations are consuming relatively large proportions of the overall simulation time. Similarly the currentSymptomManager.clear_symptoms
only allows clearing the symptoms for a single individual at a time, and its use within loops iterating over sets of person IDs is also a bottleneck.This PR updates the
SymptomManager.change_symptom
method to optionally accept a list / set of symptom strings instead of a single symptom string, with all the corresponding symptom bits then simultaneously set or unset. As theSymptomManager_AutoOnsetEvent
andSymptomManager_AutoResolveEvent
events also internally useSymptomManager.change_symptom
to update the symptoms, this change also means that these events also can be 'batched' with a single onset / resolve event, updating multiple symptoms at once, meaning one event can be scheduled when previously an event would be separately scheduled for each symptom.To enable this batch updating, the
BitsetHandler
class intlo.util
has been updated to allow the operations to be performed on multiple integer columns corresponding to bitsets with the same set of elements. Rather than specifying a single column when constructing the bitset handler object, a list of column names corresponding to the bitsets to update / test can optionally instead be passed to the relevantBitsetHandler
methods. The test suite for theBitsetHandler
class intests/test_bitset.py
has been updated to also cover these multiple column operations, and the previous tests for the single column case split into smaller self-contained test cases.The
SymptomManager.clear_symptoms
method is also updated to allow passing multiple person IDs to clear symptoms for at once. AsSymptomManager.clear_symptoms
was found to be a particular bottleneck, the internal logic here has been slightly changed though I think it will still have the same behaviour. Specifically, previously the set of symptoms for which the specified disease module had a bit set for was previously first constructed, and then the bit corresponding to the disease module unset for each of these symptoms. This required several access to the bitset columns to first check which symptoms have the bit set for the disease module, and then subsequently unset these bits. In the new implementation the bits for the specified disease module are unset for all registered symptoms. As unsetting a bit which is unset has no effect this should be equivalent to just unsetting the bits that were set for the specified disease module I think, but saves the additional check operations.Finally, the downstream usages within loops of
SymptomManager.change_symptom
/SymptomManager.clear_symptoms
in disease modules which were identified in #286 (comment) as being bottlenecks are updated to take advantage of the new batched operations, with specificallyAlri
,Diarrhoea
,Malaria
andMeasles
being updated. As in theMalaria
andMeasles
cases the decision of which persons get which symptoms is probabilistic, I also did a bit of refactoring of the operations to sample the appropriate persons / symptoms to allow using the batched operations.I am currently running the
scale_run.py
profiling script after these changes to check the effect on run times and will update with the profiling results when finished. Due to the issue in #385, it is not possible to currently perform runs with theDiarrhoea
module registered so I have manually removed this module from thescale_run.py
configuration for both the runs before and after the changes in this PR.