Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow 'batched' changes to multiple symptoms at once #409

Merged
merged 18 commits into from
Nov 23, 2021

Conversation

matt-graham
Copy link
Collaborator

Based on the analysis of profiling runs in #286 (comment) it appears that there are several points in the code where multiple symptoms are being updated in a loop as SymptomManager.change_symptom allows specifiying on a single symptom to update, and that these operations are consuming relatively large proportions of the overall simulation time. Similarly the current SymptomManager.clear_symptoms only allows clearing the symptoms for a single individual at a time, and its use within loops iterating over sets of person IDs is also a bottleneck.

This PR updates the SymptomManager.change_symptom method to optionally accept a list / set of symptom strings instead of a single symptom string, with all the corresponding symptom bits then simultaneously set or unset. As the SymptomManager_AutoOnsetEvent and SymptomManager_AutoResolveEvent events also internally use SymptomManager.change_symptom to update the symptoms, this change also means that these events also can be 'batched' with a single onset / resolve event, updating multiple symptoms at once, meaning one event can be scheduled when previously an event would be separately scheduled for each symptom.

To enable this batch updating, the BitsetHandler class in tlo.util has been updated to allow the operations to be performed on multiple integer columns corresponding to bitsets with the same set of elements. Rather than specifying a single column when constructing the bitset handler object, a list of column names corresponding to the bitsets to update / test can optionally instead be passed to the relevant BitsetHandler methods. The test suite for the BitsetHandler class in tests/test_bitset.py has been updated to also cover these multiple column operations, and the previous tests for the single column case split into smaller self-contained test cases.

The SymptomManager.clear_symptoms method is also updated to allow passing multiple person IDs to clear symptoms for at once. As SymptomManager.clear_symptoms was found to be a particular bottleneck, the internal logic here has been slightly changed though I think it will still have the same behaviour. Specifically, previously the set of symptoms for which the specified disease module had a bit set for was previously first constructed, and then the bit corresponding to the disease module unset for each of these symptoms. This required several access to the bitset columns to first check which symptoms have the bit set for the disease module, and then subsequently unset these bits. In the new implementation the bits for the specified disease module are unset for all registered symptoms. As unsetting a bit which is unset has no effect this should be equivalent to just unsetting the bits that were set for the specified disease module I think, but saves the additional check operations.

Finally, the downstream usages within loops of SymptomManager.change_symptom / SymptomManager.clear_symptoms in disease modules which were identified in #286 (comment) as being bottlenecks are updated to take advantage of the new batched operations, with specifically Alri, Diarrhoea, Malaria and Measles being updated. As in the Malaria and Measles cases the decision of which persons get which symptoms is probabilistic, I also did a bit of refactoring of the operations to sample the appropriate persons / symptoms to allow using the batched operations.

I am currently running the scale_run.py profiling script after these changes to check the effect on run times and will update with the profiling results when finished. Due to the issue in #385, it is not possible to currently perform runs with the Diarrhoea module registered so I have manually removed this module from the scale_run.py configuration for both the runs before and after the changes in this PR.

Comment on lines +549 to +572

for symptom in symptom_list_severe:
# Let u ~ Uniform(0, 1) and p ~ Uniform(prop_lower, prop_upper),
# then the probability of the event (u < p) is (prop_lower + prop_upper) / 2
# That is the probability of b == True in the following code snippet
# b = rng.uniform() < rng.uniform(low=prop_lower, high=prop_upper)
# and this one
# b = rng.uniform() < (prop_lower + prop_upper) / 2
# are equivalent.
persons_gaining_symptom = severe_index[
rng.uniform(size=len(severe_index))
< (
range_symp.at[symptom, "prop_lower"]
+ range_symp.at[symptom, "prop_upper"]
) / 2
]
# schedule symptom onset
self.sim.modules["SymptomManager"].change_symptom(
person_id=persons_gaining_symptom,
symptom_string=symptom,
add_or_remove="+",
disease_module=self,
duration_in_days=None,
)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously the decision of whether a person gained a symptom or not was made based on whether a random variable drawn from Uniform(0, 1) was less than a second independent random variable drawn from Uniform(a, b) where 0 < a < b < 1 with a the lower bound for a range of a probability and b the upper bound. The marginal probability of the gaining symptom event under this two stage process is equal to (a + b) / 2 so I have switched to just drawing a single set of Uniform(0, 1) variables here. In probability the behaviour should be identical to previously, but as the sequence of random variates drawn from the pseudo-random number generator will differ the exact decisions made for a given starting seed will differ so for example the final population dataframe checksums for equivalent runs will not match before and after the changes in this PR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tagging @tdm32 here just to confirm if this is the logic intended originally.

@matt-graham
Copy link
Collaborator Author

matt-graham commented Nov 8, 2021

Profiling results suggest these changes have actually slowed things down 😬. Total time for a 5 year profiled run of scale_run.py before the changes in this PR on ed25ab1 took 6280s in total with 660s spent in SymptomManager.clear_symptoms and 789s spent in SymptomManager.change_symptoms. Total time for a 5 year profiled run of scale_run.py after the changes in this PR on 4fe211a took 6404s in total with 83s spent in SymptomManager.clear_symptoms and 1148s spent in SymptomManager.change_symptoms. Currently investigating what is causing this and trying to figure out a solution!

@matt-graham matt-graham marked this pull request as draft November 8, 2021 10:19
@matt-graham matt-graham marked this pull request as ready for review November 16, 2021 11:28
@matt-graham
Copy link
Collaborator Author

matt-graham commented Nov 16, 2021

I finally managed to track down what was causing the performance regression. As well as extending SymptomManager.clear_symptoms and SymptomManager.change_symptom to allow batched updates, I also changed how the indexing operations to select the relevant subsets of individuals to apply changes to / use in tests were performed, with the (it turned out) naïve impression that my changes would be faster.

Specifically I changed

person_id = df.index[df.is_alive & (df.index.isin(person_id))]

to

persons = df.loc[person_id]
person_id = persons[persons.is_alive].index

in SymptomManager.change_symptoms. I thought that first selecting only those rows corresponding to person_id by directly indexing by the list of indices and then filtering by is_alive on this smaller subset would be quicker than computing a boolean index on the whole dataframe as df.is_alive & (df.index.isin(person_id)), however this turned out to be a bad assumption with isin much quicker than I'd assumed, and the use of df.loc much slower than performing the indexing operations on df.index.

I also changed

df = self.sim.population.props
group_indices = {
    'children': df.index[df.is_alive & (df.age_years < 15)],
    'adults': df.index[df.is_alive & (df.age_years >= 15)]
}
# For each generic symptom, impose it on a random sample of persons who do not have that symptom currently:
for symp in sorted(self.module.generic_symptoms):
    do_not_have_symptom = self.module.who_not_have(symptom_string=symp)
    for group in ['children', 'adults']:
        ...
        persons_eligible_to_get_symptom = group_indices[group][
            group_indices[group].isin(do_not_have_symptom)
        ]
                ...

to

df = self.sim.population.props
group_selector = {'children': df.age_years < 15, 'adults': df.age_years >= 15}
# For each generic symptom, impose it on a random sample of persons who do not have that symptom currently:
for symp in sorted(self.module.generic_symptoms):
    do_not_have_symptom = self.module.who_not_have(symptom_string=symp)
    for group in ['children', 'adults']:
        ...
        eligible_to_get_symptom = group_selector[group] & do_not_have_symptom
        persons_eligible_to_get_symptom = df.index[eligible_to_get_symptom]
        ...

in SymptomManager.clear_symptoms. Here I thought that performing everything with boolean indices and avoiding isin would be quicker but again this turned out to be wrong.

Prior to the changes in this PR (ed25ab1) a 5 year profiled run of scale_run.py took 6280s in total with 789s spent in SymptomManager.change_symptom and ~515s spent in SymptomManager.clear_symptoms (excluding time spent in SymptomManager.change_symptom which was previously called within SymptomManager.clear_symptoms).

After the updates to revert the changes causing the slowdown detailed above (33ad2a5), a 5 year profiled run of scale_run.py took 5591s in total with 421s spent in SymptomManager.change_symptom and 83s spent in SymptomManager.clear_symptoms.

I've also added some further tests to tests/test_symptommanager.py to check using change_symptom and clear_symptoms to perform batch updates (and also to add some unit tests for methods which didn't previously have separate tests).

Copy link
Collaborator

@tbhallett tbhallett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me! Thanks @matt-graham
Please could you also update the brief documentation we have on SymptomManager to explain the new functionality?

src/tlo/methods/measles.py Outdated Show resolved Hide resolved
src/tlo/methods/symptommanager.py Outdated Show resolved Hide resolved
@matt-graham
Copy link
Collaborator Author

Looks great to me! Thanks @matt-graham Please could you also update the brief documentation we have on SymptomManager to explain the new functionality?

Thanks @tbhallett. Yes I will update the wiki with a description of the new functionality.

Access module more cleanly in measles event
Remove redundant assertion in change_symptoms
@matt-graham
Copy link
Collaborator Author

Looks great to me! Thanks @matt-graham Please could you also update the brief documentation we have on SymptomManager to explain the new functionality?

Thanks @tbhallett. Yes I will update the wiki with a description of the new functionality.

Now added some documentation at https://github.com/UCL/TLOmodel/wiki/Symptoms-and-the-SymptomManager (also changed the name of the page to reflect its now more general purpose)

Copy link
Collaborator

@tamuri tamuri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Love the overhauled BitsetHandler.

@tamuri tamuri merged commit 89e9b50 into master Nov 23, 2021
@tamuri tamuri deleted the mmg/batch-symptom-change branch November 23, 2021 10:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants