829 optimize testingstrategy #830

Merged
merged 6 commits into from
Dec 20, 2023

Conversation

dabele
Member

@dabele dabele commented Nov 11, 2023

Changes and Information

@khoanguyen-dev @xsaschako @DavidKerkmann
These changes are not ready for review yet, at least some need to be discussed since they change the model, not just performance. I want to show the changes early as a basis for discussion.

Closes #829
Closes #357
Closes #857

1

use bitset for age groups in TestingCriteria; bitset needs a fixed size, so add a maximum number of age groups. performance gain is quite small (< 1-2%), so maybe not necessary if you feel the maximum number is not wanted.
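A minimal sketch of the idea in change 1, assuming a compile-time upper bound on the number of age groups. `MAX_NUM_AGE_GROUPS`, `AgeCriteria` and its members are hypothetical names for illustration and are not taken from the PR; treating an empty set as "no restriction" is also an assumption.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical upper bound; std::bitset needs its size at compile time.
constexpr std::size_t MAX_NUM_AGE_GROUPS = 8;

// Hypothetical stand-in for the age part of a testing criteria type.
class AgeCriteria
{
public:
    // Mark an age group as eligible for testing.
    void add_age_group(std::size_t age_group)
    {
        assert(age_group < MAX_NUM_AGE_GROUPS);
        m_ages.set(age_group);
    }

    // Assumption of this sketch: an empty set means "no age restriction".
    bool matches(std::size_t age_group) const
    {
        return m_ages.none() || m_ages.test(age_group);
    }

private:
    std::bitset<MAX_NUM_AGE_GROUPS> m_ages; // one bit per age group
};
```

The membership check is a single bit test, with no allocation and no pointer chasing, which is where a gain over a node-based container would come from.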

2

switch the order of the ifs for the probability check and the TestingCriteria evaluation when applying the test; again a very small performance gain (<5%) that also depends on the performance of the RNG and the complexity of TestingCriteria. If TestingCriteria might become more complex in the future we may want to skip this. This doesn't change the model in theory, but in practice it does since the sequence of random numbers changes.
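To illustrate why the order affects the random number sequence, here is a small sketch of both orders. The function names and the callback-based interface are hypothetical; `draw` stands in for the RNG and `criteria_match` for evaluating TestingCriteria. Which order ends up cheaper depends on the relative cost of the two calls.

```cpp
#include <cassert>
#include <functional>

// Draw-first order: a random number is consumed on every call,
// the criteria are only evaluated if the draw succeeds.
inline bool probability_first(const std::function<double()>& draw, double probability,
                              const std::function<bool()>& criteria_match)
{
    return draw() < probability && criteria_match();
}

// Criteria-first order: a random number is consumed only if the
// criteria match, so fewer numbers are drawn overall.
inline bool criteria_first(const std::function<double()>& draw, double probability,
                           const std::function<bool()>& criteria_match)
{
    return criteria_match() && draw() < probability;
}
```

Both functions return the same result for the same draw, but they advance the RNG a different number of times, so every later draw in a simulation differs between the two variants.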

3

use CustomIndexArray for AgeGroupGoToSchool/Work; the performance gain here is barely measurable. But the interface of CustomIndexArray is more consistent with the rest of the parameters; AgeGroup and the enums we use are valid indices for a reason, random access is fast and convenient using an index into an array.
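The general idea of indexing a flat array by an enum can be sketched as follows. This is not the CustomIndexArray interface; `PerAgeGroup` and the `AgeGroup` enumerators below are hypothetical and only show the access pattern.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical age groups; Count is used as the array size.
enum class AgeGroup : std::size_t
{
    Age0to4,
    Age5to14,
    Age15to34,
    Age35to59,
    Age60to79,
    Age80plus,
    Count
};

// Hypothetical minimal wrapper around a fixed-size array indexed by AgeGroup.
template <class T>
class PerAgeGroup
{
public:
    T& operator[](AgeGroup a)
    {
        return m_values[static_cast<std::size_t>(a)];
    }
    const T& operator[](AgeGroup a) const
    {
        return m_values[static_cast<std::size_t>(a)];
    }

private:
    // Value-initialized, so bool entries start as false.
    std::array<T, static_cast<std::size_t>(AgeGroup::Count)> m_values{};
};
```

Access is a single offset computation into contiguous memory, with the enum type preventing accidental use of an unrelated integer as the index.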

4

use std::vector to store TestingSchemes per location instead of std::map; the performance gains here are very large (~20-30% of total benchmark runtime). The current implementation with map has two problems:
- lookups in map are more expensive than linear search in vectors for small numbers of elements; map is only better for quite large numbers.
- lookup in map with operator[] adds an element to the map if it doesn't exist, so the map quickly grows to hold an entry for every location, making lookups even slower. std::vector avoids this, but so does using map::find instead of map::operator[], so this issue is somewhat independent of the first one. The vector implementation is still slightly faster, at least unless you add a lot of TestingSchemes for different locations.
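The two problems above can be reproduced in isolation. In this sketch `LocationId`, `SchemeId` and the function names are hypothetical placeholders, not the types used in the PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

using LocationId = int; // hypothetical placeholder types
using SchemeId   = int;

using SchemeMap    = std::map<LocationId, std::vector<SchemeId>>;
using SchemeVector = std::vector<std::pair<LocationId, std::vector<SchemeId>>>;

// operator[] default-constructs and inserts an entry for a missing key,
// so a read-only lookup silently grows the map.
inline std::size_t count_with_subscript(SchemeMap& schemes, LocationId id)
{
    return schemes[id].size();
}

// find() leaves the map untouched when the key is missing.
inline std::size_t count_with_find(const SchemeMap& schemes, LocationId id)
{
    auto it = schemes.find(id);
    return it == schemes.end() ? 0 : it->second.size();
}

// Linear search over a small contiguous vector of (location, schemes) pairs.
inline const std::vector<SchemeId>* find_schemes(const SchemeVector& schemes, LocationId id)
{
    auto it = std::find_if(schemes.begin(), schemes.end(), [id](const auto& entry) {
        return entry.first == id;
    });
    return it == schemes.end() ? nullptr : &it->second;
}
```

Calling `count_with_subscript` for a location without schemes returns 0 but leaves a new empty entry behind, while `count_with_find` and `find_schemes` do not modify the container.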

5

only run the TestingStrategy when migrating; this is the largest change to the model, but also the largest performance gain, ~30-50%. (Note that this is not fully additive with the other improvements above: if the TestingStrategy is executed less often, the runtime of the TestingStrategy itself doesn't matter as much.) Persons are tested much less often, and persons that never leave home are never tested at all, even if there is a TestingScheme for home locations. That may be realistic, but needs to be discussed.

Another model could be to run tests only when migrating, but for the source location as well as the destination. This would give most of the performance benefits from this change with probably less impact on the model results (e.g. person leaves home to get to work, is tested at home and at work and stays home if either of them fails).
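A rough sketch of this alternative, under the assumption that a test can be expressed as a per-location callback. `migrate`, `LocationId` and `passes_test` are hypothetical names and not part of the existing code.

```cpp
#include <cassert>
#include <functional>

using LocationId = int; // hypothetical placeholder type

// Returns the location the person ends up in.
// `passes_test(loc)` is a hypothetical callback that runs the testing
// schemes of `loc` and returns true if the person may proceed.
inline LocationId migrate(LocationId current, LocationId target,
                          const std::function<bool(LocationId)>& passes_test)
{
    if (target == current) {
        return current; // not migrating: no test is run at all
    }
    // Run the tests of both the source and the destination location.
    const bool source_ok = passes_test(current);
    const bool target_ok = passes_test(target);
    // The person stays where they are if either test fails.
    return (source_ok && target_ok) ? target : current;
}
```

Tests are still skipped entirely for persons that do not move, which is where the performance benefit comes from, but a TestingScheme attached to the source location is no longer ignored.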

If we expect a lot of TestingSchemes for specific single locations (instead of location types) it might be better to store the testing scheme in the location directly.

Merge Request - Guideline Checklist

Please check our git workflow. Use the draft feature if the Pull Request is not yet ready to review.

Checks by code author

  • Every addressed issue is linked (use the "Closes #ISSUE" keyword below)
  • New code adheres to coding guidelines
  • No large data files have been added (files should in sum not exceed 100 KB, avoid PDFs, Word docs, etc.)
  • Tests are added for new functionality and a local test run was successful
  • Appropriate documentation for new functionality has been added (Doxygen in the code and Markdown files if necessary)
  • Proper attention to licenses, especially no new third-party software with conflicting license has been added

Checks by code reviewer(s)

  • Corresponding issue(s) is/are linked and addressed
  • Code is clean of development artifacts (no deactivated or commented code lines, no debugging printouts, etc.)
  • Appropriate unit tests have been added, CI passes and code coverage is acceptable (did not decrease)
  • No large data files added in the whole history of commits (files should in sum not exceed 100 KB, avoid PDFs, Word docs, etc.)

@dabele dabele self-assigned this Nov 11, 2023
@dabele
Member Author

dabele commented Nov 11, 2023

A few general notes

  • map and set may be semantically convenient in many cases, but are generally slow. unordered_set and unordered_map are much faster, but still slower than std::vector in many cases. These containers are generally only worth it for large data sets with sparse random accesses. Cutoff points need to be measured of course, but vector/array/CustomIndexArray is usually a good default choice, especially when the data (AgeGroup, enums) can be used as an index.
  • map and unordered_map are also dangerous because operator[] adds an entry if it doesn't find it
  • order of conditions matters
  • we need to find ways to not do everything in every iteration/time step whenever possible. Even checking a single bool can be slow if it isn't in the CPU cache i.e. if it isn't used regularly.

@xsaschako xsaschako linked an issue Nov 24, 2023 that may be closed by this pull request
2 tasks
@DavidKerkmann
Member

This together with #752 will also close #357.

@dabele dabele mentioned this pull request Dec 1, 2023
2 tasks
@dabele dabele force-pushed the 829-optimize-testingstrategy branch 2 times, most recently from baf5bfb to a71dee5 Compare December 8, 2023 16:44
@dabele dabele force-pushed the 829-optimize-testingstrategy branch from a71dee5 to 4a6073c Compare December 8, 2023 16:55
@dabele
Member Author

dabele commented Dec 8, 2023

Cleaned up the changes and fixed the unit tests.

Did better benchmark measurements to assess the changes. Uncertainty of ~2%.

baseline (current main): 8006 ms
with change 4 (vector instead of unordered_map for testing schemes): 6088 ms (~25%)
with change 4 and 2 (switch of ifs in testing strategy): 5640 ms (additional ~8 %)
with change 4, 2 and 3 (CustomIndexArray for GoToWork/GoToSchool parameter): 5638 ms (no measurable change)
with change 4, 2, 3 and 1 (bitset for agegroups in TestingCriteria with fixed number of age groups): 5262 ms (additional ~7%)
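
The step-by-step percentages can be recomputed from the raw timings. This is a small helper sketch, assuming each figure is the reduction relative to the previous measurement; `reduction_percent` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>

// Relative runtime reduction in percent when going from
// `before_ms` to `after_ms`.
inline double reduction_percent(double before_ms, double after_ms)
{
    return 100.0 * (before_ms - after_ms) / before_ms;
}
```

Applied to the numbers above, `reduction_percent(8006, 6088)` gives about 24.0, `reduction_percent(6088, 5640)` about 7.4 and `reduction_percent(5638, 5262)` about 6.7, which agrees with the rounded figures within the stated ~2% measurement uncertainty.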

From our discussions changes 2,3,4 should be fine to add. Change 3 does not give performance, just cleaner code (I think). Change 1 adds some complexity to the model, we can discuss whether the performance gain is worth it. Since I cleaned up and ordered the commits, it can be easily reverted.

Reverted change 5 (only run TestingStrategy when migrating) as discussed. But I measured the performance for this change as well. Only testing persons that migrate to a different location gives 3889 ms (~27%). This measurement includes all the other changes above, I did not measure the change independently. So we probably should keep this in the discussion but in a separate Issue/MR.

@dabele dabele marked this pull request as ready for review December 8, 2023 17:11
@dabele dabele requested a review from DavidKerkmann December 8, 2023 17:11

codecov bot commented Dec 8, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (8ed8364) 95.45% compared to head (60936cf) 95.46%.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #830   +/-   ##
=======================================
  Coverage   95.45%   95.46%           
=======================================
  Files         118      118           
  Lines        9269     9289   +20     
=======================================
+ Hits         8848     8868   +20     
  Misses        421      421           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Member

@DavidKerkmann DavidKerkmann left a comment

Thanks for the improvements. I agree that we should track point 5 for later inclusion as this greatly increases performance and is important. For now, I only have one remark.

cpp/models/abm/parameters.h
@dabele dabele mentioned this pull request Dec 15, 2023
11 tasks
@mknaranja mknaranja merged commit 08c1753 into main Dec 20, 2023
55 checks passed
@mknaranja mknaranja deleted the 829-optimize-testingstrategy branch December 20, 2023 13:48
Development

Successfully merging this pull request may close these issues.

  • segfault in ABM with OpenMP
  • Further optimization of TestingStrategy
  • Test container for TestingCriteria
3 participants