829 optimize testingstrategy #830

dabele · 2023-11-11T17:41:26Z

Changes and Information

@khoanguyen-dev @xsaschako @DavidKerkmann
These changes are not ready for review yet. At least some of them need to be discussed, since they change the model and not just the performance. I want to show the changes early as a basis for discussion.

Closes #829
Closes #357
Closes #857

1

Use a bitset for the age groups in TestingCriteria. A bitset needs a fixed size, so this adds a maximum number of age groups. The performance gain is quite small (< 1-2%), so it may not be necessary if you feel the maximum number is not wanted.
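A minimal sketch of the idea, not the code in this PR. `MAX_NUM_AGE_GROUPS`, the `AgeGroup` struct and `AgeCriteria` are hypothetical names, and treating an empty set as "no restriction" is an assumption of the sketch:

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <initializer_list>

// Hypothetical compile-time upper bound; std::bitset needs its size at compile time.
constexpr std::size_t MAX_NUM_AGE_GROUPS = 16;

// Hypothetical stand-in for the model's age group index type.
struct AgeGroup {
    std::size_t index;
};

// Stores the set of age groups to test as one bit per group.
class AgeCriteria
{
public:
    AgeCriteria() = default;

    AgeCriteria(std::initializer_list<AgeGroup> ages)
    {
        for (AgeGroup a : ages) {
            add(a);
        }
    }

    void add(AgeGroup a)
    {
        assert(a.index < MAX_NUM_AGE_GROUPS && "age group exceeds the fixed maximum");
        m_ages.set(a.index);
    }

    // An empty set is treated as "no age restriction" (assumption of this sketch).
    bool matches(AgeGroup a) const
    {
        return m_ages.none() || (a.index < MAX_NUM_AGE_GROUPS && m_ages.test(a.index));
    }

private:
    std::bitset<MAX_NUM_AGE_GROUPS> m_ages;
};
```

The membership check becomes a single bit test, at the cost of the fixed upper bound discussed above.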

2

Switch the order of the ifs for the probability check and the TestingCriteria evaluation when applying the test. Again a very small performance gain (< 5%) that also depends on the performance of the RNG and the complexity of TestingCriteria. If TestingCriteria might become more complex in the future, we may want to skip this. This doesn't change the model in theory, but in practice it does, since the sequence of random numbers changes.
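A rough illustration of the reordering, with the probability draw placed before the criteria check. `Person`, `Scheme` and `should_test` are hypothetical stand-ins, not the interface of TestingScheme:

```cpp
#include <cassert>
#include <functional>
#include <random>

// Hypothetical stand-ins; the real TestingScheme and TestingCriteria are richer.
struct Person {
    int age_group;
};

struct Scheme {
    double probability;                          // chance that the scheme is applied at all
    std::function<bool(const Person&)> criteria; // possibly expensive eligibility check
};

// Draws the random number first and evaluates the criteria only if the draw succeeds.
// Returns true if the person is to be tested.
template <class Rng>
bool should_test(const Scheme& scheme, const Person& person, Rng& rng)
{
    assert(scheme.probability >= 0.0 && scheme.probability <= 1.0);
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    if (uniform(rng) >= scheme.probability) {
        return false; // criteria are never evaluated for this person
    }
    return scheme.criteria(person);
}
```

In this order a random number is drawn for every person, including those who would not match the criteria, which is why the sequence of random numbers differs from the previous order.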

3

Use CustomIndexArray for AgeGroupGoToSchool/Work. The performance gain here is barely measurable, but the interface of CustomIndexArray is more consistent with the rest of the parameters. AgeGroup and the enums we use are valid indices for a reason: random access is fast and convenient using an index into an array.
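The underlying idea can be sketched with a plain `std::array` indexed by an enum. This is not the interface of CustomIndexArray; `IndexedArray` and the age group names are hypothetical:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical age groups; Count is a sentinel giving the number of groups.
enum class AgeGroup : std::size_t
{
    Age0to4,
    Age5to14,
    Age15to34,
    Age35to59,
    Count
};

// Minimal stand-in for an array that is indexed by a typed index instead of a raw integer.
template <class T, class Index>
class IndexedArray
{
public:
    explicit IndexedArray(const T& init)
    {
        m_data.fill(init);
    }

    T& operator[](Index i)
    {
        assert(i != Index::Count);
        return m_data[static_cast<std::size_t>(i)];
    }

    const T& operator[](Index i) const
    {
        assert(i != Index::Count);
        return m_data[static_cast<std::size_t>(i)];
    }

private:
    std::array<T, static_cast<std::size_t>(Index::Count)> m_data;
};
```

A parameter such as "which age groups go to school" then reads as `goes_to_school[AgeGroup::Age5to14] = true;`, with constant-time access and no search.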

4

Use std::vector to store testing schemes per location instead of std::map. The performance gains here are very large (~20-30% of total benchmark runtime). The current implementation with map has two problems:
- Lookups in a map are more expensive than a linear search in a vector for small numbers of elements; map is only better for quite large numbers.
- Lookup in a map with operator[] adds an element to the map if it doesn't exist, so the map quickly grows until it has an entry for every location, making lookups even slower. This can be solved by using std::vector, but also by keeping the map and using map::find instead of map::operator[], so it is at least somewhat independent of the first issue. The implementation with vector is still slightly faster, at least unless you add a lot of TestingSchemes for different locations.
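One way such a vector-based store could look, as a sketch under assumptions: `SchemeStore`, `LocationId` and the reduced `TestingScheme` are hypothetical and only illustrate linear search over a small vector of (location, schemes) pairs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-ins for location identifiers and testing schemes.
using LocationId = int;
struct TestingScheme {
    double probability;
};

class SchemeStore
{
public:
    // Appends a scheme, creating the entry for the location only when a scheme is added.
    void add(LocationId loc, TestingScheme scheme)
    {
        assert(scheme.probability >= 0.0 && scheme.probability <= 1.0);
        auto it = find(loc);
        if (it == m_entries.end()) {
            m_entries.emplace_back(loc, std::vector<TestingScheme>{});
            it = m_entries.end() - 1;
        }
        it->second.push_back(scheme);
    }

    // Read-only lookup by linear search; returns nullptr instead of creating an entry.
    const std::vector<TestingScheme>* schemes_at(LocationId loc) const
    {
        auto it = std::find_if(m_entries.begin(), m_entries.end(), [loc](const Entry& e) {
            return e.first == loc;
        });
        return it == m_entries.end() ? nullptr : &it->second;
    }

    std::size_t num_locations() const
    {
        return m_entries.size();
    }

private:
    using Entry = std::pair<LocationId, std::vector<TestingScheme>>;

    std::vector<Entry>::iterator find(LocationId loc)
    {
        return std::find_if(m_entries.begin(), m_entries.end(), [loc](const Entry& e) {
            return e.first == loc;
        });
    }

    std::vector<Entry> m_entries;
};
```

Because lookups never insert, the container only holds locations that actually have schemes, so the linear search stays short.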

5

Only run the TestingStrategy when migrating. This is the largest change to the model, but also the largest performance gain, ~30-50%. (Note that this is not fully additive with the other improvements above: if the TestingStrategy is executed less often, the runtime of the TestingStrategy itself doesn't matter as much.) Persons are tested much less often, and persons that never leave home are never tested at all, even if there is a TestingScheme for home locations. That may be realistic, but needs to be discussed.

Another model could be to run tests only when migrating, but for the source location as well as the destination. This would give most of the performance benefits from this change with probably less impact on the model results (e.g. person leaves home to get to work, is tested at home and at work and stays home if either of them fails).
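The alternative described above could be sketched as follows. All names (`Person`, `EntryCheck`, `try_migrate`) are hypothetical, and the check callback stands in for running the TestingSchemes of a location:

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-ins for the model's types.
using LocationId = int;
struct Person {
    LocationId location;
};

// Returns true if the person passes the testing at the given location.
using EntryCheck = std::function<bool(const Person&, LocationId)>;

// Moves the person only if the checks for both source and destination pass.
// No check is run when the person stays where it is.
bool try_migrate(Person& person, LocationId destination, const EntryCheck& passes_test)
{
    assert(static_cast<bool>(passes_test));
    if (destination == person.location) {
        return false; // not migrating, so no test is run at all
    }
    if (!passes_test(person, person.location) || !passes_test(person, destination)) {
        return false; // failing at either end keeps the person in place
    }
    person.location = destination;
    return true;
}
```

Testing cost then scales with the number of migrations rather than with persons times time steps, while schemes attached to the source location are still applied when someone leaves it.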

If we expect a lot of TestingSchemes for specific single locations (instead of location types), it might be better to store the testing scheme in the location directly.

Merge Request - Guideline Checklist

Please check our git workflow. Use the draft feature if the Pull Request is not yet ready to review.

Checks by code author

Every addressed issue is linked (use the "Closes #ISSUE" keyword below)
New code adheres to coding guidelines
No large data files have been added (files should in sum not exceed 100 KB, avoid PDFs, Word docs, etc.)
Tests are added for new functionality and a local test run was successful
Appropriate documentation for new functionality has been added (Doxygen in the code and Markdown files if necessary)
Proper attention to licenses, especially no new third-party software with conflicting license has been added

Checks by code reviewer(s)

Corresponding issue(s) is/are linked and addressed
Code is clean of development artifacts (no deactivated or commented code lines, no debugging printouts, etc.)
Appropriate unit tests have been added, CI passes and code coverage is acceptable (did not decrease)
No large data files added in the whole history of commits (files should in sum not exceed 100 KB, avoid PDFs, Word docs, etc.)

dabele · 2023-11-11T17:52:01Z

A few general notes

map and set may be semantically convenient in many cases, but are generally slow. unordered_set and unordered_map are much faster, but still slower than std::vector in many cases. These containers are generally only worth it for large data sets with sparse random accesses. Cutoff points need to be measured of course, but vector/array/CustomIndexArray is usually a good default choice, especially when the data (AgeGroup, enums) can be used as an index.
map and unordered_map are also dangerous because operator[] adds an entry if it doesn't find the key.
The order of conditions matters.
We need to find ways to not do everything in every iteration/time step whenever possible. Even checking a single bool can be slow if it isn't in the CPU cache, i.e. if it isn't used regularly.
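The note on operator[] can be checked with a few lines of standard C++. The helper names below are made up for the illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Looks up with operator[]: value-initializes and inserts a value for every missing key.
inline int lookup_inserting(std::unordered_map<int, int>& m, int key)
{
    return m[key];
}

// Looks up with find: leaves the map untouched when the key is missing.
inline int lookup_readonly(const std::unordered_map<int, int>& m, int key, int fallback)
{
    auto it = m.find(key);
    return it == m.end() ? fallback : it->second;
}

// Runs both lookups for a missing key on a map with a single entry
// and returns the final size of the map.
inline std::size_t size_after_missing_lookups()
{
    std::unordered_map<int, int> m{{1, 10}};
    lookup_readonly(m, 2, -1);
    assert(m.size() == 1); // find did not insert anything
    lookup_inserting(m, 2);
    return m.size(); // the map now also holds key 2
}
```

The same behavior applies to std::map, which is how a per-location map can end up with an entry for every location that was ever queried.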

DavidKerkmann · 2023-11-24T09:47:52Z

This together with #752 will also close #357.

check num age groups when creating the world

dabele · 2023-12-08T17:11:17Z

Cleaned up the changes and fixed the unit tests.

Did better benchmark measurements to assess the changes. The uncertainty is ~2%.

baseline (current main): 8006 ms
with change 4 (vector instead of unordered_map for testing schemes): 6088 ms (~25%)
with change 4 and 2 (switch of ifs in testing strategy): 5640 ms (additional ~8 %)
with change 4, 2 and 3 (CustomIndexArray for GoToWork/GoToSchool parameter): 5638 ms (no measurable change)
with change 4, 2, 3 and 1 (bitset for agegroups in TestingCriteria with fixed number of age groups): 5262 ms (additional ~7%)

From our discussions, changes 2, 3 and 4 should be fine to add. Change 3 does not improve performance, it just gives cleaner code (I think). Change 1 adds some complexity to the model; we can discuss whether the performance gain is worth it. Since I cleaned up and ordered the commits, it can easily be reverted.

Reverted change 5 (only run TestingStrategy when migrating) as discussed. But I measured the performance for this change as well. Only testing persons that migrate to a different location gives 3889 ms (~27%). This measurement includes all the other changes above, I did not measure the change independently. So we probably should keep this in the discussion but in a separate Issue/MR.

codecov · 2023-12-08T17:51:02Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (8ed8364) 95.45% compared to head (60936cf) 95.46%.

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #830   +/-   ##
=======================================
  Coverage   95.45%   95.46%           
=======================================
  Files         118      118           
  Lines        9269     9289   +20     
=======================================
+ Hits         8848     8868   +20     
  Misses        421      421

DavidKerkmann

Thanks for the improvements. I agree that we should track point 5 for later inclusion as this greatly increases performance and is important. For now, I only have one remark.

cpp/models/abm/parameters.h

dabele self-assigned this Nov 11, 2023

xsaschako linked an issue Nov 24, 2023 that may be closed by this pull request

Further optimization of TestingStrategy #829

Closed

2 tasks

dabele mentioned this pull request Dec 1, 2023

segfault in ABM with OpenMP #857

Closed

2 tasks

dabele force-pushed the 829-optimize-testingstrategy branch 2 times, most recently from baf5bfb to a71dee5 Compare December 8, 2023 16:44

dabele added 4 commits December 8, 2023 17:55

perf: vector of testing schemes instead of map

438021f

perf: switch order of ifs

d420e30

customindexarray of school/work age groups

38d609c

perf: bitset of age groups

4a6073c

check num age groups when creating the world

dabele force-pushed the 829-optimize-testingstrategy branch from a71dee5 to 4a6073c Compare December 8, 2023 16:55

dabele marked this pull request as ready for review December 8, 2023 17:11

dabele requested a review from DavidKerkmann December 8, 2023 17:11

fix examples for different GoToWork parameter type

d87426f

DavidKerkmann reviewed Dec 11, 2023

cpp/models/abm/parameters.h (resolved)

dabele mentioned this pull request Dec 15, 2023

Run unit tests with OpenMP in CI #870

Merged

11 tasks

DavidKerkmann approved these changes Dec 20, 2023

Merge branch main into 829-optimize-testingstrategy

60936cf

mknaranja approved these changes Dec 20, 2023

mknaranja merged commit 08c1753 into main Dec 20, 2023
55 checks passed

mknaranja deleted the 829-optimize-testingstrategy branch December 20, 2023 13:48
