Cut calculation time by half with population #862
Some experimentation on this issue, going more in depth into an idea initially introduced by @Morendil.
TLDR: There seems to be a way to speed up calculations by 33% to 50% in common use cases.
Introduction
OpenFisca (OF) runs calculations vectorially. Concretely, when given a large population, OF never loops over individuals, but runs each operation on the whole population. As a consequence, when a formula contains a condition, every branch is computed for every individual, including those to whom that branch does not apply. The experiment presented here tries to improve performance by reducing these useless calculations through some kind of conditional branching, while respecting the vectorial paradigm. The idea is, when facing a condition, to "split and combine" (SC):
Experiment 1 (symmetric case disjunction)
The code is available on this gist.
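The gist itself is not reproduced here. As a minimal sketch of a symmetric case disjunction under both approaches, assuming plain NumPy arrays rather than OpenFisca's own API, the hypothetical `branch_a` and `branch_b` formulas below stand in for two cases of similar cost. The vectorial version computes both branches for everyone and selects with `np.where`; the SC version computes each branch only on its own subset and writes the results back in place.

```python
import numpy as np


def branch_a(x: np.ndarray) -> np.ndarray:
    """Hypothetical formula for individuals meeting the condition."""
    return x * 0.1 + 100.0


def branch_b(x: np.ndarray) -> np.ndarray:
    """Hypothetical formula for everyone else."""
    return x * 0.2


def vectorial(x: np.ndarray, condition: np.ndarray) -> np.ndarray:
    # Current approach: both branches run on the whole population,
    # then np.where keeps one value per individual.
    return np.where(condition, branch_a(x), branch_b(x))


def split_and_combine(x: np.ndarray, condition: np.ndarray) -> np.ndarray:
    # Split: each branch only sees the individuals it applies to.
    result = np.empty_like(x, dtype=float)
    result[condition] = branch_a(x[condition])
    result[~condition] = branch_b(x[~condition])
    # Combine: every individual's value has been written back in place.
    return result


x = np.array([1000.0, 2000.0, 3000.0, 4000.0])
condition = np.array([True, False, True, False])
print(vectorial(x, condition))
print(split_and_combine(x, condition))
```

Both functions return the same values; the difference is that SC evaluates each branch on its share of the population instead of all of it, at the cost of building and applying the boolean masks.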
Result
Experiment 2 (eligibility)
Similar to the first experiment, except that this time only disabled individuals can get a benefit. This creates an asymmetry: one case is much less complex than the other.
Result
Interpretation
Conclusion
I think the approach is really promising: it takes only a few operations for SC to become more efficient than our current implementation, especially when a calculation is relevant to only a small fraction of the population.
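To illustrate the eligibility case, here is a hypothetical sketch in plain NumPy that counts how many individual values the expensive part of a benefit formula processes under each approach. The benefit formula, the `work` counter, the population size and the 5% disabled share are all assumptions chosen for illustration, not figures from the experiment.

```python
import numpy as np

# Hypothetical counter: how many individual values the expensive part processes.
work = {"count": 0}


def expensive_benefit(income: np.ndarray) -> np.ndarray:
    """Stand-in for a long chain of vector operations (hypothetical formula)."""
    work["count"] += income.size
    return np.maximum(0.0, 12000.0 - 0.5 * income)


def benefit_vectorial(income: np.ndarray, is_disabled: np.ndarray) -> np.ndarray:
    # Current approach: compute for everyone, then zero out non-eligible people.
    return np.where(is_disabled, expensive_benefit(income), 0.0)


def benefit_sc(income: np.ndarray, is_disabled: np.ndarray) -> np.ndarray:
    # SC: only eligible individuals go through the expensive computation;
    # everyone else keeps the default value 0.
    result = np.zeros(income.shape, dtype=float)
    result[is_disabled] = expensive_benefit(income[is_disabled])
    return result


rng = np.random.default_rng(0)
income = rng.uniform(0.0, 40000.0, size=50_000)
is_disabled = rng.random(50_000) < 0.05  # assumed 5% eligible

for approach in (benefit_vectorial, benefit_sc):
    work["count"] = 0
    approach(income, is_disabled)
    print(approach.__name__, work["count"])
```

SC is not free: the boolean indexing and the write-back each take a pass over the whole population, so it only pays off once the skipped work costs more than that overhead. The smaller the eligible fraction, the sooner that happens.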
Implementation challenges
I think the main challenge would be adapting the cache to the fact that some variables would now only be "partially" calculated. There will also be some plumbing to do, but the way formulas are written actually helps a lot: because they take "persons" as their first argument, that argument could be an object representing a subset of the simulation's persons without much adaptation.
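A toy sketch of that idea, assuming hypothetical `ToyPopulation` and `ToySubPopulation` classes rather than OpenFisca's real population objects: the formula only calls `persons(variable)`, so it runs unchanged on a subset view that filters every input through a boolean mask. The partial result covers only the selected individuals, which is what a cache would then need to store together with its mask.

```python
from __future__ import annotations

import numpy as np


class ToyPopulation:
    """Hypothetical stand-in for a simulation's persons: holds input arrays."""

    def __init__(self, inputs: dict[str, np.ndarray]):
        self.inputs = inputs

    def __call__(self, variable: str) -> np.ndarray:
        return self.inputs[variable]

    def subset(self, mask: np.ndarray) -> ToySubPopulation:
        return ToySubPopulation(self, mask)


class ToySubPopulation:
    """Hypothetical view on the individuals selected by a boolean mask."""

    def __init__(self, parent: ToyPopulation, mask: np.ndarray):
        self.parent = parent
        self.mask = mask

    def __call__(self, variable: str) -> np.ndarray:
        # Same interface as the full population, but only the masked values.
        return self.parent(variable)[self.mask]


def disability_benefit(persons) -> np.ndarray:
    # Written once; works on the full population or on a subset,
    # because it only relies on persons(variable).
    return np.maximum(0.0, 12000.0 - 0.5 * persons("income"))


persons = ToyPopulation({
    "income": np.array([10000.0, 30000.0, 5000.0]),
    "is_disabled": np.array([True, False, True]),
})
partial = disability_benefit(persons.subset(persons("is_disabled")))
print(partial)  # one value per disabled individual only
```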
The "depth" factor strikes me as an important one to study too, understood as how many chained vector computations depend on one another. In common cases the benefits of SC should multiply with depth, and even that is a lower bound: given small enough frequencies, or (likely a common case) mutually exclusive conditions at various depths, SC will eliminate some branches of the computation tree altogether.
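The effect of depth can be sketched with a hypothetical `sc_branch` helper (plain NumPy, not an OpenFisca API) that evaluates a branch only when its subset is non-empty. Nesting it reproduces conditions at depth 2; the `calls` list, the thresholds and the rates are illustrative assumptions that only record which leaf formulas actually ran.

```python
import numpy as np

calls = []  # records which leaf formulas were actually evaluated


def sc_branch(x, condition, if_true, if_false):
    """Hypothetical SC helper that skips a branch when its subset is empty."""
    result = np.zeros(x.shape, dtype=float)
    if condition.any():
        result[condition] = if_true(x[condition])
    if (~condition).any():
        result[~condition] = if_false(x[~condition])
    return result


def leaf(name, rate):
    # Builds a hypothetical leaf formula that logs itself when it runs.
    def compute(values):
        calls.append(name)
        return values * rate
    return compute


def nested(x):
    # Depth 2: an outer condition, then an inner one on the remaining individuals.
    return sc_branch(
        x,
        x > 100.0,
        leaf("high", 0.3),
        lambda rest: sc_branch(rest, rest > 50.0, leaf("mid", 0.2), leaf("low", 0.1)),
    )


calls.clear()
nested(np.array([10.0, 20.0, 30.0]))
print(calls)  # only the leaf that some individual actually reaches
```

With `np.where`, all three leaves would run on the whole population; here, when every individual falls into the same inner case, the other two leaves are never evaluated.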
The initial need (under 15 s for LexImpact) has since been met. Closing for now; do not hesitate to reopen in the future. Thanks for all the work! ✨
LexImpact has to run two calculations on bulk data (50 000 households) in less than 15 seconds.
Definition of done:
Explore in 2 days max