# Developer onboarding guide
For setting up a development environment via a terminal, see the instructions on the TLOmodel documentation site (you only really need Miniconda rather than a full Anaconda distribution). There is also a more detailed installation guide in the TLOmodel repository wiki aimed at setting up a development environment with PyCharm.
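As a rough sketch of what the terminal-based setup looks like (the documentation site is authoritative; the Python version is taken from the supported version below, but the exact requirements file name is an assumption based on the `requirements` directory described later):

```
conda create -n tlo python=3.8
conda activate tlo
pip install -r requirements/dev.txt
pip install -e .
```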
- Overview of framework - Word document. Top-level overview of how the model is organized.
- Tutorial introduction to individual-based modelling - video, slides. Overview of the modelling approach.
- How to do a model analysis - wiki page. Specifically the first part, giving a diagrammatic overview of the model structure and how the different parts relate to each other.
- Coding conventions - wiki page. Overview of some of the conventions in terms of file organization and naming used in the model.
The main directories of interest in the repository are:

- `.github/workflows` - contains GitHub Actions YAML files defining workflows for continuous integration (run on all pushes to pull-request branches, merges into the `master` branch, and on a nightly `cron` schedule) and workflow files managing additional comment-triggered workflows.
- `docs` - static reStructuredText files used by Sphinx for building the HTML documentation hosted at tlomodel.org, some associated Python scripts for generating parts of the documentation, and write-ups of each of the model modules as `docx` files in the `writeups` subdirectory.
- `outputs` - default directory used for any outputs produced by simulations.
- `requirements` - input files used by `pip-compile` to produce pinned requirements files, and the corresponding generated requirements files. `base` refers to the dependencies needed to install and run the model, `dev` to the additional dependencies needed to also run the tests, use `tox` for automation, and build the requirements files.
- `resources` - files (largely Excel `.xlsx` spreadsheets or comma-separated value files) containing data used to set the default parameters of the model and used in calibrating the model. Git LFS is primarily used with the files in this directory - if you get an error indicating a file in the `resources` directory cannot be found, most likely you only have the pointer file present and need to set up Git LFS and run `git lfs fetch --all` (see the example commands after this list).
- `src/scripts` - scripts written by modellers / users for running analyses with the model. These tend to be mainly used by the individual researchers who wrote them, and we don't have any testing to ensure they stay up to date with changes in the code.
- `src/tlo` - the top-level directory for the `tlo` Python package defining the modelling framework and model components.
- `src/tlo/methods` - directory containing the individual modules of the model (in TLO terms, a self-contained component of the model, often but not exclusively associated with a particular disease, rather than the usual Python meaning of module). Most of these modules have been primarily developed by one or more members of the modelling team.
- `tests` - the `pytest` test modules defining the test functions. For the most part there is a one-to-one mapping from test modules to Python modules in the `src/tlo` package and subpackages, though some test modules aren't primarily associated with any one module (for example `test_determinism.py` and `test_maternal_health_helper_and_analysis_functions.py`).
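If Git LFS was not set up when the repository was cloned, the following standard Git LFS commands, run from the repository root, should replace any pointer files in `resources` with the real files:

```
git lfs install
git lfs fetch --all
git lfs checkout
```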
We use `tox` for automating some of the common tasks we perform with the `TLOmodel` code. The `tox.ini` file in the root of the repository defines various 'environments' - each of these specifies both a set of dependencies and commands to run. Running `tox -e {environment_name}` will create a clean virtual environment, install the dependencies for `{environment_name}`, and then run the associated commands. Some of the more useful environments we have set up are:
- `py38-pandas12` - run the tests with Python v3.8 and Pandas v1.2 (the current versions we support).
- `py311-pandas20` - run the tests with Python v3.11 and Pandas v2.0 (versions we are hopefully going to move to soon).
- `docs` - build the documentation using Sphinx.
- `check` - run checks on the source code (`flake8`, `isort`, `check-manifest`) - this is run as part of the GitHub Actions CI, so it's useful to run locally before pushing to catch any errors, particularly with import ordering.
- `profile` - run the `src/scripts/profiling/scale_run.py` script with arguments specifying to simulate 5 years and show a progress bar, using `pyinstrument` to get profiling data.
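For example, to run the source code checks locally before pushing, and to list all of the defined environments with their descriptions (`-av` is a standard tox flag for this):

```
tox -e check
tox -av
```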
We typically use the `scale_run.py` script in `src/scripts/profiling` as the target for profiling runs; by default this performs a run of the full model with an initial population size and total simulation time currently judged to be reflective of what we would want to use in model analysis runs.
Some of the behaviour of the model (for example, the availability of resources) is appropriately scaled by a factor controlled by the ratio of the simulated initial population size to the real initial population size, as computed in the `Demography.compute_initial_model_to_data_popsize_ratio` method; runs with smaller population sizes can therefore still be usefully interpreted. However, a larger initial population size will generally make the model better reflect the population being simulated. The default initial population size used in `scale_run.py` is therefore based on trading off model fidelity against ensuring runs can be completed in a reasonable time (as a ballpark, roughly 24 hours or less for a full run), and has been increased over time as the model has been made more performant.
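A minimal sketch of the idea behind this scaling, with illustrative numbers and variable names only (not the actual TLOmodel implementation):

```python
# Ratio of simulated to real initial population size; quantities such as
# resource availability are multiplied by this so that a small simulated
# population faces proportionally scaled-down resources.
simulated_initial_popsize = 50_000    # hypothetical model population
real_initial_popsize = 17_000_000     # hypothetical real population

model_to_data_ratio = simulated_initial_popsize / real_initial_popsize
scaled_capacity = 25_000 * model_to_data_ratio  # e.g. some national capacity
```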
Doing profiled runs of the model helps to identify where the key bottlenecks are, with the profiling output recording how much time is spent in different parts of the call graph. Different profiling tools use different approaches to gathering this information. The results of profiling the `scale_run.py` script over time, along with some analysis of identified bottlenecks, are currently tracked in a GitHub issue.
Deterministic profilers like the built-in `profile` and `cProfile` modules trace all function calls; this gives a high degree of granularity in the recorded profiling statistics, but with the tradeoff that the overhead of recording those statistics can distort the results, particularly for small functions which are called many times and for which the overhead has a proportionally larger effect.
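As a minimal illustration of the deterministic approach using only the standard library (the `workload` function is a hypothetical stand-in, not part of TLOmodel):

```python
import cProfile
import pstats

def workload():
    # Hypothetical stand-in for a model run.
    return sum(i * i for i in range(100_000))

# Trace every function call and write the statistics to a file.
cProfile.run("workload()", "workload.prof")

# Load the statistics and print the 10 entries with the largest cumulative
# time (time spent in the function including calls to sub-functions).
pstats.Stats("workload.prof").sort_stats("cumulative").print_stats(10)
```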
Statistical profilers such as pyinstrument instead record where the program is in the call stack at some regular interval (by default every 1 ms for `pyinstrument`). This significantly reduces the overhead associated with profiling, reducing the bias in the results compared to non-profiled runs, at the cost of introducing some variance. Compared to the built-in `profile` and `cProfile` modules, `pyinstrument` also has the advantage of recording the full call stack rather than just the specific function being called: this is particularly useful in helping to identify where functions are being called from, as there are many functions in `TLOmodel` which are called from multiple different parts of the code.
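pyinstrument can also be used from Python rather than the command line; a minimal sketch (again with a hypothetical `workload` function standing in for a model run):

```python
from pyinstrument import Profiler

def workload():
    # Hypothetical stand-in for a model run.
    return sum(i * i for i in range(1_000_000))

profiler = Profiler(interval=0.001)  # sample the call stack every 1 ms
profiler.start()
workload()
profiler.stop()

# Print an indented text rendering of the recorded call stacks.
print(profiler.output_text(unicode=True, color=True))
```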
The default behaviour of the built-in `cProfile` and `profile` modules is to output the profiling results to `stdout` as a table with one row per called function, and columns showing the number of calls, the total time spent in the function (excluding calls to sub-functions), the cumulative time spent in the function (including calls to sub-functions), and a filename plus line number reference for the function. For complex codebases like `TLOmodel`, which have very large call graphs, this output is not always that interpretable. Alternatively, the profiling data can be output to a file using the `-o {filename}` option, for example
```
python -m cProfile -o scale_run.prof src/scripts/profiling/scale_run.py
```
This can then be visualised using other applications. For example, SnakeViz allows viewing the profiling results in a browser as an interactive 'icicle' or 'sunburst' visualization, which represents the time spent in different functions in a more visual form. Pyinstrument also allows outputting to the same `pstats` output format used by `cProfile` using the option `-r pstats`, for example
```
pyinstrument -i 0.01 -r pstats -o scale_run.prof src/scripts/profiling/scale_run.py
```
where `-i 0.01` sets the sampling interval to 0.01 seconds. Pyinstrument also has several other useful output formats, including the default behaviour of rendering as a labelled text-based call graph (`-r text`), rendering to an interactive HTML file (`-r html`), and rendering to a file that can be used with speedscope (`-r speedscope`).
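To open a `pstats` file such as those produced above in SnakeViz (assuming it has been installed, e.g. with `pip install snakeviz`):

```
snakeviz scale_run.prof
```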