Meeting December 2019 (Programming)

Guidance

Lots of notes now on the wiki!
Follow the process outlined in 'implementing a disease module'
- Have a basic test file to run your code
- Develop the model incrementally, adding complexity gradually
  - Catch problems early
  - Easier for us to help!
Master should be merged into your branches soon after PRs are merged
- Notification in the Slack #programming channel
- Prevents complicated conflicts
- Keeps your branch up-to-date
Open draft PRs on GitHub
- You can use the collaboration tools while signalling that the work is in progress
In PyCharm, set the working directory to always be the root 'TLOmodel' directory
- Paths can be relative to this

New utility functions and requests for me

tlo.util.transition_states
- Takes a single DataFrame column of states and a transition probability matrix (also a DataFrame)
- Returns a new column with transitioned states
- Example on the wiki
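
As a rough illustration of the idea (not the actual tlo.util.transition_states code), the sketch below moves each individual to a new state drawn from a probability matrix. It assumes the columns of the matrix are the current state and the rows are the next state; the function name transition_states_sketch and the example states are made up for illustration.

```python
import numpy as np
import pandas as pd


def transition_states_sketch(states, prob_matrix, rng):
    """Return a new Series of states drawn from prob_matrix.

    Assumption: columns of prob_matrix are the current state, rows are the
    next state, and every column sums to 1.
    """
    new_states = states.copy()
    # Handle all individuals that share a current state in one draw
    for current_state, group in states.groupby(states):
        probs = prob_matrix[current_state]
        new_states.loc[group.index] = rng.choice(
            probs.index.to_numpy(), size=len(group), p=probs.to_numpy()
        )
    return new_states


prob_matrix = pd.DataFrame(
    {"a": [0.5, 0.5, 0.0], "b": [0.0, 0.5, 0.5], "c": [0.0, 0.0, 1.0]},
    index=["a", "b", "c"],
)
states = pd.Series(["a", "a", "b", "c", "c"], name="state")
rng = np.random.RandomState(0)
new_states = transition_states_sketch(states, prob_matrix, rng)
print(new_states)
```

In this example 'c' is an absorbing state, so anyone starting in 'c' stays there.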

tlo.util.nested_to_record

Returns a flattened dictionary representation of a DataFrame
e.g.

    First Name   Region    User Name
  1  Nathaniel  Midwest      nzburke
  2  Elisabeth    South     ewfoster
  3     Briana  Midwest  bclancaster
  4    Estella     West     elpotter
  5     Lamont    South      llwoods

becomes

 {'First Name_1': 'Nathaniel',
  'First Name_2': 'Elisabeth',
  'First Name_3': 'Briana',
  'First Name_4': 'Estella',
  'First Name_5': 'Lamont',
  'Region_1': 'Midwest',
  'Region_2': 'South',
  'Region_3': 'Midwest',
  'Region_4': 'West',
  'Region_5': 'South',
  'User Name_1': 'nzburke',
  'User Name_2': 'ewfoster',
  'User Name_3': 'bclancaster',
  'User Name_4': 'elpotter',
  'User Name_5': 'llwoods'}

Can be used for logging
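
A minimal sketch of producing a dictionary with this shape using plain pandas is shown below. It is not necessarily how tlo.util.nested_to_record is implemented; the helper name flatten_dataframe is hypothetical and only the first two rows of the example are used.

```python
import pandas as pd


def flatten_dataframe(df):
    """Flatten a DataFrame into {'<column>_<row label>': value}."""
    flat = {}
    for column in df.columns:
        for row_label, value in df[column].items():
            flat[f"{column}_{row_label}"] = value
    return flat


df = pd.DataFrame(
    {"First Name": ["Nathaniel", "Elisabeth"], "Region": ["Midwest", "South"]},
    index=[1, 2],
)
flat = flatten_dataframe(df)
print(flat)
```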

In disease modules (class Xyz(Module)), self.load_parameters_from_dataframe loads parameters from a resource DataFrame, updating the parameters declared in the class's PARAMETERS

Updates to wiki and PR checklists

Installation and setup guide
- Still need a Windows version!
Phase 4 & 5 from the checklist for developing a disease module
Pre-PR checklist
- We're figuring out the tooling on Windows!

Issues outstanding

Improve logging using a TLO-specific logging module
- Handles setting up the logging of TLO
- One-liners to configure output
  - e.g. turn off, save to file etc.
- Deal with strange output that causes problems downstream e.g. nan
- Enforce documentation of log lines
```
LOGGING = {
    'population_by_sex': LogLine('Population alive by sex'),
    'cause_of_death': LogLine('Deaths in the last month grouped by cause')
}
```
- (TBH: A flag to indicate whether or not this output should be subject to scaling to match whole population size)
- Includes improving the parsing of logs
  - Filtering the log lines when parsing
  - Using a faster implementation of building dataframes from log lines
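
One possible shape for the documented log lines is sketched below. The LogLine class here is a stand-in written for this note (the proposal does not define it yet), and both the scale_to_population flag and the check_log_key helper are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogLine:
    description: str
    # Hypothetical flag: should this output be scaled to the whole population?
    scale_to_population: bool = False


LOGGING = {
    "population_by_sex": LogLine("Population alive by sex", scale_to_population=True),
    "cause_of_death": LogLine("Deaths in the last month grouped by cause"),
}


def check_log_key(key, registry=LOGGING):
    """Return the documented LogLine for key, or raise if it is undocumented."""
    if key not in registry:
        raise KeyError(f"Undocumented log line: {key!r}")
    if not registry[key].description.strip():
        raise ValueError(f"Log line {key!r} has an empty description")
    return registry[key]


print(check_log_key("cause_of_death").description)
```

A logging call could run this check before writing anything, so that an undocumented key fails early rather than producing output nobody can interpret.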
Performance
- Health System is the bottleneck
  - Continuing to profile and refactor
- Over-allocating rows in the population dataframe
  - Essential that models only work on is_alive individuals!
- The more frequently an event runs, the more the "work" done in each call matters
- We want to add a set of tools to easily profile blocks of code (e.g. using a decorator)
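
A decorator-based timing tool could look something like the sketch below. It uses only the standard library; the names profiled, TIMINGS and monthly_event are illustrative and not part of TLO.

```python
import functools
import time
from collections import defaultdict

# Wall-clock durations of every call, keyed by function name
TIMINGS = defaultdict(list)


def profiled(func):
    """Record how long each call to func takes."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            TIMINGS[func.__name__].append(time.perf_counter() - start)
    return wrapper


@profiled
def monthly_event(n):
    # Stand-in for the work done by a frequently scheduled event
    return sum(i * i for i in range(n))


for _ in range(3):
    monthly_event(10_000)

for name, durations in TIMINGS.items():
    print(f"{name}: {len(durations)} calls, {sum(durations):.6f}s total")
```

Recording per-call durations rather than a single total makes it easy to see whether a frequent event is cheap per call but expensive in aggregate.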
More robust testing
- CI to run tests on small and large population sizes
- Checks on the use of is_alive
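
To illustrate why is_alive matters once rows are over-allocated, here is a toy population DataFrame. The column age_years and the values are made up; the last two rows stand for pre-allocated, unused rows.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "is_alive": [True, True, False, True, False, False],
        "age_years": [30, 5, 80, 42, 0, 0],
    }
)

# Select living individuals first, then compute on that subset only
alive = df.loc[df.is_alive]
mean_age_alive = alive.age_years.mean()

# The same statistic over every row is distorted by dead and unused rows
mean_age_all_rows = df.age_years.mean()

print(len(alive), mean_age_alive, mean_age_all_rows)
```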

Run Management system proposal

To ease configuring and running simulations, and processing of output
Prepare to run on compute clusters
A command-line tool to manage this: tlo

Have a collection of templates (or one that can be configured) that describe a "scenario"

e.g. tlo create-scenario my_test --template basic_scenario --some --other --options
would create a directory and write a scenario file therein

 # -------------------------------------------------------------
 # Name: my_test
 # Created: 10/12/2019 12:45
 # Template: basic_scenario
 # -------------------------------------------------------------

 import time
 import tlo.logging
 from tlo import Date, Simulation
 from tlo.methods.demography import Demography
 from tlo.methods.contraception import Contraception

 # -------------------------------------------------------------
 # Basic configuration
 # -------------------------------------------------------------
 start_date = Date(2010, 1, 1)
 end_date = Date(2051, 1, 1)
 initial_population_size = 100000
 resourcefilepath = './resources/'

 tlo.logging.configure()
 
 simulation = Simulation(start_date=start_date)
 
 # -------------------------------------------------------------
 # Register modules
 # -------------------------------------------------------------
 simulation.register(Demography(resourcefilepath=resourcefilepath))
 simulation.register(Contraception(resourcefilepath=resourcefilepath))

 # Uncomment both import and register lines below
 # from tlo.methods.enhanced_lifestyle import Lifestyle
 # simulation.register(Lifestyle(resourcefilepath=resourcefilepath))

 from tlo.methods.depression import Depression
 simulation.register(Depression(resourcefilepath=resourcefilepath))

 # from tlo.methods.epilepsy import Epilepsy
 # simulation.register(Epilepsy(resourcefilepath=resourcefilepath))
 	
 # -------------------------------------------------------------
 # Override parameters
 # -------------------------------------------------------------
 simulation.override_parameters(
     {
         Demography: {
             'fraction_of_births_male': 0.2
         },
         Depression: {
             'init_rp_ever_depr_per_year_older_f': 0.125,
             'prob_3m_selfharm_depr': lambda rng: rng.rand(),
             'rr_depr_on_antidepr': lambda rng: rng.exponential(0.1)
         },
     }
 )
 
 # -------------------------------------------------------------
 # Run simulation
 # -------------------------------------------------------------
 simulation.seed_rngs(int(time.time()))
 simulation.make_initial_population(n=initial_population_size)
 simulation.simulate(end_date=end_date)

We then create a sample from our scenario

e.g. tlo create-sample my_test --some --other --options
Takes the above scenario and samples values where necessary (the sample is placed in a sub-directory)

  simulation.override_parameters(
      {
          Demography: {
              'fraction_of_births_male': 0.2
          },
          Depression: {
              'init_rp_ever_depr_per_year_older_f': 0.125,
              'prob_3m_selfharm_depr': 0.5187848579652606,
              'rr_depr_on_antidepr': 0.05841701302920538
          },
      }
  )

Can create several samples
- tlo create-sample my_test --count 100 would create 100 samples of the scenario file
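
The sampling step could work roughly as sketched below: every callable in the scenario's overrides is replaced by the value it returns for a given random number generator. This is a sketch of the idea only; sample_overrides is a hypothetical helper, and module names are plain strings here instead of the module classes used in the scenario file.

```python
import numpy as np


def sample_overrides(overrides, rng):
    """Replace every callable value with the result of calling it with rng."""
    return {
        module: {
            name: value(rng) if callable(value) else value
            for name, value in params.items()
        }
        for module, params in overrides.items()
    }


scenario_overrides = {
    "Demography": {"fraction_of_births_male": 0.2},
    "Depression": {
        "init_rp_ever_depr_per_year_older_f": 0.125,
        "prob_3m_selfharm_depr": lambda rng: rng.rand(),
        "rr_depr_on_antidepr": lambda rng: rng.exponential(0.1),
    },
}

# One sample: fixed values pass through, callables become concrete numbers
sample = sample_overrides(scenario_overrides, np.random.RandomState(1))
print(sample)
```

Seeding the generator per sample would make each sample file reproducible.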

Finally we run the sample as many times as we would like
- tlo run-sample my_test --all - runs all the samples
- tlo run-sample my_test --sample 15 - runs a specific sample
- tlo run-sample my_test --all --runs 1000 - runs all the samples, each 1000 times
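
The three sub-commands above could be parsed with argparse along the lines of the sketch below. The tlo tool is only a proposal, so the defaults and the rule that --all and --sample are mutually exclusive are assumptions; the unspecified "--some --other --options" are left out.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="tlo")
    sub = parser.add_subparsers(dest="command", required=True)

    create_scenario = sub.add_parser("create-scenario")
    create_scenario.add_argument("name")
    create_scenario.add_argument("--template", default="basic_scenario")

    create_sample = sub.add_parser("create-sample")
    create_sample.add_argument("name")
    create_sample.add_argument("--count", type=int, default=1)

    run_sample = sub.add_parser("run-sample")
    run_sample.add_argument("name")
    # Assumption: run either every sample or one specific sample
    which = run_sample.add_mutually_exclusive_group(required=True)
    which.add_argument("--all", action="store_true")
    which.add_argument("--sample", type=int)
    run_sample.add_argument("--runs", type=int, default=1)
    return parser


args = build_parser().parse_args(["run-sample", "my_test", "--all", "--runs", "1000"])
print(args.command, args.name, args.all, args.sample, args.runs)
```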

The resulting set of files might look something like this:

 scenarios
 ├── fixed_antidepr
 ├── my_test
 │   ├── scenario.py
 │   ├── sample_001
 │   │   ├── sample.py
 │   │   ├── run_0001
 │   │   │   ├── output.csv
 │   │   │   ├── output.log
 │   │   │   └── output.pickle
 │   │   ├── run_0002
 │   │   ├── ...
 │   │   └── run_1000
 │   ├── sample_002
 │   ├── ...
 │   └── sample_100
 ├── random_selfharm
 └── some_scenario

Could also generate the script required to submit jobs on a compute cluster
- tlo run-sample my_test --sample 1 --runs 1000 --job-array
- Creates a shell script to submit a job array to the cluster
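
As an illustration of what the generated script might contain, the sketch below builds a job-array submission script as text. It assumes a Slurm scheduler (the notes do not say which scheduler the cluster uses), and the --run option passed to tlo inside the script is hypothetical.

```python
def job_array_script(scenario, sample, runs):
    """Build the text of a Slurm job-array script with one task per run."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={scenario}_sample_{sample:03d}",
        f"#SBATCH --array=1-{runs}",
        "",
        "# Each array task performs one run; the --run option is hypothetical.",
        f'tlo run-sample {scenario} --sample {sample} --run "$SLURM_ARRAY_TASK_ID"',
    ]
    return "\n".join(lines) + "\n"


script = job_array_script("my_test", sample=1, runs=1000)
print(script)
```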
