diff --git a/doc/source/user_documentation.rst b/doc/source/user_documentation.rst
index 8843f4c64..d46896a0f 100644
--- a/doc/source/user_documentation.rst
+++ b/doc/source/user_documentation.rst
@@ -2547,3 +2547,281 @@ The code for neural network optimization through evolutionary algorithm used in

.. Adding cell classes
.. --------------------

Running a Batch Job (Beta)
==========================

The NetPyNE batchtools subpackage provides a method of automating job submission and reporting::

    batch<-->\   /---> configuration_0 >---\
              \ /                           specs---\
               \<--->dispatcher_0                    sim_0
                \           \               comm ---/
                 \           \---< results_0 <-----/
                  \
                   \   /---> configuration_1 >---\
                    \ /                           specs---\
                     \<--->dispatcher_1                    sim_1
                      \           \               comm ---/
                       \           \---< results_1 <-----/
                        \
                         ...

1. Setting up batchtools
------------------------
Beyond the necessary dependency installations for NetPyNE and NEURON, several additional ``pip`` installations are required.

The NetPyNE installation should be handled as a development installation of the repository branch ``batch``::

    git clone https://github.com/Neurosim-lab/netpyne.git
    cd netpyne
    git checkout batch
    pip install -e .

The batchtools package can be installed either via ``pip``::

    pip install -U batchtk

or as a development install (recommended)::

    git clone https://github.com/jchen6727/batchtk.git
    cd batchtk
    pip install -e .

Ray is a dependency for batchtools and should be installed with the following command::

    pip install -U ray[default]

2. Examples
-----------
Examples of NetPyNE batchtools usage can be found in the ``examples`` directory `here `_.

Examples of the underlying batchtk package can be found in the ``examples`` directory `here `_.

3. Retrieving batch configuration values through the ``specs`` object
---------------------------------------------------------------------
Each simulation is able to retrieve relevant configurations through the ``specs`` object, and to communicate with
the dispatcher through the ``comm`` object.

Importing the relevant objects::

    from netpyne.batchtools import specs, comm
    cfg = specs.SimConfig()        # create a SimConfig object
    netParams = specs.NetParams()  # create a netParams object

``netpyne.batchtools.specs`` behaves similarly to ``netpyne.sim.specs`` except in the following cases:

* ``netpyne.batchtools.specs`` automatically captures relevant configuration mappings created by the ``dispatcher`` upon initialization

  * these mappings can be retrieved via ``specs.get_mappings()``

* the SimConfig object created by ``netpyne.batchtools.specs.SimConfig()`` will update itself with relevant configuration mappings through the ``update()`` method::

    from netpyne.batchtools import specs # import the custom batch specs
    cfg = specs.SimConfig()              # create a SimConfig object
    cfg.update()                         # update the cfg object with any relevant mappings for this particular batch job

The ``update`` method will update the ``SimConfig`` object with the configuration mappings captured in ``specs`` (see: ``specs.get_mappings()``).

This replaces the previous idiom for updating the SimConfig object with mappings from the batched job submission::

    try:
        from __main__ import cfg  # import SimConfig object with params from parent module
    except:
        from cfg import cfg       # if no SimConfig in parent module, import directly from the cfg module
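For instance, the captured mappings can be inspected directly before applying them (a sketch; the mapping key and value shown are hypothetical)::

    from netpyne.batchtools import specs

    cfg = specs.SimConfig()
    print(specs.get_mappings())  # e.g. {'foo.bar': 0.5} for a trial in which the
                                 # dispatcher sets cfg.foo['bar'] to 0.5
    cfg.update()                 # apply the captured mappings to the cfg object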
4. Communicating results to the ``dispatcher`` with the ``comm`` object
-----------------------------------------------------------------------

Prior batched simulations relied on ``.pkl`` files to communicate data.
The ``netpyne.batchtools`` subpackage instead uses a dedicated ``comm`` object to send custom data back to the ``dispatcher``.
The ``comm`` object determines the method of communication based on the batch job submission type.

In terms of the simulation, the following functions are available to the user:

* **comm.initialize()**: establishes a connection with the batch ``dispatcher`` for sending data

* **comm.send()**: sends data to the batch ``dispatcher``

  * for ``search`` jobs, it is important to match the data sent with the metric specified in the search function

* **comm.close()**: closes and cleans up the connection with the batch ``dispatcher``
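As a minimal sketch of this reporting pattern (the ``loss`` value here is a placeholder; see the CA3 example in section 6 for a real calculation)::

    import json
    from netpyne.batchtools import comm

    comm.initialize()               # establish a connection with the batch dispatcher

    results = {'loss': 0.5}         # hypothetical result; for a search job, the key must
                                    # match the ``metric`` passed to the search function
    comm.send(json.dumps(results))  # send the serialized data to the dispatcher
    comm.close()                    # close and clean up the connection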
"``sh``" will submit bach jobs through the shell on a local machine +* **comm_type**: either "``socket``" or "``filesystem``", specifying how the job should communicate with the dispatcher +* **run_config**: a dictionary of keyword: string pairs to customize the submit template, the expected keyword: string pairs are dependent on the job_type:: + + ======= + sge + ======= + queue: the queue to submit the job to (#$ -q {queue}) + cores: the number of cores to request for the job (#$ -pe smp {cores}) + vmem: the amount of memory to request for the job (#$ -l h_vmem={vmem}) + realtime: the amount of time to request for the job (#$ -l h_rt={realtime}) + command: the command to run for the job + + example: + run_config = { + 'queue': 'cpu.q', # request job to be run on the 'cpu.q' queue + 'cores': 8, # request 8 cores for the job + 'vmem': '8G', # request 8GB of memory for the job + 'realtime': '24:00:00', # set timeout of the job to 24 hours + 'command': 'mpiexec -n $NSLOTS -hosts $(hostname) nrniv -python -mpi init.py' + } # set the command to be run to 'mpiexec -n $NSLOTS -hosts $(hostname) nrniv -python -mpi init.py' + + ======= + sh + ======= + command: the command to run for the job + + example: + run_config = { + 'command': 'mpiexec -n 8 nrniv -python -mpi init.py' + } # set the command to be run + +* **params**: a dictionary of config values to perform the search over. The keys of the dictionary should match the keys of the config object to be updated. Lists or numpy generators >2 values will force a grid search over the values; otherwise, a list of two values will create a uniform distribution sample space. + + **usage 1**: updating a constant value specified in the ``SimConfig`` object :: + + # take a config object with the following parameter ``foo`` + cfg = specs.SimConfig() + cfg.foo = 0 + cfg.update() + + # specify a search space for ``foo`` such that a simulation will run with: + # cfg.foo = 0 + # cfg.foo = 1 + # cfg.foo = 2 + # ... + # cfg.foo = 9 + + # using: + params = { + 'foo': range(10) + } + + **usage 2**: updating a nested object in the ``SimConfig`` object:: + + # to update a nested object, the package uses the `.` operator to specify reflection into the object. + # take a config object with the following parameter object ``foo`` + cfg = specs.SimConfig() + cfg.foo = {'bar': 0, 'baz': 0} + cfg.update() + + # specify a search space for ``foo['bar']`` with `foo.bar` such that a simulation will run: + # cfg.foo['bar'] = 0 + # cfg.foo['bar'] = 1 + # cfg.foo['bar'] = 2 + # ... + # cfg.foo['bar'] = 9 + + # using: + params = { + 'foo.bar': range(10) + } + + # this reflection works with nested objects as well... + # i.e. 
* **algorithm**: the search algorithm (supported within ``ray.tune``)

  **Supported algorithms**::

    * "variant_generator": grid and random based search of the parameter space (see: https://docs.ray.io/en/latest/tune/api/suggestion.html)
    * "random": grid and random based search of the parameter space (see: https://docs.ray.io/en/latest/tune/api/suggestion.html)
    * "axe": optimization algorithm (see: https://docs.ray.io/en/latest/tune/api/suggestion.html)
    * "bayesopt": optimization algorithm (see: https://docs.ray.io/en/latest/tune/api/suggestion.html)
    * "hyperopt": optimization algorithm (see: https://docs.ray.io/en/latest/tune/api/suggestion.html)
    * "bohb": optimization algorithm (see: https://docs.ray.io/en/latest/tune/api/suggestion.html)
    * "nevergrad": optimization algorithm (see: https://docs.ray.io/en/latest/tune/api/suggestion.html)
    * "optuna": optimization algorithm (see: https://docs.ray.io/en/latest/tune/api/suggestion.html)
    * "hebo": optimization algorithm (see: https://docs.ray.io/en/latest/tune/api/suggestion.html)
    * "sigopt": optimization algorithm (see: https://docs.ray.io/en/latest/tune/api/suggestion.html)
    * "zoopt": optimization algorithm (see: https://docs.ray.io/en/latest/tune/api/suggestion.html)

* **label**: a label for the search, used for output file naming

* **output_path**: the directory for storing generated files; can be a relative or absolute path

* **checkpoint_path**: the directory for storing checkpoint files in case the search needs to be restored; can be a relative or absolute path

* **max_concurrent**: the number of concurrent trials to run at one time; it is recommended to keep in mind the resource usage of each trial to avoid overscheduling

* **batch**: whether concurrent trials should run synchronously or asynchronously

* **num_samples**: the number of trials to run; for any grid search, each value in the grid will be sampled ``num_samples`` times

* **metric**: the metric to optimize (this should match some key: value pair in the returned data)

* **mode**: either ``'min'`` or ``'max'`` (whether to minimize or maximize the metric)

* **algorithm_config**: additional configuration for the search algorithm (see the `optuna docs `_)
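Putting these pieces together, a complete ``search`` call might look like the following (a sketch: the parameter values, ``run_config``, and script name are illustrative, patterned on the ``sh`` example above)::

    from netpyne.batchtools.search import search

    params = {
        'foo': range(10)                   # grid search over ten values of cfg.foo
    }

    run_config = {
        'command': 'mpiexec -n 4 nrniv -python -mpi init.py'  # command each trial runs
    }

    results = search(job_type = 'sh',      # submit each trial through the local shell
                     comm_type = 'socket', # receive results over a socket connection
                     run_config = run_config,
                     params = params,
                     label = 'grid_search',
                     metric = 'loss',      # optimize the 'loss' value sent via comm.send()
                     mode = 'min',         # minimize the metric
                     max_concurrent = 4)   # run up to four trials at once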
6. Performing parameter optimization searches (CA3 example)
-----------------------------------------------------------
The ``examples`` directory `here `_ shows both a ``grid`` based search as well as an ``optuna`` based optimization.

In the ``CA3`` example, we tune the ``PYR->BC`` ``NMDA`` and ``AMPA`` synaptic weights, as well as the ``BC->PYR`` ``GABA`` synaptic weight. Note how the search space is defined::

    # from optuna_search.py
    params = {'nmda.PYR->BC' : [1e-3, 1.8e-3],
              'ampa.PYR->BC' : [0.2e-3, 0.5e-3],
              'gaba.BC->PYR' : [0.4e-3, 1.0e-3],
    }

In ``optuna_search.py``, these two-value lists define the upper and lower bounds of the search space, while in ``grid_search.py`` the search space is defined explicitly::

    # from grid_search.py
    params = {'nmda.PYR->BC' : numpy.linspace(1e-3, 1.8e-3, 3),
              'ampa.PYR->BC' : numpy.linspace(0.2e-3, 0.5e-3, 3),
              'gaba.BC->PYR' : numpy.linspace(0.4e-3, 1.0e-3, 3),
    }

which defines ``3x3x3`` specific values to search over.

Note that the ``metric`` argument specifies a particular ``string`` (``loss``) to report and optimize around. This value is generated and ``sent`` by the ``init.py`` simulation::

    # from init.py
    results['PYR_loss'] = (results['PYR'] - 3.33875)**2
    results['BC_loss']  = (results['BC']  - 19.725 )**2
    results['OLM_loss'] = (results['OLM'] - 3.470  )**2
    results['loss'] = (results['PYR_loss'] + results['BC_loss'] + results['OLM_loss']) / 3
    out_json = json.dumps({**inputs, **results})

    print(out_json)
    #TODO put all of this in a single function.
    comm.send(out_json)
    comm.close()

The ``out_json`` output contains a dictionary which includes the ``loss`` metric (calculated as the MSE between observed and expected values).

In a multi-objective optimization, the relevant ``PYR_loss``, ``BC_loss``, and ``OLM_loss`` components are additionally included (see ``mo_optuna_search.py``).
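Once a search completes, the ``tune.ResultGrid`` returned by ``search`` can be inspected with the standard ``ray.tune`` API, for instance (a sketch, assuming the single-objective ``optuna`` search above has finished)::

    results = search( ... )      # the optuna_search.py call, as above
    best = results.get_best_result(metric='loss', mode='min')
    print(best.config)           # the best parameter mapping found, e.g. the winning
                                 # values of 'nmda.PYR->BC', 'ampa.PYR->BC', 'gaba.BC->PYR'
    print(best.metrics['loss'])  # the corresponding loss value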