libE status fields and terminology
This is linked with issue 525.
We currently have several fields related to the current status of an evaluation such as the protected fields 'given' and 'returned' and also some allocation function specific fields such as 'given_back'.
We also have a calc_status optionally returned by a sim_f or gen_f.
Also, the branch cwp/async_build adds H fields for 'cancel' and 'kill_sent' as protected fields (they are used by the manager to send kill signals to any worker with an active cancelled evaluation).
We should consider re-organizing into a better set of options. This could potentially change fields and so could be an API breaking change.
My proposal is to express the fields below for each H row, with the fields grouped into categories so that the options within a category are mutually exclusive.
1 Schedule status: currently held in separate fields ('given' and 'returned' are libE fields; 'allocated' and 'given_back' are user fields; ...). Merge into one field (name to be determined), which could include:
'Null'
'generated'
'allocated' [this is alloc only currently, and probably not needed - could be dealt with locally in alloc_f]
'given_to_sim'
'returned_from_sim'
'given_back_to_gen' [this is alloc only currently, perhaps should be equiv. to given_to_sim?]
'restart' [? possible use if you want to know it's a restart - e.g. send to same worker]
- allocated and given_back are used in alloc functions to prevent points being processed more than once. I think that if we can assume allocated points are always sent out, then allocated could be replaced by a local variable in alloc_f. Likewise if we make given_back (or given_back_to_gen) equivalent to given which is set in core libE (libE_manager), then that could be treated in a similar way (combined with a local modifier in the alloc_f).
I am trying to propose names that avoid ambiguity (e.g. given to what?). Options could be a string or a named integer. An open question is whether we want to keep the flexibility for users to set their own statuses, which could be added as fields in addition to these.
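As a rough illustration of the "named integer" option, the sketch below merges the current boolean flags into one value. The names `SchedStatus` and `merge_flags` are hypothetical and not part of libEnsemble; 'allocated' and 'restart' are left out because the proposal itself marks them as uncertain, and the precedence of the flags is an assumption.

```python
from enum import IntEnum


class SchedStatus(IntEnum):
    """Hypothetical merged schedule status (names follow the list above)."""
    NULL = 0
    GENERATED = 1
    GIVEN_TO_SIM = 2
    RETURNED_FROM_SIM = 3
    GIVEN_BACK_TO_GEN = 4


def merge_flags(given, returned, given_back=False):
    """Map the current separate boolean fields onto one status value.

    Assumption: a row that exists in H has at least been generated,
    and later stages take precedence over earlier ones.
    """
    if given_back:
        return SchedStatus.GIVEN_BACK_TO_GEN
    if returned:
        return SchedStatus.RETURNED_FROM_SIM
    if given:
        return SchedStatus.GIVEN_TO_SIM
    return SchedStatus.GENERATED
```

Because the values are ordered integers, a check such as `status >= SchedStatus.GIVEN_TO_SIM` could replace a test on 'given', though whether the states really form a strict progression is part of what needs deciding.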
2 calc_status -> sim_status? Move to an H field. calc_status is currently linked to a run of sim_f/gen_f, which can include multiple H rows, but we may want a field in H that clearly relates to one sim_id.
'Not set'
'Finished' (success)
'Failed'
'Killed' (timeout, on user condition, by manager/gen???)
Note we would need to separate out what is currently returned to gen. For example, FINISHED_PERSISTENT_GEN_TAG is given as a calc_status and printed to libE_stats, but is also used as a message to the manager. These uses would need to be decoupled, and we would need to decide what is printed to libE_stats.
Timing in libE_stats relates to a call to sim_f or gen_f (which can potentially involve multiple H rows), and prints a status (if it is set). How would we reconcile this? Note that individual tasks run by the executor also get timed, but we do not automatically have a timing uniquely related to one H row.
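One possible way to reconcile a per-call status with a per-row field in item 2 is to copy the status of each sim_f call onto every row that call evaluated. The sketch below assumes that approach; `SimStatus` and `record_call_status` are hypothetical names, and a plain dict keyed by sim_id stands in for the H field.

```python
from enum import Enum


class SimStatus(Enum):
    """Hypothetical per-row status, using the options listed in item 2."""
    NOT_SET = "Not set"
    FINISHED = "Finished"
    FAILED = "Failed"
    KILLED = "Killed"


def record_call_status(row_status, sim_ids, status):
    """Copy the status of one sim_f call onto every row it evaluated.

    row_status: dict mapping sim_id -> SimStatus (stand-in for an H field).
    sim_ids: the sim_ids handled by that single sim_f call.
    """
    for sim_id in sim_ids:
        row_status[sim_id] = status
    return row_status


# Four rows; one sim_f call covering rows 1 and 2 fails.
row_status = {sim_id: SimStatus.NOT_SET for sim_id in range(4)}
record_call_status(row_status, [1, 2], SimStatus.FAILED)
```

This loses information when rows within one call end differently, which is the many-to-one problem described above.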
3 cancel_requested: True or False. Due to asynchronicity, a cancelled row could be in various combinations of the states in 1 & 2. Should this allow different options (e.g. cancel_with_kill, cancel_no_kill)?
4 "kill_sent" (or could it be in 3?). See https://github.com/Libensemble/libensemble/blob/cwp/async_build/libensemble/libE_manager.py We need to mark when a kill signal has been sent to a running calculation so that we do not send more than one. This is also complicated by the potentially many-to-one mapping of H rows to a sim_f call. Currently the code sends a kill to a sim_f if it contains a cancelled row; it does not select sim_ids to kill within one sim_f call.
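The send-at-most-once behaviour described in item 4 could look roughly like the sketch below. It is not the code in libE_manager.py: `workers_to_kill` is a hypothetical helper, rows are plain dicts standing in for H rows, and `active` is an assumed mapping from worker id to the sim_ids in that worker's current sim_f call.

```python
def workers_to_kill(rows, active):
    """Return worker ids that should receive a kill signal now.

    rows: list of dicts with boolean 'cancel_requested' and 'kill_sent'.
    active: dict mapping worker_id -> list of sim_ids (indices into rows)
            being evaluated in that worker's current sim_f call.
    A worker is signalled once per call: if any row in the call already
    has kill_sent set, the worker is skipped.
    """
    to_kill = []
    for worker_id, sim_ids in active.items():
        if any(rows[i]["kill_sent"] for i in sim_ids):
            continue  # this sim_f call has already been signalled
        cancelled = [i for i in sim_ids if rows[i]["cancel_requested"]]
        if cancelled:
            to_kill.append(worker_id)
            for i in cancelled:
                rows[i]["kill_sent"] = True  # mark only the cancelled rows
    return to_kill
```

As in the current behaviour, the kill applies to the whole sim_f call; rows in the same call that were not cancelled are not marked, so their fate would still need to be recorded elsewhere.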
At the same time, calc_status is currently a return from a calculation (sim or gen) and is printed in libE_stats.txt. This could cover multiple rows of H.
One row of H is ....