Add start_time and stop_time to failure log #547

Merged
merged 3 commits into master from aws/add_start_stop_to_failures on Jun 22, 2023

Conversation

PGijsbers
Collaborator

This can be useful to see how long instances were available before errors occurred (e.g., to identify start-up failures).
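A minimal sketch of what recording these timestamps could look like; the `log_failure` helper and the column names are assumptions for illustration, not the benchmark's actual AWS code:

```python
import csv
import datetime as dt

def log_failure(path, instance_id, error, start_time, stop_time):
    """Hypothetical helper: append one row per failed instance to failure.csv,
    recording when the instance started, when it stopped, and when the failure
    was logged. Column names and order are illustrative only."""
    log_time = dt.datetime.now(dt.timezone.utc)
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([
            instance_id,
            error,
            start_time.isoformat(),
            stop_time.isoformat(),
            log_time.isoformat(),
        ])
    # A (stop_time - start_time) of only a few minutes usually points at a
    # setup failure (e.g., connectivity), rather than a failure mid-training.
```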

@PGijsbers added the aws (AWS support) label on Jun 21, 2023
@PGijsbers merged commit 93a0247 into master on Jun 22, 2023
@PGijsbers deleted the aws/add_start_stop_to_failures branch on June 22, 2023 09:58
PGijsbers added a commit that referenced this pull request Jun 22, 2023
This helps identify more quickly at what stage the failure took place. E.g., if it is only a few minutes in, it is probably a setup failure (such as a connectivity issue).
PGijsbers added a commit that referenced this pull request Jun 22, 2023
* Update AutoGluon `max_memory` from 0.1 to 0.4 in persist_models (#543)

* Add `optimize_for_deployment` for AutoGluon_hq (#544)

* Reduce training time by 10% if a high_quality preset is used (#546)

* Reduce training time by 10% if a high_quality preset is used

High-quality presets perform a post-fit step which takes roughly 10-15% of total time (by Nick's estimate). To keep comparisons reasonably fair, we preemptively tell AutoGluon to use less time, so that all frameworks' models are based on "max_total_time" amount of effort (see the sketch after the commit list below).

* Allow preset to be str or list and still reduce if hq or gq

* Add identical markers to identify fit/inference time/predict stages (#548)

* Add start_time, stop_time and log_time to failure.csv (#547)

This helps identify more quickly at what stage the failure took place. E.g., if it is only a few minutes in, it is probably a setup failure (such as a connectivity issue).

* Docker/permissions (#550)

* Remove ownership changing and starting as user for docker images

The USER is overwritten by `-u` on non-Windows platforms, which creates issues when the account running the Docker image is not the same as the one that created it.

* Don't run docker as root, since images no longer have an associated user

* Ignore some additional files not needed to run the benchmark

* Create root dir if it does not exist

This is required because otherwise, in Docker mode, a non-existent directory is mounted, and Docker creates the mount point with `root` ownership by default. This in turn makes the benchmark app unable to create subdirectories when the image is run as a non-root user.

* Further remove user info from docker build and add run_as option

The run_as option is configurable so that it can be enabled for people who run into issues (sketched after the commit list below). Unfortunately, I observed different behavior on two systems with the same OS and Docker versions installed, so for now I am giving up on a single unified solution.

* Update GAMA for v23.0.0 (#551)

---------

Co-authored-by: Nick Erickson <innixma@gmail.com>
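As a rough sketch of the #546 time-budget change above (the function and variable names are assumptions, not AutoGluon's or the benchmark's actual API), the adjustment amounts to something like:

```python
# Illustrative only: AutoGluon's high_quality / good_quality presets spend
# roughly 10-15% of the run on a post-fit step, so the time budget handed to
# fit() is trimmed by 10% to keep total effort comparable across frameworks.
REDUCED_PRESETS = {"high_quality", "good_quality"}

def adjusted_time_limit(max_total_time, presets):
    # presets may be passed as a single string or as a list of preset names
    if isinstance(presets, str):
        presets = [presets]
    if any(p in REDUCED_PRESETS for p in presets):
        return int(max_total_time * 0.9)
    return max_total_time

assert adjusted_time_limit(3600, "high_quality") == 3240
assert adjusted_time_limit(3600, ["best_quality"]) == 3600
```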
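And for the Docker/permissions change (#550), a sketch under assumptions (the `docker_run_cmd` helper, the `run_as` values, and the mount path are all hypothetical) of how a configurable run_as option can translate into a `-u` flag, with the output directory created up front so the mounted path is not owned by `root`:

```python
import os
import shlex

def docker_run_cmd(image, output_dir, run_as="user"):
    """Hypothetical sketch: build a `docker run` command for a non-Windows host."""
    # Create the host directory first; otherwise Docker creates the mount
    # point itself, owned by root, and the benchmark cannot write into it.
    os.makedirs(output_dir, exist_ok=True)
    cmd = ["docker", "run", "--rm", "-v", f"{os.path.abspath(output_dir)}:/output"]
    if run_as == "user":
        # Map the calling user into the container so output files are owned by them.
        cmd += ["-u", f"{os.getuid()}:{os.getgid()}"]
    elif run_as == "root":
        # Explicit root, for systems where running as the mapped user causes issues.
        cmd += ["-u", "0:0"]
    # run_as == "default": omit -u and keep whatever USER the image defines.
    cmd.append(image)
    return cmd

print(shlex.join(docker_run_cmd("automlbenchmark:latest", "./results")))
```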