Add start_time and stop_time to failure log #547

Merged
merged 3 commits into master from aws/add_start_stop_to_failures on Jun 22, 2023

Conversation

PGijsbers
Collaborator

This can be useful to see how long instances were available before errors occurred (e.g., to identify start-up failures).
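A minimal sketch of what recording these timestamps could look like; the `log_failure` helper and the column names are assumptions for illustration, not the benchmark's actual AWS code:

```python
import csv
import datetime as dt

def log_failure(path, instance_id, error, start_time, stop_time):
    """Hypothetical helper: append one row per failed instance to failure.csv,
    recording when the instance started, when it stopped, and when the failure
    was logged. Column names and order are illustrative only."""
    log_time = dt.datetime.now(dt.timezone.utc)
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([
            instance_id,
            error,
            start_time.isoformat(),
            stop_time.isoformat(),
            log_time.isoformat(),
        ])
    # A (stop_time - start_time) of only a few minutes usually points at a
    # setup failure (e.g., connectivity), rather than a failure mid-training.
```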

@PGijsbers added the aws (AWS support) label on Jun 21, 2023
@PGijsbers merged commit 93a0247 into master on Jun 22, 2023
@PGijsbers deleted the aws/add_start_stop_to_failures branch on June 22, 2023 09:58
PGijsbers added a commit that referenced this pull request Jun 22, 2023
This helps identify more quickly at what stage the failure took place. E.g., if it is only a few minutes in, it is probably a setup failure (such as a connectivity issue).
PGijsbers added a commit that referenced this pull request Jun 22, 2023
* Update AutoGluon `max_memory` from 0.1 to 0.4 in persist_models (#543)

* Add `optimize_for_deployment` for AutoGluon_hq (#544)

* Reduce training time by 10% if a high_quality preset is used (#546)

* Reduce training time by 10% if a high_quality preset is used

High-quality presets perform a post-fit step which takes roughly 10-15% of total time (by Nick's estimate). To keep comparisons reasonably fair, we preemptively tell AutoGluon to use less time, so that all frameworks' models are based on "max_total_time" amount of effort (see the sketch after the commit list below).

* Allow preset to be str or list and still reduce if hq or gq

* Add identical markers to identify fit/inference time/predict stages (#548)

* Add start_time, stop_time and log_time to failure.csv (#547)

This helps identify more quickly at what stage the failure took place. E.g., if it is only a few minutes in, it is probably a setup failure (such as a connectivity issue).

* Docker/permissions (#550)

* Remove ownership changing and starting as user for docker images

The USER is overwritten by `-u` on non-Windows platforms, which creates issues when the account running the Docker image is not the same as the one that created it.

* Don't run docker as root, since images no longer have an associated user

* Ignore some additional files not needed to run the benchmark

* Create root dir if it does not exist

This is required because otherwise, in Docker mode, a non-existent directory is mounted, and Docker creates the mount point with `root` ownership by default. This in turn makes the benchmark app unable to create subdirectories when the image is run as a non-root user.

* Further remove user info from docker build and add run_as option

The run_as option is configurable so that it can be enabled for people who run into issues (sketched after the commit list below). Unfortunately, I observed different behavior on two systems with the same OS and Docker versions installed, so for now I am giving up on a single unified solution.

* Update GAMA for v23.0.0 (#551)

---------

Co-authored-by: Nick Erickson <innixma@gmail.com>
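As a rough sketch of the #546 time-budget change above (the function and variable names are assumptions, not AutoGluon's or the benchmark's actual API), the adjustment amounts to something like:

```python
# Illustrative only: AutoGluon's high_quality / good_quality presets spend
# roughly 10-15% of the run on a post-fit step, so the time budget handed to
# fit() is trimmed by 10% to keep total effort comparable across frameworks.
REDUCED_PRESETS = {"high_quality", "good_quality"}

def adjusted_time_limit(max_total_time, presets):
    # presets may be passed as a single string or as a list of preset names
    if isinstance(presets, str):
        presets = [presets]
    if any(p in REDUCED_PRESETS for p in presets):
        return int(max_total_time * 0.9)
    return max_total_time

assert adjusted_time_limit(3600, "high_quality") == 3240
assert adjusted_time_limit(3600, ["best_quality"]) == 3600
```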
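And for the Docker/permissions change (#550), a sketch under assumptions (the `docker_run_cmd` helper, the `run_as` values, and the mount path are all hypothetical) of how a configurable run_as option can translate into a `-u` flag, with the output directory created up front so the mounted path is not owned by `root`:

```python
import os
import shlex

def docker_run_cmd(image, output_dir, run_as="user"):
    """Hypothetical sketch: build a `docker run` command for a non-Windows host."""
    # Create the host directory first; otherwise Docker creates the mount
    # point itself, owned by root, and the benchmark cannot write into it.
    os.makedirs(output_dir, exist_ok=True)
    cmd = ["docker", "run", "--rm", "-v", f"{os.path.abspath(output_dir)}:/output"]
    if run_as == "user":
        # Map the calling user into the container so output files are owned by them.
        cmd += ["-u", f"{os.getuid()}:{os.getgid()}"]
    elif run_as == "root":
        # Explicit root, for systems where running as the mapped user causes issues.
        cmd += ["-u", "0:0"]
    # run_as == "default": omit -u and keep whatever USER the image defines.
    cmd.append(image)
    return cmd

print(shlex.join(docker_run_cmd("automlbenchmark:latest", "./results")))
```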