Shorten timeout duration for environment close #3679

Merged (6 commits) on Mar 24, 2020

harperj (Contributor) commented Mar 24, 2020

Proposed change(s)

The timeout for closing an environment was set to the same value as
the timeout used when waiting for a response from a still-running
environment. This led to long waits for the error response when the
communication versions did not match.

This change updates the close timeout so that we wait a fixed
5 seconds before force-killing the environment worker.
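
As a rough illustration of the "wait a fixed time, then force-kill" behaviour described above, here is a minimal sketch using only the Python standard library. The helper name `close_worker`, the constant `CLOSE_TIMEOUT_SECONDS`, and the stand-in worker process are assumptions made for this example; this is not the code from the PR.

```python
import subprocess
import sys

# Fixed grace period taken from the description above (assumed constant name).
CLOSE_TIMEOUT_SECONDS = 5


def close_worker(proc: subprocess.Popen, timeout: float = CLOSE_TIMEOUT_SECONDS) -> int:
    """Wait up to `timeout` seconds for `proc` to exit, then force-kill it.

    Returns the process return code.
    """
    try:
        # Give the worker a chance to shut down on its own.
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # The worker did not exit in time: kill it and reap the process.
        proc.kill()
        return proc.wait()


if __name__ == "__main__":
    # Stand-in worker that would otherwise keep running for a minute.
    worker = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    code = close_worker(worker, timeout=0.2)
    print("worker exited with return code", code)
```

The key point is that the grace period is independent of the timeout used while the environment is running, so an error path never waits longer than the fixed value.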

Useful links (Github issues, JIRA tickets, ML-Agents forum threads etc.)

Types of change(s)

  • Bug fix
  • New feature
  • Code refactor
  • Breaking change
  • Documentation update
  • Other (please describe)

Checklist

  • Added tests that prove my fix is effective or that my feature works
  • Updated the changelog (if applicable)
  • Updated the documentation (if applicable)
  • Updated the migration guide (if applicable)

Other comments

@harperj harperj requested a review from vincentpierre March 24, 2020 19:07
chriselion (Contributor) commented

This looks good as it is, but just wanted to give one other possible approach:

  • make _close() take a timeout parameter. This can default to None and use self.timeout_wait if the actual value is None
  • pass 0 timeout in the cases we want to close immediately (or make _close_now() which calls _close() with 0 timeout). I think this should happen everywhere except for close() (so mismatched API, connection timeout, and invalid launch_string all close immediately)
  • Keep close() using self.timeout_wait
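
The approach suggested in the list above could look roughly like the following sketch. The class `EnvironmentSketch` and the `close_timeouts` list are hypothetical and exist only to make the example runnable; `_close()`, `_close_now()`, `close()`, and `timeout_wait` are the names mentioned in the comment, but their bodies here are assumptions, not the ML-Agents implementation.

```python
from typing import List, Optional


class EnvironmentSketch:
    """Hypothetical stand-in for the environment class; not the ML-Agents code."""

    def __init__(self, timeout_wait: int = 60) -> None:
        self.timeout_wait = timeout_wait
        # Records the timeout used by each close call, for illustration only.
        self.close_timeouts: List[int] = []

    def _close(self, timeout: Optional[int] = None) -> None:
        # None means "fall back to the normal response timeout".
        # An explicit `is None` check is needed so that 0 is honoured.
        if timeout is None:
            timeout = self.timeout_wait
        self.close_timeouts.append(timeout)
        # Real code would wait up to `timeout` seconds for the process
        # to exit and force-kill it afterwards.

    def _close_now(self) -> None:
        # Error paths (mismatched API, connection timeout, invalid
        # launch string) close immediately.
        self._close(timeout=0)

    def close(self) -> None:
        # A normal shutdown keeps using self.timeout_wait.
        self._close()


if __name__ == "__main__":
    env = EnvironmentSketch(timeout_wait=30)
    env._close_now()  # error path
    env.close()       # normal shutdown
    print(env.close_timeouts)  # [0, 30]
```

Note that writing `timeout or self.timeout_wait` would silently turn a requested timeout of 0 into the default, which is why the sketch compares against `None` explicitly.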

@harperj harperj force-pushed the develop-failure-returncodes branch from 8b37e48 to acc08e3 Compare March 24, 2020 22:13
harperj (Contributor, Author) commented Mar 24, 2020

I like that solution @chriselion. I've made the change.

Jonathan Harper added 2 commits March 24, 2020 16:00
@harperj harperj merged commit 312a439 into master Mar 24, 2020
@delete-merged-branch delete-merged-branch bot deleted the develop-failure-returncodes branch March 24, 2020 23:37
vincentpierre pushed a commit that referenced this pull request Mar 26, 2020
The timeout duration for closing an environment was set to the
same duration as the timeout when waiting for a response from the
still-running environment.  This led to long waits for the error
response when communication version wasn't matching.

This change forces a timeout duration of 0 when handling errors.
@vincentpierre vincentpierre mentioned this pull request Mar 26, 2020
10 tasks
vincentpierre added a commit that referenced this pull request Mar 30, 2020
* [bug-fix] Increase height of wall in CrawlerStatic (#3650)

* [bug-fix] Improve performance for PPO with continuous actions (#3662)

* Corrected a typo in a name of a function (#3670)

OnEpsiodeBegin was corrected to OnEpisodeBegin in Migrating.md document

* Add Academy.AutomaticSteppingEnabled to migration (#3666)

* Fix editor port in Dockerfile (#3674)

* Hotfix memory leak on Python (#3664)

* Hotfix memory leak on Python

* Fixing

* Fixing a bug in the heuristic policy. A decision should not be requested when the agent is done

* [bug-fix] Make Python able to deal with 0-step episodes (#3671)

* adding some comments

Co-authored-by: Ervin T <ervin@unity3d.com>

* Remove vis_encode_type from list of required (#3677)

* Update changelog (#3678)

* Shorten timeout duration for environment close (#3679)

The timeout duration for closing an environment was set to the
same duration as the timeout when waiting for a response from the
still-running environment.  This led to long waits for the error
response when communication version wasn't matching.

This change forces a timeout duration of 0 when handling errors.

* Bumping the versions

* handle multiple dones in a single step (#3700)

* handle multiple dones in a single step

* [tests] Make end-to-end tests more stable (#3697)

* [bug-fix] Fix entropy computation for GaussianDistribution (#3684)

* Fix how we set logging levels (#3703)

* cleanup logging

* comments and cleanup

* pylint, gym

* [skip-ci] Update changelog for logging fix. (#3707)

* [skip ci] Update README

* [skip ci] Fixed a typo

Co-authored-by: Ervin T <ervin@unity3d.com>
Co-authored-by: Adam Streck <adam.streck@gmail.com>
Co-authored-by: Chris Elion <chris.elion@unity3d.com>
Co-authored-by: Jonathan Harper <jharper+moar@unity3d.com>
chriselion pushed a commit that referenced this pull request Apr 8, 2020
* Bumping version on the release (#3615)

* Update examples project to 2018.4.18f1 (#3618)

From 2018.4.14f1.  An internal package dependency was updated as
a side effect.

* Remove dead components from the examples scenes (#3619) (#3624)

* Improve warnings and exception if using unsupported combo

* add meta file

* fix unit test

* enforce onnx conversion (expect tf2 CI to fail) (#3600)

* Update error message

* Updated the release branch docs (#3621)

* Updated the release branch docs

* Edited the README

* make sure top-level timer is closed before writing

* Remove space from Product Name for examples

In #2588 it was suggested that the space in the Product Name for
our example environments causes confusion when using a default build
because of the need to escape the space in the build filename.

This change removes the space from the Product Name in the project's
player settings.

* [bug-fix] Increase 3dballhard and GAIL default steps (#3636)

* Updating the NN models (#3632)

* Updating the NN models

* Update gridworld

* [skip ci] Update BallHard

* Update hallway

* Hotfixes for Release 0.15.1  (#3698)

* fix changelog

* keep master gridworld

Co-authored-by: Vincent-Pierre BERGES <vincentpierre@unity3d.com>
Co-authored-by: Jonathan Harper <jharper+moar@unity3d.com>
Co-authored-by: Ervin T <ervin@unity3d.com>
Co-authored-by: Adam Streck <adam.streck@gmail.com>
@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 15, 2021
3 participants