UnityTimeOutException: Couldn't start socket communication because worker number 10 is still in use. You may need to manually close a previously opened environment or use a different worker number. #3942

Closed
ShivanshuPurohit opened this issue May 9, 2020 · 7 comments
Assignees
Labels
bug Issue describes a potential bug in ml-agents.

Comments

@ShivanshuPurohit

When I run the following code in Colab:
env = UnityEnvironment(file_name='/content/Reacher_Linux_NoVis/Reacher.x86_64', worker_id=10, seed=10)
I get the error from the title:
UnityTimeOutException: Couldn't start socket communication because worker number 10 is still in use. You may need to manually close a previously opened environment or use a different worker number.

I get this error no matter which worker_id I set; it always reports that the passed worker_id is in use.
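
One way to check whether a port is really busy is to probe it before launching. The sketch below assumes that the Python side listens on `base_port + worker_id` and that the default `base_port` is 5005; both are assumptions about `unityagents`, and the helper names `port_is_free` and `first_free_worker_id` are hypothetical.

```python
import socket

BASE_PORT = 5005  # assumption: default base_port used by unityagents


def port_is_free(port, host="localhost"):
    """Return True if a TCP socket can currently be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            # Typically EADDRINUSE: something else holds the port.
            return False
        return True


def first_free_worker_id(base_port=BASE_PORT, max_workers=32):
    """Return the lowest worker_id whose port looks free, or None."""
    for worker_id in range(max_workers):
        if port_is_free(base_port + worker_id):
            return worker_id
    return None


if __name__ == "__main__":
    print("first free worker_id:", first_free_worker_id())
```

If every worker_id probes as free and the exception still appears, the port is probably not the real cause and the full traceback is worth reading.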

@ShivanshuPurohit ShivanshuPurohit added the bug Issue describes a potential bug in ml-agents. label May 9, 2020
@Locke637

I encountered this bug too. I have to wait about 20 seconds before I can run again.

@harperj harperj self-assigned this May 12, 2020
@harperj
Contributor

harperj commented May 12, 2020

Hi @ShivanshuPurohit @Locke637 -- there is a known issue with some Linux distributions where a port is reserved for up to 60 seconds after the application is closed. We can't do much about that issue, unfortunately.

@ShivanshuPurohit, are you instantiating the environment multiple times, or does an environment never actually launch? We'd appreciate any specific steps to reproduce the issue.
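
When the port is only held temporarily, as in the reservation of up to 60 seconds described above, polling until it can be bound again is one workaround. This is a minimal sketch; `can_bind` and `wait_until_free` are hypothetical helpers, and the port number in the usage comment assumes the `base_port + worker_id` convention with a base port of 5005.

```python
import socket
import time


def can_bind(port, host="localhost"):
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
            return True
        except OSError:
            return False


def wait_until_free(port, timeout=60.0, interval=1.0, probe=can_bind):
    """Poll `probe(port)` until it succeeds or `timeout` seconds elapse.

    Returns True as soon as the port can be bound, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if probe(port):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Usage (assumed port layout: base_port 5005 + worker_id 10):
# if wait_until_free(5005 + 10):
#     env = UnityEnvironment(file_name=..., worker_id=10)
```

The `probe` parameter exists only so the waiting logic can be exercised without touching real sockets.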

@ShivanshuPurohit
Author

ShivanshuPurohit commented May 14, 2020

Sorry for the delay.

Steps to reproduce the error (I encountered it in Colab):
# Get the "headless" version of the Unity Reacher environment

!wget https://s3-us-west-1.amazonaws.com/udacity-drlnd/P2/Reacher/Reacher_Linux_NoVis.zip

!unzip "content/Reacher_Linux_NoVis.zip"

!pip install unityagents

from unityagents import UnityEnvironment

env = UnityEnvironment(file_name='Reacher_Linux_NoVis/Reacher.x86_64', worker_id=5)

This leads to the following error message:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/unityagents/rpc_communicator.py in initialize(self, inputs)
     47             # Establish communication grpc
---> 48             self.server = grpc.server(ThreadPoolExecutor(max_workers=10))
     49             self.unity_to_external = UnityToExternalServicerImplementation()

/usr/local/lib/python3.6/dist-packages/grpc/__init__.py in server(thread_pool, handlers, interceptors, options, maximum_concurrent_rpcs, compression)

AttributeError: module 'grpc._server' has no attribute 'create_server'

During handling of the above exception, another exception occurred:

UnityTimeOutException                     Traceback (most recent call last)
<ipython-input-4-524433f10541> in <module>()
----> 1 env = UnityEnvironment(file_name='Reacher_Linux_NoVis/Reacher.x86_64', worker_id=5)

/usr/local/lib/python3.6/dist-packages/unityagents/environment.py in __init__(self, file_name, worker_id, base_port, curriculum, seed, docker_training, no_graphics)
     62         )
     63         try:
---> 64             aca_params = self.send_academy_parameters(rl_init_parameters_in)
     65         except UnityTimeOutException:
     66             self._close()

/usr/local/lib/python3.6/dist-packages/unityagents/environment.py in send_academy_parameters(self, init_parameters)
    503         inputs = UnityInput()
    504         inputs.rl_initialization_input.CopyFrom(init_parameters)
--> 505         return self.communicator.initialize(inputs).rl_initialization_output
    506
    507     def wrap_unity_input(self, rl_input: UnityRLInput) -> UnityOutput:

/usr/local/lib/python3.6/dist-packages/unityagents/rpc_communicator.py in initialize(self, inputs)
     55                 "Couldn't start socket communication because worker number {} is still in use. "
     56                 "You may need to manually close a previously opened environment "
---> 57                 "or use a different worker number.".format(str(self.worker_id)))
     58         if not self.unity_to_external.parent_conn.poll(30):
     59             raise UnityTimeOutException(

UnityTimeOutException: Couldn't start socket communication because worker number 5 is still in use. You may need to manually close a previously opened environment or use a different worker number.
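
The traceback above shows the timeout being raised while an `AttributeError` from the `grpc` package was being handled, which points at the installed `grpc` package rather than at a busy port. The sketch below reproduces that masking pattern with stand-ins and walks the exception chain back to the first error. `FakeTimeOut`, `start_server`, `initialize`, and `exception_chain` are hypothetical names for illustration, not the library's code.

```python
def exception_chain(exc):
    """Return [exc, the exception it replaced, ...], outermost first."""
    chain = []
    seen = set()
    while exc is not None and id(exc) not in seen:
        seen.add(id(exc))
        chain.append(exc)
        # __cause__ is set by "raise ... from ..."; __context__ is set
        # implicitly when raising inside an except block.
        exc = exc.__cause__ or exc.__context__
    return chain


class FakeTimeOut(Exception):
    """Hypothetical stand-in for UnityTimeOutException."""


def start_server():
    # Stand-in for the failing grpc.server(...) call in the traceback.
    raise AttributeError("module 'grpc._server' has no attribute 'create_server'")


def initialize(worker_id):
    try:
        start_server()
    except Exception:
        # A broad handler that reports every failure as a busy port.
        raise FakeTimeOut("worker number {} is still in use".format(worker_id))


try:
    initialize(5)
except FakeTimeOut as err:
    root = exception_chain(err)[-1]
    print(type(root).__name__, "-", root)
```

Because the handler reports every failure the same way, changing `worker_id` cannot help when the underlying error is unrelated to the port.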

@ay1011

ay1011 commented May 16, 2020

Hi ShivanshuPurohit, I managed to run it after shutting down the runtime on Colab and then running just env = UnityEnvironment(file_name='Reacher_Linux_NoVis/Reacher.x86_64')
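
Restarting the runtime releases anything a previous cell left open. A less drastic habit is to make sure the environment is always closed, even when a cell fails. The sketch below shows the pattern with `contextlib.closing`; `FakeEnv` and `run_episode` are hypothetical stand-ins, and the pattern only assumes that the real environment object exposes a `close()` method.

```python
from contextlib import closing


class FakeEnv:
    """Hypothetical stand-in for an environment object that holds a port."""

    open_ports = set()

    def __init__(self, port):
        if port in FakeEnv.open_ports:
            raise RuntimeError("port {} is still in use".format(port))
        FakeEnv.open_ports.add(port)
        self.port = port

    def close(self):
        # Release the port so a later instance can claim it.
        FakeEnv.open_ports.discard(self.port)


def run_episode(port):
    # closing() calls env.close() on exit, including when an exception is raised.
    with closing(FakeEnv(port)) as env:
        return env.port


# Running twice on the same port works because the first instance was closed.
run_episode(5005)
run_episode(5005)
```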

@hvpeteet hvpeteet assigned hvpeteet and unassigned harperj Nov 4, 2021
@hvpeteet
Contributor

hvpeteet commented Nov 8, 2021

I realize this has been stale for over a year, but as a side effect of #5553 this should no longer occur, since that change allows socket re-use. Once it is included in a release (release 19+), this should go away.
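
For readers unfamiliar with socket re-use, the general idea can be sketched at the raw socket level with `SO_REUSEADDR`. This illustrates the concept only and is not a description of what #5553 changes; `make_listener` is a hypothetical helper.

```python
import socket


def make_listener(port, reuse, host="127.0.0.1"):
    """Create a listening TCP socket, optionally with SO_REUSEADDR set."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if reuse:
        # Allow binding even if old connections on this port are still
        # lingering in TIME_WAIT after a previous listener closed.
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind((host, port))
    s.listen(1)
    return s


if __name__ == "__main__":
    first = make_listener(0, reuse=True)  # port 0 lets the OS pick a port
    port = first.getsockname()[1]

    # Open and close one connection so the port has recent traffic.
    client = socket.create_connection(("127.0.0.1", port))
    conn, _ = first.accept()
    conn.close()
    client.close()
    first.close()

    # With the option set on both listeners, rebinding does not have to wait.
    second = make_listener(port, reuse=True)
    print("rebound port", second.getsockname()[1])
    second.close()
```

On Linux the option generally has to be set on the earlier socket as well as the new one, which is why both listeners in the sketch enable it.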

@hvpeteet
Contributor

hvpeteet commented Nov 8, 2021

Closing since this should be resolved as mentioned above. Please re-open if release 19 comes out and you still have this issue.

@hvpeteet hvpeteet closed this as completed Nov 8, 2021
@github-actions

github-actions bot commented Dec 8, 2021

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 8, 2021