Allow parallelization for Anylogic sample #49

Merged on Aug 14, 2023 (4 commits)

Conversation

mzat-msft
Collaborator

Description

This commit introduces proper handling of parallelization for Baobab simulations.
The earlier implementation could not run multiple simulation environments because only one Baobab instance was running, and that instance did not segregate the information belonging to each Anylogic sim -- RLlib sim environment pair.
This PR introduces two main things:

  • It changes the way Baobab and the Anylogic sim are instantiated. Before, start.sh instantiated only one copy of each. Now the sim.py connector launches them, which makes it possible to launch as many Baobab and Anylogic sim instances as there are RLlib environments.
  • Only one memcached instance remains, and each Baobab instance writes to its own namespace for data segregation.
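
The namespace idea in the second point can be sketched as follows. This is a minimal illustration, not the Baobab implementation: `NamespacedCache` is a hypothetical name, the key format is an assumption, and a plain dict stands in for the shared memcached instance.

```python
import uuid


class NamespacedCache:
    """Hypothetical wrapper: prefixes every key with a per-instance namespace."""

    def __init__(self, backend, namespace=None):
        # `backend` is a plain dict here, standing in for the single
        # shared memcached instance (assumption for illustration only).
        self.backend = backend
        self.namespace = namespace or uuid.uuid4().hex

    def _key(self, key):
        # Assumed key layout: "<namespace>:<key>".
        return f"{self.namespace}:{key}"

    def set(self, key, value):
        self.backend[self._key(key)] = value

    def get(self, key, default=None):
        return self.backend.get(self._key(key), default)


shared = {}  # one shared store for all instances
env_a = NamespacedCache(shared, "env-a")
env_b = NamespacedCache(shared, "env-b")

# Both instances write the same logical key without overwriting each other.
env_a.set("method", "step")
env_b.set("method", "reset")
```

Because every key carries its namespace, two instances can use the same logical key (here `method`) against one backend without seeing each other's data.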

Fixes # (issue)

Type of change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)

How Has This Been Tested?

Run the sample to check that everything works. Vary the number of rollout workers to confirm that training actually completes faster.

Checklist:

  • I have squashed my previous commits into one commit and added a meaningful commit message.
  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules

@jillianmclements
Collaborator

It might be helpful to link Baobab docs or give a brief description for users unfamiliar with it. I can't get this to run on an M1 (AML works fine), and I'm unsure how to debug properly given my limited knowledge.

@mzat-msft
Collaborator Author

Hey Jill, sorry to hear that. ATM there are no docs on Baobab; would you like to work together on finding a place to document it? Since it lives in the main plato package, I don't think this sample is a good place.

Unfortunately the log you posted in Teams does not show any error, which also makes it difficult to debug. What I would try is running the Anylogic sim java exec locally to see what happens (maybe pointing it to Bonsai), and running the gunicorn command that launches plato to see what happens.

@jillianmclements
Collaborator

Thanks, Marco! I'll approve this since it does work for me on AML. It appears the Baobab API is not responding on my local machine; it just repeats the following in the logs:
[2023-08-11 19:07:27][baobab][DEBUG] Reading 'method' from cache.
[2023-08-11 19:07:27][baobab][DEBUG] Reading for method timed out.
[2023-08-11 19:07:28][baobab][DEBUG] Reading 'method' from cache.
[2023-08-11 19:07:28][baobab][DEBUG] Reading for method timed out.
[2023-08-11 19:07:28][baobab][DEBUG] Reading 'method' from cache.
[2023-08-11 19:07:28][baobab][DEBUG] Reading for method timed out.
...

I tried swapping the sim's Linux shell script for the Mac script provided in exported, but I'm getting a different error. I'll try to dig into it more in a couple of weeks if I have time.

@jillianmclements jillianmclements left a comment

This works on AML, but I can't get it to run on an M1 machine.

Instead of launching the sim from a start script, this lets the number of
Anylogic sims scale with the number of Ray workers.
This is introduced so that multiple Baobab instances can communicate with
the same memcached backend.
When initiating a new sim environment, we also create a Baobab instance with
a unique namespace for that environment.
This makes it possible to run multiple sims in parallel properly.
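
A rough sketch of the pattern these commit messages describe, with one set of processes launched per environment and tagged with a unique namespace, might look like the following. The class name `SimEnvironment` and the placeholder child command are assumptions for illustration; the real connector in sim.py starts Baobab and the Anylogic sim, which this sketch does not attempt to reproduce.

```python
import subprocess
import sys
import uuid


class SimEnvironment:
    """Hypothetical environment that owns its own child process and namespace."""

    def __init__(self):
        # A unique namespace per environment keeps data segregated in the
        # shared backend.
        self.namespace = uuid.uuid4().hex
        # Placeholder child process: it only echoes the namespace it received.
        # A real connector would start its server and sim processes here.
        self.process = subprocess.Popen(
            [sys.executable, "-c", "import sys; print(sys.argv[1])", self.namespace],
            stdout=subprocess.PIPE,
            text=True,
        )

    def close(self):
        # Wait for the child to exit and return what it printed.
        out, _ = self.process.communicate(timeout=30)
        return out.strip()


# One environment (and one child process) per worker.
envs = [SimEnvironment() for _ in range(3)]
reported = [env.close() for env in envs]
```

Each environment creates and tears down its own processes, so the number of running sims follows the number of environments rather than being fixed by a start script.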
@mzat-msft mzat-msft merged commit 9e32391 into main Aug 14, 2023
14 checks passed
@mzat-msft mzat-msft deleted the mzat/anylogic-parallel branch August 14, 2023 07:45