Running ATM on a cluster #128

Closed
beevabeeva opened this issue Apr 18, 2019 · 3 comments

@beevabeeva

Hi.
Sorry if I missed it in the docs or README, but I can't find details about running ATM on a (local) cluster. Do I have to implement this myself using something like Apache Spark?

Thanks

@csala
Contributor

csala commented Apr 25, 2019

Hi @beevabeeva

The current version of ATM is already able to run as a cluster, but setting it up is currently the user's responsibility.

All you have to do to get a cluster running is start multiple worker instances.
These worker instances can all be on the same machine or on different machines. The only requirements are:

  • All the machines need to have access to the database being used.
  • All the machines need to access the data in the same way, either through a shared filesystem mounted at the same path on every machine or by using an S3 bucket as the dataset source.

For example, if you just want to start a cluster with 4 workers on your local machine, all you need to do is run the following two commands:

atm enter_data ...your enter_data options here..
for i in {1..4}; do atm worker ..your worker options here.. > /dev/null & done

The first command enters your data as usual. The second one starts 4 workers as background processes and redirects their output to /dev/null to avoid cluttering your console. You can still find their logs in the logs/{your hostname}.txt file.
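
If you also want to stop the workers later, one option is to remember their process IDs when you start them. The following is only a minimal sketch of that idea: `start_workers` and `stop_workers` are hypothetical helper names that are not part of ATM, and the worker options are left elided exactly as above.

```shell
#!/bin/sh
# Sketch only: start N background processes and remember their PIDs.
# start_workers / stop_workers are hypothetical helpers, not ATM commands.

WORKER_PIDS=""

# Usage: start_workers COUNT COMMAND [ARGS...]
start_workers() {
    count=$1
    shift
    i=0
    while [ "$i" -lt "$count" ]; do
        # Run the command in the background and discard its console output.
        "$@" > /dev/null &
        # $! holds the PID of the most recent background process.
        WORKER_PIDS="$WORKER_PIDS $!"
        i=$((i + 1))
    done
}

# Send a termination signal to every recorded process, then wait for them.
stop_workers() {
    for pid in $WORKER_PIDS; do
        kill "$pid" 2>/dev/null || true
    done
    wait
    WORKER_PIDS=""
}

# Example (worker options elided, as in the commands above):
#   start_workers 4 atm worker ..your worker options here..
#   stop_workers
```

This only manages workers started from the current shell on one machine. Workers running on other machines would have to be started and stopped there.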

I hope this helps!

@csala
Contributor

csala commented Apr 25, 2019

Also see #130, which will make cluster management much easier once done.

@csala
Contributor

csala commented May 7, 2019

Closed via #133
