Hi.
Sorry if I missed it in the docs or Readme, but I can't seem to find details about running ATM on a cluster (local). Do I have to implement this myself using something like Apache Spark?
Thanks
The current ATM version can already run as a cluster, but setting it up is currently the user's responsibility.
All you have to do to get a cluster running is start multiple worker instances.
These worker instances can either be all on the same machine or on different machines, and the only requirements are:
All the machines need to have access to the database being used.
All the machines need to access the data in the same way: either through a shared filesystem mounted at the same path on every machine, or by using an S3 bucket as the dataset source.
For example, if you just want to start a cluster with 4 workers on your local machine, all you need to do is run the following two commands:
atm enter_data ...your enter_data options here..
for i in {1..4}; do atm worker ..your worker options here.. > /dev/null & done
The first command enters your data as usual. The second one starts 4 workers as background processes and redirects their output to /dev/null to avoid cluttering your console. You can still find their logs in the logs/{your hostname}.txt file.
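If you also want to stop the workers cleanly later, it may help to record their process IDs when you launch them. The following is only a sketch of that idea, not part of ATM: the helper names start_workers and stop_workers and the WORKER_PIDS variable are made up for illustration, and the worker command is whatever you would normally run.

```shell
# Hypothetical helpers (not part of ATM): launch N background copies of a
# command, remember their PIDs, and stop them all later.

start_workers() {
    count="$1"
    shift
    WORKER_PIDS=""
    i=1
    while [ "$i" -le "$count" ]; do
        # Run the given command in the background, discarding its stdout
        # as in the one-line loop above.
        "$@" > /dev/null &
        WORKER_PIDS="$WORKER_PIDS $!"
        i=$((i + 1))
    done
}

stop_workers() {
    # Ask every recorded worker to terminate, then wait for each to exit.
    for pid in $WORKER_PIDS; do
        kill "$pid" 2>/dev/null || true
    done
    for pid in $WORKER_PIDS; do
        wait "$pid" 2>/dev/null || true
    done
}
```

After sourcing these functions, you would call start_workers 4 followed by your usual atm worker command and options, and stop_workers when you are done. Workers on other machines would need to be started there separately, subject to the database and data access requirements listed above.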