-
Notifications
You must be signed in to change notification settings - Fork 3
ali01/qjam
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
qjam is a framework for distributed computing. It leverages ssh public key setups to automatically bootstrap and start worker nodes without manual intervention. DEPENDENCIES -------------------------------- The following packages are required to use qjam: * Python Nose (sudo aptitude install python-nose) * Python numpy (sudo aptitude install python-numpy) EXAMPLES -------------------------------- You need to have passwordless public-key ssh access to localhost as your current username. If `ssh localhost' at a terminal gets you to a new prompt, you're golden. If not, set up ssh keys and/or an ssh agent as appropriate. The username used to access remote machines via ssh is the username used on the master machine. You can specify an alternate username to be used on the remote machines in two ways: * Set the QJAM_USER environment variable. * Use a 'User' directive in your ~/.ssh/config. With the proper ssh setup, these example programs can be run directly from the commandline: bin/sum-matrix-example.py This is a simple example that explains how to use the qjam API to execute code across many machines. Usage: ./sum-matrix-example.py localhost machine2 machine3 examples/newton_sqrt.py Approximates the square root of a number using Newton's method. Usage: python examples/newton_sqrt.py 123456789 ARCHITECTURE -------------------------------- There are three main pieces to the qjam framework: the Master, the RemoteWorker, and the Worker. The Worker is a program that is copied to all of the remote machines during the bootstrapping process. It is responsible for waiting for instructions from the Master, and upon receiving work, processing that work and returning the result. The RemoteWorker is a special Python class that communicates with the remote machines. One RemoteWorker has a single target machine that can be reached via ssh. There can be many RemoteWorkers with the same target (say, in the case where there are many cores on a machine), but only one target per RemoteWorker. At creation, the RemoteWorker bootstraps the remote machine by copying the requisite files to run the Worker program, via ssh. After the bootstrapping process completes, the RemoteWorker starts a Worker process on the remote machine and attaches to it. The RemoteWorker is the proxy between the Master and the Worker. The Master is a Python class that divides up work and assigns the work units among its pool of RemoteWorker instances. These RemoteWorker instances relay the work to the Worker programs running on the remote machines and wait for the results. Usage of the qjam framework is simple. There is one primary point of entry on a Master instance: master.run(module, params, dataset). The 'module' argument is a Python module object that contains a function 'mapfunc(params, dataset)' that will be called by the worker on the params and slices of the whole dataset. The 'params' argument specifies an arbitrary Python object that is passed directly to all workers. The 'dataset' argument is an instance of a DataSet type (defined in qjam.dataset) that helps qjam determine how to slice the input data. There is a difference between 'params' and 'dataset': multiple calls to master.run() can use the same 'params' and 'dataset', but the pieces of 'dataset' will not be retransferred to all of the remote machines on the second call to master.run(); the slices of data are cached locally at the nodes. However, 'params' will be transferred in full on every call to master.run(). Therefore, 'params' is best used for data that changes between calls to master.run(), and 'dataset' should be used for data that does not change. TESTS -------------------------------- The tests for qjam are located in the 'tests' directory. To run the test suite, type: nosetests in the root directory. The tests are a good way to learn more about the architecture of qjam. RUNNING ON STANFORD AI LAB MACHINES ---------------------------------------- The yggdrasil machines are missing Python 2.6 and Numpy. So, you must first install Python 2.6 and Numpy locally. As of 2010/11/29, yggdrasil[1-4] all have these installed at the prefix /tmp/py26. Adapt the /tmp/update_yggdrasil_py26.sh script to install these on other yggdrasils. Set the env var QJAM_REMOTE_PYTHON=/tmp/py26/bin/python in your calls to qjam to use this Python. Make sure /tmp/py26/bin/python is in your $PATH, and set $PYTHONPATH to the top-level 'qjam' directory. Examples: yggdrasil1% QJAM_REMOTE_PYTHON=/tmp/py26/bin/python2.6 \ bin/sum-matrix-example.py \ yggdrasil1 yggdrasil2 yggdrasil3 yggdrasil4 yggdrasil1% QJAM_REMOTE_PYTHON=/tmp/py26/bin/python2.6 \ python examples/newton_sqrt.py 123456789 yggdrasil1 yggdrasil1% QJAM_REMOTE_PYTHON=/tmp/py26/bin/python2.6 \ nosetests tests/
About
No description, website, or topics provided.
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published