Run
A configuration file must be provided to the FlexPS program. The file consists of one or more lines, each in the format node_id:hostname:port.
For example, to run with a single process, the file can be:
0:localhost:32321
To run multiple processes distributed across several machines, the file can be:
0:worker1:37542
1:worker2:37542
2:worker3:37542
3:worker4:37542
4:worker5:37542
Note that you may run multiple processes on the same machine, but remember to give each process a different port.
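As a quick sanity check before launching, the file format described above can be validated with a few lines of Python. This is only a sketch: parse_hostfile is a hypothetical helper, not part of FlexPS, and its rules (skip blank lines, reject duplicate ids and duplicate host:port pairs) are assumptions based on the description above rather than the behavior of the FlexPS parser itself.

```python
def parse_hostfile(text):
    """Parse lines of the form node_id:hostname:port into (id, host, port) tuples."""
    nodes = []
    seen_ids = set()
    seen_endpoints = set()
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        parts = line.split(":")
        if len(parts) != 3:
            raise ValueError(f"line {lineno}: expected node_id:hostname:port, got {line!r}")
        # int() raises ValueError if the id or the port is not a number
        node_id, host, port = int(parts[0]), parts[1], int(parts[2])
        if not 0 < port < 65536:
            raise ValueError(f"line {lineno}: port {port} is out of range")
        if node_id in seen_ids:
            raise ValueError(f"line {lineno}: duplicate node id {node_id}")
        if (host, port) in seen_endpoints:
            # two processes on the same machine must use different ports
            raise ValueError(f"line {lineno}: {host}:{port} is listed twice")
        seen_ids.add(node_id)
        seen_endpoints.add((host, port))
        nodes.append((node_id, host, port))
    return nodes


if __name__ == "__main__":
    print(parse_hostfile("0:worker1:37542\n1:worker2:37542\n"))
```

Running a check like this against the hostfile catches the same-machine, same-port mistake before any process is started.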
TODO: The node ids are not required to be consecutive or to start from 0, but this needs further testing. We also need to make sure the program exits gracefully and gives the developer enough information when a host is not reachable or a port is not available.
After you have compiled your application code, you can execute the binary directly with the required command line arguments. We use the gflags library, so you can pass command line arguments as --arg=val. See the gflags documentation for usage details.
Most FlexPS applications require the user to provide two arguments. The first is my_id, the id (node_id) of this process. The second is config_file, the path to the configuration file.
/path/to/program --my_id=0 --config_file=/path/to/config_file
Here is an example:
GLOG_logtostderr=1 ./debug/BasicExample --my_id=0 --config_file=machinefiles/local
Setting GLOG_logtostderr=1 makes glog print the log to your screen instead of to a file in /tmp.
A Python script is provided to launch your binary in parallel. This is the recommended way, even if you are running a single process.
A template of the launch script can be found in scripts/launch.py.
In the template, you need to set hostfile and progfile correctly. Both are paths relative to your FlexPS project home. For example, to run BasicExample you may set them as follows:
hostfile = "machinefiles/local"
progfile = "debug/BasicExample"
Then you just need to run the Python script, and your binary will run on the hosts listed in the hostfile.
python scripts/launch.py
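To make the launch step more concrete, here is a rough sketch of how a launcher of this kind could turn the hostfile entries into one ssh command per node. It is not the contents of scripts/launch.py: build_launch_commands and the project_home argument are hypothetical, and the exact command the real script runs may differ. The sketch only builds the commands and does not execute them.

```python
import shlex


def build_launch_commands(nodes, project_home, hostfile, progfile):
    """Return one ssh argv list per (node_id, host, port) entry.

    hostfile and progfile are paths relative to project_home, as in the
    launch template. All names here are illustrative assumptions.
    """
    commands = []
    for node_id, host, _port in nodes:
        remote = (
            f"cd {shlex.quote(project_home)} && "
            f"GLOG_logtostderr=1 ./{progfile} "
            f"--my_id={node_id} --config_file={hostfile}"
        )
        commands.append(["ssh", host, remote])
    return commands


if __name__ == "__main__":
    nodes = [(0, "worker1", 37542), (1, "worker2", 37542)]
    for cmd in build_launch_commands(
        nodes, "/home/user/flexps", "machinefiles/local", "debug/BasicExample"
    ):
        print(" ".join(shlex.quote(part) for part in cmd))
```

Each process receives its own my_id but the same config_file, which matches the two arguments described earlier.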
If your program cannot exit normally, run the script kill.py in the scripts folder. You need to provide the hostfile and your program name. Here is an example that kills the BasicExample processes started according to the hostfile:
python scripts/kill.py machinefiles/local BasicExample
Note that the "Address already in use" error can normally be solved by killing the program that occupies the port or by using another port.
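If you are unsure whether a port from the hostfile is still occupied on the local machine, a small check like the following can help. It is a generic sketch using the Python standard library, not a FlexPS utility, and port_is_free is a hypothetical name. It only tests whether a TCP bind succeeds on the machine where it is run.

```python
import socket


def port_is_free(port, host=""):
    """Return True if a TCP socket can currently be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            # typically "Address already in use"
            return False
        return True


if __name__ == "__main__":
    print(port_is_free(32321))
```

A False result means some process still holds the port, so either kill that process or pick another port in the hostfile.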
- To use the Python script to launch your program, make sure that you have configured passwordless ssh access to the hosts that your program runs on.
- User-customized command line flags are passed in the params variable in the launch script. "arg":"val" is equivalent to the command line version --arg=val.
- The env_params variable sets the glog-related environment variables. You may change GLOG_v=-1 to GLOG_v=1 to log more details. See the glog documentation for how to use glog.
- To dump the core, you may add ulimit -c unlimited to the launch script. (TODO)
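The params convention in the notes above can be illustrated with a short sketch. The helper params_to_flags is hypothetical, and the real launch script may build its command line differently. The sketch only shows the stated equivalence between an "arg":"val" entry and --arg=val.

```python
def params_to_flags(params):
    """Turn a dict of {"arg": "val"} entries into gflags-style --arg=val strings."""
    return [f"--{name}={value}" for name, value in params.items()]


if __name__ == "__main__":
    params = {
        "my_id": 0,
        "config_file": "machinefiles/local",
    }
    print(" ".join(params_to_flags(params)))
```

With the two required arguments, this yields the same flags as the manual command line shown earlier.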
TODO: We currently use the terms hostfile, config_file, and machinefile interchangeably, which may be confusing. They all refer to the same thing, and we should unify them.
TODO: Mark down the problems you have encountered!