Run
A configuration file must be provided to the FlexPS program. The file consists of one or more lines, each in the format node_id:hostname:port. This file is referred to as the hostfile in the rest of this page.
For example, to run with a single process, the file can be:
0:localhost:32321
To run multiple processes across several machines, the file can be:
0:worker1:37542
1:worker2:37542
2:worker3:37542
3:worker4:37542
4:worker5:37542
Note that you may run multiple processes on the same machine, but each of them must use a different port.
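The hostfile format above can be checked before launching. The following is a minimal sketch, not part of FlexPS: the helper names parse_hostfile and find_conflicts are hypothetical, and skipping blank lines is an assumption, since the source does not say whether they are allowed.

```python
from collections import Counter


def parse_hostfile(text):
    """Parse lines of the form node_id:hostname:port into (id, host, port) tuples."""
    nodes = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        node_id, hostname, port = line.split(":")
        nodes.append((int(node_id), hostname, int(port)))
    return nodes


def find_conflicts(nodes):
    """Return (duplicated node ids, duplicated (hostname, port) pairs)."""
    id_counts = Counter(node_id for node_id, _, _ in nodes)
    addr_counts = Counter((host, port) for _, host, port in nodes)
    dup_ids = sorted(i for i, n in id_counts.items() if n > 1)
    dup_addrs = sorted(a for a, n in addr_counts.items() if n > 1)
    return dup_ids, dup_addrs


# Two processes on the same machine, each with its own port.
example = "0:localhost:32321\n1:localhost:32322\n"
nodes = parse_hostfile(example)
print(nodes)
print(find_conflicts(nodes))
```

A non-empty second list from find_conflicts means two entries share the same hostname and port, which is the situation the note above warns about.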
TODO: The node ids are not required to be consecutive or to start from 0, but this needs further testing. We also need to make sure the program exits gracefully and gives the developer enough information when a host is not reachable or a port is not available.
A Python script is provided to launch your binary in parallel. This is the recommended way even if you are running a single process.
A template of the launch script can be found in scripts/launch.py.
In the template, you need to set hostfile and progfile correctly. Both are paths relative to your FlexPS project home. Specifically, hostfile is the relative path from FlexPS_HOME to your configuration file, and progfile is the relative path from FlexPS_HOME to the application you want to run. For example, you may set them as below to run the BasicExample:
hostfile = "machinefiles/local"
progfile = "debug/BasicExample"
Optionally, you may supply your own gflags parameters to the program. We use the gflags library, so each entry is passed on the command line as --arg=val. You may check this link for the usage of gflags.
params = {
"abc": "hi",
...
}
You may also set env_params:
env_params = (
"GLOG_logtostderr=true "
"GLOG_v=-1 "
"GLOG_minloglevel=0 "
)
Then you just need to run the Python script, and your binary will run on the hosts listed in the hostfile.
python scripts/launch.py
You can kill the programs with:
python scripts/launch.py kill
If your program cannot exit normally, run the script kill.py in the scripts folder. You need to provide the hostfile and your program name. Here is an example that kills the BasicExample run according to the hostfile:
python scripts/kill.py machinefiles/local BasicExample
Note that an "Address already in use" error can normally be solved by killing the program that occupies the port or by using another port.
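To see whether a port from the hostfile is still occupied on the local machine, one option is to try binding to it. This is a minimal sketch using only the Python standard library; the helper name port_is_free is hypothetical, it only checks the machine it runs on, and it makes no assumption about how FlexPS itself binds the port.

```python
import socket


def port_is_free(port, host=""):
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "Address already in use", or a permission error.
            return False
        return True


# 32321 is the port from the single-process hostfile example.
print(port_is_free(32321))
```

If this prints False after a failed run, a leftover process is probably still holding the port and should be killed first.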
- To use the Python script to launch your program, make sure that you have configured passwordless ssh access to the hosts (including your local host) that your program runs on.
- User-customized command line flags are passed in the params variable in the launch script. "arg": "val" is equivalent to the command line version --arg=val.
- The env_params variable sets the glog-related environment variables. You may change GLOG_v=-1 to GLOG_v=1 to log more details. Follow this link to see how to use glog.
- To dump the core, you may add ulimit -c unlimited to the launch script. (TODO)
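The mapping from params and env_params to a command line can be sketched as follows. This is an illustration of the "arg": "val" to --arg=val rule only; the function build_command is hypothetical and is not the code in scripts/launch.py, which may assemble its command differently.

```python
def build_command(progfile, params, env_params=""):
    """Join environment settings, the program path, and --key=value flags."""
    flags = " ".join("--{}={}".format(key, val) for key, val in params.items())
    parts = (env_params.strip(), progfile, flags)
    return " ".join(part for part in parts if part)


params = {"abc": "hi"}
env_params = (
    "GLOG_logtostderr=true "
    "GLOG_v=-1 "
    "GLOG_minloglevel=0 "
)
print(build_command("debug/BasicExample", params, env_params))
```

Placing the environment assignments before the program path mirrors how a shell applies VAR=value prefixes to a single command.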
TODO: The terms hostfile, config_file, and machinefile are currently used interchangeably, which may be confusing. They all refer to the same thing, and we should unify them.
TODO: Mark down the problems you have encountered!
You can also execute the binary directly with the required command line arguments.
For most FlexPS applications, the user must provide two arguments. The first is my_id, the id of this process (its node_id in the hostfile). The second is config_file, the path to the configuration file.
/path/to/program --my_id=0 --config_file=/path/to/config_file
Here is an example:
GLOG_logtostderr=1 ./debug/BasicExample --my_id=0 --config_file=machinefiles/local
GLOG_logtostderr=1 makes glog print the log to your screen instead of to a file in /tmp.
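The same direct invocation can be prepared from Python, for instance when starting one process per hostfile entry by hand. This is a minimal sketch under stated assumptions: direct_invocation is a hypothetical helper, and the actual subprocess call is left commented out because it requires the binary to exist.

```python
import os


def direct_invocation(program, my_id, config_file, log_to_stderr=True):
    """Return (argv, env) for running one FlexPS process directly."""
    argv = [
        program,
        "--my_id={}".format(my_id),
        "--config_file={}".format(config_file),
    ]
    env = dict(os.environ)  # copy, so the caller's environment is untouched
    if log_to_stderr:
        env["GLOG_logtostderr"] = "1"  # log to the screen, not to /tmp
    return argv, env


argv, env = direct_invocation("./debug/BasicExample", 0, "machinefiles/local")
print(" ".join(argv))
# To actually start the process:
#   import subprocess
#   subprocess.run(argv, env=env, check=True)
```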