Vincent_Ng edited this page Sep 22, 2017 · 6 revisions

Run

Configuration file

A configuration file should be provided to the FlexPS program. The file consists of multiple lines, where each line is in the format of node_id:hostname:port.

For example, to run with one process, the file can be:

0:localhost:32321

To run multiple processes distributed across several machines, the file can be:

0:worker1:37542
1:worker2:37542
2:worker3:37542
3:worker4:37542
4:worker5:37542

Note that you may run multiple processes on the same machine, but each process must then use a different port.
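
To make the format concrete, here is a minimal Python sketch of a parser for such a file. It is not part of FlexPS: the function name parse_machinefile is hypothetical, and skipping blank lines and rejecting duplicate ids or duplicate hostname:port pairs are assumptions of this sketch, not documented behaviour of the program.

```python
from typing import Dict, Tuple


def parse_machinefile(text: str) -> Dict[int, Tuple[str, int]]:
    """Parse node_id:hostname:port lines into {node_id: (hostname, port)}."""
    nodes: Dict[int, Tuple[str, int]] = {}
    endpoints = set()
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # assumption of this sketch: blank lines are ignored
        parts = line.split(":")
        if len(parts) != 3:
            raise ValueError(
                f"line {lineno}: expected node_id:hostname:port, got {line!r}")
        node_id, hostname, port = int(parts[0]), parts[1], int(parts[2])
        if node_id in nodes:
            raise ValueError(f"line {lineno}: duplicate node id {node_id}")
        if (hostname, port) in endpoints:
            # Two processes on one machine must not share a port.
            raise ValueError(f"line {lineno}: {hostname}:{port} is listed twice")
        nodes[node_id] = (hostname, port)
        endpoints.add((hostname, port))
    return nodes
```

Running such a check before launching catches the same-machine, same-port mistake early instead of at bind time.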

TODO: The node ids are not required to be consecutive or to start from 0, but this needs further testing. We also need to make sure the program exits gracefully and gives the developer enough information when a host is not reachable or a port is not available.

Running as a single process

After you have compiled your application code, you can execute the binary directly with the required command line arguments. We use the gflags library, so command line arguments are passed as --arg=val. You may check this link for the usage of gflags.

Most FlexPS applications require the user to provide two arguments. The first is my_id, the id of this process (the node_id in the configuration file). The second is config_file, the path to the configuration file.

/path/to/program --my_id=0 --config_file=/path/to/config_file

Here is an example:

GLOG_logtostderr=1 ./debug/BasicExample --my_id=0 --config_file=machinefiles/local

Setting GLOG_logtostderr=1 makes glog print log messages to your screen instead of to a file in /tmp.

Running multiple processes

A Python script is provided to launch your binary in parallel. This is the recommended way, even if you are running a single process.

A template of the launch script can be found in scripts/launch.py.

In the template, you need to set hostfile and progfile correctly. Both are paths relative to your FlexPS project home. For example, you may set them as below to run the BasicExample:

hostfile = "machinefiles/local"
progfile = "debug/BasicExample"

Then you just need to run the Python script, and your binary will run on the hosts listed in the hostfile.

python scripts/launch.py
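
The contents of scripts/launch.py are not reproduced on this page. As a rough sketch of what a launcher of this kind might do, assuming it starts one process per hostfile line over ssh, the per-node commands could be built as below. The helper name build_launch_commands, the project_home argument, and the exact ssh invocation are hypothetical and may differ from the real script.

```python
import shlex
from typing import List


def build_launch_commands(hostfile_text: str, project_home: str,
                          hostfile: str, progfile: str) -> List[List[str]]:
    """Return one ssh argv per node listed in the hostfile (hypothetical helper)."""
    commands = []
    for line in hostfile_text.splitlines():
        line = line.strip()
        if not line:
            continue
        node_id, hostname, _port = line.split(":")
        # Command executed on the remote host: the binary plus the two
        # arguments that most FlexPS applications require.
        remote = " ".join([
            "GLOG_logtostderr=1",
            shlex.quote(f"{project_home}/{progfile}"),
            f"--my_id={node_id}",
            "--config_file=" + shlex.quote(f"{project_home}/{hostfile}"),
        ])
        commands.append(["ssh", hostname, remote])
    return commands
```

Each returned list could then be passed to subprocess.Popen so that all nodes start in parallel.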

Killing the programs

If your program does not exit normally, run the script kill.py in the scripts folder, passing the hostfile and your program name. Here is an example that kills the BasicExample processes started according to the hostfile:

python scripts/kill.py machinefiles/local BasicExample

Note that the "Address already in use" error can normally be solved by killing the program that occupies the port or by using another port.
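
To check whether a port is still occupied before relaunching, a small probe like the following can help. It is a sketch using only the Python standard library, assuming the program binds a TCP port. The function name port_is_free is hypothetical, and the check only covers the machine it runs on.

```python
import socket


def port_is_free(port: int, host: str = "") -> bool:
    """Return True if a TCP socket can currently be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "Address already in use": another process holds the port.
            return False
        return True
```

For example, port_is_free(37542) can be run on a worker after kill.py to confirm that the port listed in the hostfile has been released.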

More details about the scripts

  • To use the Python script to launch your program, make sure that you have configured passwordless ssh access to the hosts that your program runs on.
  • User-defined command line flags are passed in the params variable in the launch script. "arg":"val" is equivalent to the command line version --arg=val.
  • The env_params variable sets the glog-related environment variables. You may change GLOG_v=-1 to GLOG_v=1 to log more details. Follow this link to see how to use glog.
  • To dump the core, you may add ulimit -c unlimited to the launch script. (TODO)
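
The "arg":"val" to --arg=val mapping described above can be sketched in a few lines of Python. The helper name params_to_flags and the num_workers flag in the comment are made up for illustration; only my_id and config_file come from this page, and the real launch script may assemble its command differently.

```python
from typing import Dict, List


def params_to_flags(params: Dict[str, object]) -> List[str]:
    """Turn {"arg": "val"} entries into gflags-style --arg=val strings."""
    return [f"--{name}={value}" for name, value in params.items()]


# Example with a hypothetical user flag:
#   params_to_flags({"num_workers": 4})  ->  ["--num_workers=4"]
```

Keeping the flags in a dictionary makes it easy to change one value in the launch script without editing a long command string.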

TODO: We currently use the terms hostfile, config_file, and machinefile interchangeably, which may be confusing. They all refer to the same thing, and we should unify them.

TODO: Mark down the problems you have encountered!
