# FlexPS Programming Guide

This page teaches you how to write a machine learning program in FlexPS, using Logistic Regression as a running example.
One can load data easily from HDFS by using `HDFSManager`:

```cpp
HDFSManager hdfs_manager(my_node, nodes, config, zmq_context);
hdfs_manager.Start();

std::vector<DataObj> data;
std::mutex mylock;
hdfs_manager.Run([my_node, &data, &mylock](HDFSManager::InputFormat* input_format, int local_tid) {
  DataObj this_obj;
  while (input_format->HasRecord()) {
    auto record = input_format->GetNextRecord();
    if (record.empty()) return;
    this_obj = libsvm_parser(record);
    mylock.lock();
    data.push_back(std::move(this_obj));
    mylock.unlock();
  }
});
hdfs_manager.Stop();
```
Here `DataObj` is a self-defined data object type, and `libsvm_parser` is a parser for the libsvm data format.
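Their exact definitions are up to your application. Below is a minimal sketch, assuming a `DataObj` is a sparse feature vector paired with a label; `FeatureVec` and `parse_libsvm_line` are hypothetical names standing in for your own type and for `libsvm_parser`:

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// A sparse sample: (feature_id, value) pairs plus a label.
// Illustrative stand-in for the guide's DataObj.
using FeatureVec = std::vector<std::pair<int, float>>;
using DataObj = std::pair<FeatureVec, float>;

// Hypothetical parser for one libsvm line: "label id1:v1 id2:v2 ..."
DataObj parse_libsvm_line(const std::string& line) {
  std::istringstream iss(line);
  DataObj obj;
  iss >> obj.second;  // the label comes first
  std::string tok;
  while (iss >> tok) {
    auto colon = tok.find(':');
    int id = std::stoi(tok.substr(0, colon));
    float val = std::stof(tok.substr(colon + 1));
    obj.first.emplace_back(id, val);
  }
  return obj;
}
```

Any type works as long as your parser and your task lambda agree on it; a `std::pair` of sparse features and label keeps the training loop below simple.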
`Engine` is a driver program that manages all the background programs of the system.

```cpp
Engine engine(my_node, nodes);
engine.StartEverything();
```
Set up the key range for each server you use:

```cpp
std::vector<third_party::Range> range;
int num_total_servers = nodes.size() * FLAGS_num_servers_per_node;
for (int i = 0; i < num_total_servers - 1; ++i) {
  range.push_back({FLAGS_num_dims / num_total_servers * i,
                   FLAGS_num_dims / num_total_servers * (i + 1)});
}
range.push_back({FLAGS_num_dims / num_total_servers * (num_total_servers - 1),
                 (uint64_t)FLAGS_num_dims});
```
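The loop splits the key space `[0, FLAGS_num_dims)` evenly, with the last server absorbing the remainder of the integer division. A standalone illustration of the arithmetic (`ComputeRanges` is a hypothetical helper, not a FlexPS API):

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Split [0, num_dims) into num_servers contiguous half-open ranges;
// the last server takes whatever the integer division leaves over.
std::vector<std::pair<uint64_t, uint64_t>> ComputeRanges(uint64_t num_dims,
                                                         int num_servers) {
  std::vector<std::pair<uint64_t, uint64_t>> range;
  uint64_t per_server = num_dims / num_servers;
  for (int i = 0; i < num_servers - 1; ++i)
    range.push_back({per_server * i, per_server * (i + 1)});
  range.push_back({per_server * (num_servers - 1), num_dims});
  return range;
}

// E.g. 10 keys over 4 servers -> [0,2) [2,4) [4,6) [6,10):
// each server gets 10/4 = 2 keys, and the last one also takes the remainder.
```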
- `ModelType`: ASP, BSP, SSP, (SparseSSP)
- `StorageType`: Map, Vector

```cpp
ModelType model_type = ModelType::SSP;
StorageType storage_type = StorageType::Vector;
engine.CreateTable<float>(kTableId, range, model_type, storage_type,
                          FLAGS_kStaleness, FLAGS_kSpeculation, sparse_ssp_recorder_type);
engine.Barrier();
```
- `kTableId`: id of the table
- `FLAGS_kStaleness`: staleness of the SSP model
- `FLAGS_kSpeculation`: for the SparseSSP model (not used here)
- `sparse_ssp_recorder_type`: for the SparseSSP model (not used here)
Construct your own machine learning task and allocate workers for each node:

```cpp
MLTask task;
std::vector<WorkerAlloc> worker_alloc;
for (auto& node : nodes) {  // each node has num_workers_per_node workers
  worker_alloc.push_back({node.id, FLAGS_num_workers_per_node});
}
task.SetWorkerAlloc(worker_alloc);
task.SetTables({kTableId});  // Use table 0
```
Put your task's logic inside the lambda, and pass the variables it needs into it:

```cpp
task.SetLambda([kTableId, &data](const Info& info) {
  // put your machine learning task here
});
```
We offer five kinds of kv-table: `KVTable`, `KVClientTable`, `SparseKVClientTable`, `KVChunkClientTable`, and `SimpleKVChunkTable`. Each of them serves as the interface for fetching or processing parameters on the server side. Here we use the simplest one (`KVTable`):

```cpp
auto table = info.CreateKVTable<float>(kTableId);
```
Note that the returned `table` is a `std::unique_ptr` pointing to the client table.
```cpp
third_party::SArray<float> params;
third_party::SArray<float> deltas;
BatchDataSampler<DataObj> batch_data_sampler(data, FLAGS_batch_size);
for (int i = 0; i < FLAGS_num_iters; ++i) {
  batch_data_sampler.random_start_point();
  auto& keys = batch_data_sampler.prepare_next_batch();
  auto& data_ptrs = batch_data_sampler.get_data_ptrs();
  table->Get(keys, &params);  // issue Pull
  deltas.resize(keys.size(), 0.0);
  for (auto data : data_ptrs) {  // iterate over the data in the batch
    auto& x = data->first;
    float y = data->second;
    if (y < 0)
      y = 0;
    float pred_y = 0.0;
    int j = 0;
    for (auto field : x) {
      while (keys[j] < field.first)
        j += 1;
      pred_y += params[j] * field.second;
    }
    pred_y = 1. / (1. + exp(-1 * pred_y));
    j = 0;
    for (auto field : x) {
      while (keys[j] < field.first) {
        j += 1;
      }
      deltas[j] += FLAGS_alpha * field.second * (y - pred_y);
    }
  }
  table->Add(keys, deltas);  // issue Push
  table->Clock();
}
```

`batch_data_sampler` is a sampler that helps sample the data in batches; it returns the samples' keys and pointers to the data.
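The inner loop is one stochastic gradient step on the logistic loss: it computes `pred_y = sigmoid(w · x)` over the sample's non-zero features, then accumulates `alpha * x_j * (y - pred_y)` into the deltas. A standalone sketch of that per-sample update on dense vectors, with illustrative names rather than FlexPS APIs:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Logistic function: maps the raw score w . x into a probability in (0, 1).
float Sigmoid(float z) { return 1.f / (1.f + std::exp(-z)); }

// One SGD step for logistic regression with labels y in {0, 1}:
// delta_j = alpha * x_j * (y - sigmoid(w . x)).
std::vector<float> GradientStep(const std::vector<float>& w,
                                const std::vector<float>& x,
                                float y, float alpha) {
  float dot = 0.f;
  for (size_t j = 0; j < w.size(); ++j) dot += w[j] * x[j];
  float pred = Sigmoid(dot);
  std::vector<float> delta(w.size());
  for (size_t j = 0; j < w.size(); ++j) delta[j] = alpha * x[j] * (y - pred);
  return delta;
}
```

For example, with zero weights, `x = {1, 2}`, `y = 1`, and `alpha = 0.1`, the prediction is `sigmoid(0) = 0.5` and the deltas are `{0.05, 0.1}`. In the training loop above the same quantities are computed sparsely (only over the sample's non-zero features) and pushed to the servers with `table->Add`.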
You can now write your own machine learning application in FlexPS :)