[BSP model in ps-lite] Discussion about BSP model implementation using PS-lite #143

Open
authwork opened this issue Dec 20, 2018 · 2 comments

authwork commented Dec 20, 2018

I have surveyed many projects that use ps-lite to implement the BSP model.
Most of them simply do the following:

kv.Wait(kv.Push(keys, grads));
kv.Wait(kv.Pull(keys, &weights));

I do not think this is a real BSP model, because each worker only waits for the completion of its own push, not for the pushes of the other workers.
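To make the difference concrete, here is a minimal sketch using plain C++ threads (no ps-lite involved). `Barrier`, `ToyServer` and `RunBsp` are hypothetical names introduced only for this illustration; the barrier stands in for whatever mechanism makes a worker wait for the other workers' pushes, which waiting on its own push alone does not provide.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Reusable barrier: Arrive() blocks until all n participants have arrived.
class Barrier {
 public:
  explicit Barrier(int n) : n_(n) { assert(n > 0); }
  void Arrive() {
    std::unique_lock<std::mutex> lk(mu_);
    const int gen = generation_;
    if (++count_ == n_) {  // last arrival releases everyone
      count_ = 0;
      ++generation_;
      cv_.notify_all();
    } else {
      cv_.wait(lk, [&] { return generation_ != gen; });
    }
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  const int n_;
  int count_ = 0;
  int generation_ = 0;
};

// Toy stand-in for the servers: one weight that accumulates pushed updates.
struct ToyServer {
  std::mutex mu;
  double w = 0.0;
  void Push(double g) { std::lock_guard<std::mutex> lk(mu); w += g; }
  double Pull() { std::lock_guard<std::mutex> lk(mu); return w; }
};

// Runs `steps` supersteps with `n` workers and records what each worker pulled.
std::vector<std::vector<double>> RunBsp(int n, int steps) {
  ToyServer server;
  Barrier barrier(n);
  std::vector<std::vector<double>> pulled(n, std::vector<double>(steps));
  std::vector<std::thread> workers;
  for (int r = 0; r < n; ++r) {
    workers.emplace_back([&, r] {
      for (int s = 0; s < steps; ++s) {
        server.Push(1.0);              // my own push has completed
        barrier.Arrive();              // wait for everyone else's push as well
        pulled[r][s] = server.Pull();  // all workers now see the same weight
        barrier.Arrive();              // nobody starts step s+1 before all pulled
      }
    });
  }
  for (auto& t : workers) t.join();
  return pulled;
}
```

With the barrier in place, every worker pulls the same value, n * (s + 1), at step s. Removing the first `Arrive()` reproduces the pattern above, where a fast worker may pull before a slow worker's push has landed.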

Based on test_simple_app and docs/overview.md, the BSP approach should look like this:

Scheduler

/* This code also shows why the scheduler cannot easily implement SSP or other more complicated models: it relies on Wait to learn the progress of each worker.
In fact, you could use a big table to store all the timestamps (s*N of them), and when entering the (s+1)-th iteration, wait for the timestamps of all workers from the 1st iteration. This is similar to the SSP model, but it is not efficient.
*/
 if (IsScheduler()) {
    std::vector<int> ts;
    for (int i = 0; i < n; ++i) {  // i-th BSP iteration
        ts.clear();
        for (int r = 0; r < Postoffice::Get()->num_workers(); ++r) {
             // worker_id = r * 2 + 9, see WorkerRankToID; this step needs to be confirmed.
             ts.push_back(app.Request(head, "body", Postoffice::WorkerRankToID(r)));
        }
        for (int t : ts) {
             app.Wait(t);  // wait until every worker has responded
        }
        // If the request can be broadcast to all workers, these two steps may simply be rewritten as:
        // app.Wait(app.Request(head, "body", kWorkerGroup));
    }
 }
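The "big table" idea from the comment above could be sketched as a per-worker clock table. This is only an illustration: `ClockTable`, `Finish`, `Slowest` and `MayStart` are hypothetical names, not ps-lite API, and the off-by-one convention for the staleness bound is an assumption (here a bound of 0 means BSP).

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical progress table: clock_[r] is the number of iterations
// that worker r has finished so far.
class ClockTable {
 public:
  explicit ClockTable(int num_workers) : clock_(num_workers, 0) {
    assert(num_workers > 0);
  }

  // Record that worker `rank` has finished one more iteration.
  void Finish(int rank) { ++clock_.at(rank); }

  // Number of iterations finished by the slowest worker.
  int Slowest() const {
    return *std::min_element(clock_.begin(), clock_.end());
  }

  // Worker `rank` may start its next iteration only if it is at most
  // `staleness` iterations ahead of the slowest worker.
  // Assumption: staleness == 0 corresponds to BSP.
  bool MayStart(int rank, int staleness) const {
    return clock_.at(rank) - Slowest() <= staleness;
  }

 private:
  std::vector<int> clock_;
};
```

The scheduler would have to call `Finish` whenever a worker responds and consult `MayStart` before sending the next request, which is where the inefficiency mentioned above would come from.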

Server

   server->set_request_handle(KVServerDefaultHandle<float>()); // use the default handle

Worker

   worker->set_request_handle(request_handle);
   // Called once per scheduler request, i.e. once per BSP iteration.
   void request_handle(const SimpleData& req, SimpleApp* app) {
        // the head and body sent by the scheduler can be checked here
        Read(&X, &Y);                 // read a minibatch with b / num_workers examples
        kv.Wait(kv.Pull(&w));         // pull the latest weights from the servers
        ComputeGrad(X, Y, w, &grad);  // compute the gradient
        kv.Wait(kv.Push(grad));       // push my update to the servers
        app->Response(req);           // respond to the scheduler
   }

I think the overall logic is similar to the BSP SGD described in docs/overview.md.
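The control flow of the three roles can be tried out without ps-lite. The following is a rough simulation under stated assumptions: `std::async` plays the role of `app.Request`, waiting on the returned future plays the role of `app.Wait`, and `ToyStore`, `HandleRequest` and `RunScheduler` are hypothetical names made up for this sketch.

```cpp
#include <cassert>
#include <future>
#include <mutex>
#include <vector>

// Toy stand-in for the servers: a single shared weight.
struct ToyStore {
  std::mutex mu;
  double w = 0.0;
};

// Stand-in for the worker's request handle: pull, compute, push.
// Returning from the function plays the role of Response(req).
void HandleRequest(ToyStore& store) {
  double w;
  { std::lock_guard<std::mutex> lk(store.mu); w = store.w; }      // "Pull"
  (void)w;                  // a real worker would use w in ComputeGrad
  const double grad = 1.0;  // placeholder gradient
  { std::lock_guard<std::mutex> lk(store.mu); store.w += grad; }  // "Push"
}

// Scheduler loop: one request per worker per iteration, then wait for all.
// Returns the stored weight observed after each iteration.
std::vector<double> RunScheduler(int num_workers, int num_iters) {
  assert(num_workers > 0);
  ToyStore store;
  std::vector<double> after_iter;
  for (int i = 0; i < num_iters; ++i) {
    std::vector<std::future<void>> ts;
    for (int r = 0; r < num_workers; ++r) {
      ts.push_back(std::async(std::launch::async,
                              [&store] { HandleRequest(store); }));
    }
    for (auto& t : ts) t.get();  // wait until every worker has responded
    std::lock_guard<std::mutex> lk(store.mu);
    after_iter.push_back(store.w);
  }
  return after_iter;
}
```

In this toy, no worker starts iteration i+1 before all pushes of iteration i have been applied. Within one iteration, however, a pull can already observe pushes from faster workers of the same iteration; whether that matters depends on the algorithm.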

authwork changed the title from "[BSP model in ps-lite] Disscusion about BSP model implementation using PS-lite" to "[BSP model in ps-lite] Discussion about BSP model implementation using PS-lite" on Dec 20, 2018
Member

mli commented Dec 20, 2018

On the server side, it waits for all workers' data and merges them before sending the ACK back for the workers' requests.
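A minimal sketch of that server-side behaviour, assuming a custom handle rather than the default one: pushes are buffered until every worker has contributed, then merged into the weights, and only then are the workers acknowledged. `SyncAggregator` and its methods are hypothetical names for illustration and are not part of ps-lite.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical synchronous aggregator: holds back ACKs until all workers
// of the current round have pushed, then applies the merged update.
class SyncAggregator {
 public:
  SyncAggregator(int num_workers, std::size_t dim)
      : num_workers_(num_workers), weights_(dim, 0.0), merged_(dim, 0.0) {
    assert(num_workers > 0);
  }

  // Buffers one worker's push. Returns the workers to ACK: empty until the
  // last worker of the round has pushed, then all workers of that round.
  std::vector<int> Push(int worker_id, const std::vector<double>& grad) {
    assert(grad.size() == merged_.size());
    for (std::size_t k = 0; k < grad.size(); ++k) merged_[k] += grad[k];
    pending_.push_back(worker_id);
    if (static_cast<int>(pending_.size()) < num_workers_) return {};

    // Everyone has pushed: apply the merged update and reset the buffer.
    for (std::size_t k = 0; k < merged_.size(); ++k) {
      weights_[k] += merged_[k];
      merged_[k] = 0.0;
    }
    std::vector<int> acks;
    acks.swap(pending_);
    return acks;
  }

  const std::vector<double>& weights() const { return weights_; }

 private:
  const int num_workers_;
  std::vector<double> weights_;
  std::vector<double> merged_;
  std::vector<int> pending_;
};
```

Because a worker's `Wait` on its push returns only once the ACK arrives, holding the ACK back until the merge is done gives each worker an implicit barrier on all other workers' pushes.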

Author

authwork commented Dec 22, 2018

> On the server side, it waits for all workers' data and merges them before sending the ACK back for the workers' requests.

Exactly, but that is only the description of the BSP model (it means the server needs to wait for all workers' data).

In a real implementation, we need to use the scheduler to manage the data synchronization (see here) without changing KVServerDefaultHandle.

// WaitAllFinished();
for (int t : ts)
      app.Wait(t); // wait until all workers have finished their push
