The imperative programming paradigm #7464
r = fluid.data.mnist.train()

W = fluid.Tensor()
b = fluid.Tensor()

with fluid.While(iters=100):
    mb = r.next()
    x = fluid.layer.data(mb.image)
    l = fluid.layer.data(mb.label)
    y = fluid.layer.fc(x, W, b)
    cost = fluid.layer.mse(y, l)
    fluid.optimize(cost)

fluid.save(W, b, "model.json")

W, b = fluid.load("model.json", "W", "b")
mb = r.next()[1:2]
x = fluid.layer.data(mb)
y = fluid.layer.fc(x, W, b)
fluid.print(y)
We can use atexit to register a launcher. For example, we could have a launcher that saves the ProgramDesc into a file, or another launcher that calls the C++ …
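A minimal sketch of the atexit idea, assuming fluid.default_main_program() and ProgramDesc.serialize_to_string() are available as in fluid.io; the file name is arbitrary:

import atexit
import paddle.v2.fluid as fluid

def save_program_desc(path="program.desc"):
    # Serialize the default program's ProgramDesc so a C++ runtime could load it later.
    program = fluid.default_main_program()
    with open(path, "wb") as f:
        f.write(program.desc.serialize_to_string())

# Register the launcher; it runs automatically when the Python process exits.
atexit.register(save_program_desc)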
Do we still need the data layer, or should we just call …
For fully parallel, we might use … BTW, I suggest we use …
Some basic thinking about parallel programs
Here is some basic thinking about the imperative programming paradigm in a parallel setting.
There are some fundamental questions. Q1: Should we use …
We might be able to use Python multiprocessing to communicate between a Python process and a Fluid process. -- from @emailweixu
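A minimal sketch of that idea using only the standard library; fluid_worker is a hypothetical stand-in for a process that drives the Fluid runtime:

import multiprocessing as mp

def fluid_worker(conn):
    # Child process: receive messages from the driver until told to stop.
    while True:
        msg = conn.recv()
        if msg == "stop":
            break
        # A real worker would hand the message to the Fluid runtime here.
        conn.send("ack: %s" % msg)

if __name__ == "__main__":
    parent_conn, child_conn = mp.Pipe()
    p = mp.Process(target=fluid_worker, args=(child_conn,))
    p.start()
    parent_conn.send("run step 0")
    print(parent_conn.recv())   # ack: run step 0
    parent_conn.send("stop")
    p.join()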
I took a glance at http://ray.readthedocs.io/en/latest/tutorial.html. For a distributed training Fluid API, we may have two main designs:
Design 1: explicit send / listen_and_serv between trainer and pserver.

grad_param_mapping = dict()

def def_trainer():
    r = fluid.data.mnist.train()
    W = fluid.Tensor()
    b = fluid.Tensor()
    with fluid.While(iters=100):
        mb = r.next()
        x = fluid.layer.data(mb.image)
        l = fluid.layer.data(mb.label)
        y = fluid.layer.fc(x, W, b)
        cost = fluid.layer.mse(y, l)
        grads, params = fluid.bp(cost)
        for idx, grad in enumerate(grads):
            grad_param_mapping[grad.name] = params[idx]
        fluid.send(grads, overwrite=params)

def def_pserver():
    with fluid.listen_and_serv() as server:
        with fluid.While():
            grad = server.recv_chan.get()
            param = grad_param_mapping.get(grad.name)
            fluid.optimize(param, grad)
            server.return_chan.push(param)
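The pseudocode above does not say how the two roles get launched; one possible sketch, where the TRAINING_ROLE environment variable and its values are purely assumptions for illustration:

import os

# Hypothetical launcher for design 1: pick the role from an environment variable.
role = os.getenv("TRAINING_ROLE", "TRAINER")
if role == "PSERVER":
    def_pserver()
else:
    def_trainer()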
Design 2: Ray-style remote actors.

@fluid.remote
class Pserver:
    def __init__(self, name_map):
        self.grad_param_mapping = name_map

    def send(self, grad):
        param = self.grad_param_mapping.get(grad.name)
        fluid.optimize(param, grad)

    def get(self, varname):
        return self.grad_param_mapping.get(varname)

@fluid.remote
def trainer():
    ...
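For comparison, here is a small runnable Ray actor with the same send/get shape as design 2; plain SGD on numpy arrays stands in for fluid.optimize, and all names are illustrative:

import numpy
import ray

ray.init()

@ray.remote
class Pserver(object):
    def __init__(self, params, lr=0.01):
        self.params = params          # dict: parameter name -> numpy array
        self.lr = lr

    def send(self, name, grad):
        # Apply one SGD step to the named parameter.
        self.params[name] -= self.lr * grad

    def get(self, name):
        return self.params[name]

ps = Pserver.remote({"fc.w": numpy.zeros(4)})
ps.send.remote("fc.w", numpy.ones(4))
print(ray.get(ps.get.remote("fc.w")))   # [-0.01 -0.01 -0.01 -0.01]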
A simple demo of integrating Fluid and Ray:

import ray
import paddle.v2 as paddle
import paddle.v2.fluid as fluid
import numpy

ray.init()

NUM_WORKERS = 2
BATCH_SIZE = 128


def mnist_network():
    img = fluid.layers.data(name='img', shape=[784])
    hidden = fluid.layers.fc(img, size=100, act='tanh',
                             param_attr='fc.w',
                             bias_attr='fc.b')
    prediction = fluid.layers.fc(hidden, size=10, act='softmax',
                                 param_attr='sftmax.w',
                                 bias_attr='sftmax.b')
    label = fluid.layers.data(name='label', shape=[1],
                              dtype='int64')
    loss = fluid.layers.cross_entropy(input=prediction, label=label)
    avg_loss = fluid.layers.mean(x=loss)
    fluid.backward.append_backward(avg_loss)
    return img, label, avg_loss


@ray.remote
class Worker(object):
    def __init__(self, worker_id):
        self.worker_id = worker_id
        self.scope = fluid.core.Scope()
        self.program = fluid.Program()
        self.startup = fluid.Program()
        with fluid.program_guard(self.program, self.startup):
            img, label, self.loss = mnist_network()
        self.place = fluid.CPUPlace()
        self.executor = fluid.Executor(self.place)
        self.reader_creator = paddle.batch(
            paddle.reader.shuffle(
                paddle.dataset.mnist.train(), buf_size=8192),
            batch_size=BATCH_SIZE)
        self.reader = self.reader_creator()
        self.feeder = fluid.DataFeeder(feed_list=[img, label], place=self.place)

    def compute_gradient(self, weights):
        for var_name in weights:
            tensor = self.scope.var(var_name).get_tensor()
            tensor.set(weights[var_name][0], self.place)
            tensor.set_lod(weights[var_name][1])
        try:
            data = next(self.reader)
        except:
            self.reader = self.reader_creator()
            data = next(self.reader)
        outs = self.executor.run(
            self.program,
            feed=self.feeder.feed(data),
            scope=self.scope,
            fetch_list=[var_name + "@GRAD" for var_name in weights] + [self.loss])
        if self.worker_id == 0:
            print(outs[-1])
        return outs[:-1]


@ray.remote
class PServer(object):
    def __init__(self, learning_rate):
        self.scope = fluid.core.Scope()
        self.learning_rate = learning_rate
        self.program = fluid.Program()
        self.startup = fluid.Program()
        with fluid.program_guard(self.program, self.startup):
            mnist_network()
        self.place = fluid.CPUPlace()
        self.executor = fluid.Executor(self.place)
        self.executor.run(self.startup, scope=self.scope)
        self.optimize_program = fluid.Program()

    def apply_gradients(self, *gradients):
        # TODO(qijun) an optimization program is needed
        mean_gradients = numpy.mean(gradients, axis=0)
        weights = self.get_weight()
        for idx, name in enumerate(weights):
            w = weights[name][0]
            w -= self.learning_rate * mean_gradients[idx]
            self.scope.find_var(name).get_tensor().set(w, self.place)

    def get_weight(self):
        weights = dict()
        for p in self.program.global_block().iter_parameters():
            lod_tensor = self.scope.find_var(p.name).get_tensor()
            weights[p.name] = (numpy.copy(numpy.array(lod_tensor)), lod_tensor.lod())
        return weights


if __name__ == '__main__':
    ps = PServer.remote(1e-3 * NUM_WORKERS)
    weights = ps.get_weight.remote()
    works = [Worker.remote(i) for i in range(NUM_WORKERS)]
    while True:
        gradients = [work.compute_gradient.remote(weights) for work in works]
        ps.apply_gradients.remote(*gradients)
        weights = ps.get_weight.remote()
@QiJune In the above code, … And, yes, using …
No.
This example contains N workers and one pserver. In the main function, these lines will wait for all workers to complete.
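A small runnable sketch of that waiting behavior in plain Ray (slow_gradient and apply_gradients are illustrative names, not part of the demo above):

import time
import ray

ray.init()

@ray.remote
def slow_gradient(i):
    time.sleep(1)
    return i

@ray.remote
def apply_gradients(*grads):
    return sum(grads)

# A remote call whose arguments are ObjectIDs is not scheduled until those
# objects are ready, so apply_gradients implicitly waits for every worker.
grads = [slow_gradient.remote(i) for i in range(2)]
total = apply_gradients.remote(*grads)   # scheduled only after both gradients exist
print(ray.get(total))                    # blocks the driver, prints 1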
Sorry, my fault.
This is to say that all the workers use their own …
@chengduoZH This is just a simple demo following sync_parameter_server in Ray.
@typhoonzero Has refined the code and calls …
@QiJune Cool 👍!
Just to list some facts about Ray: …
Good things about Ray: …
We may not use Ray …
I quickly went over Ray. It seems to be an implementation of the Actor model in Python? In my understanding, Ray is heavily integrated with Python. However, Fluid is trying to get out of Python. Do they have conflicting ideas?
From @typhoonzero:

def def_pserver():
    with fluid.listen_and_serv() as server:
        with fluid.While():
            grad = server.recv_chan.get()
            param = grad_param_mapping.get(grad.name)
            fluid.optimize(param, grad)
            server.return_chan.push(param)

In my personal opinion, the program might look like this instead:

def def_pserver():
    chan = fluid.Channel(type=sendrecv.message)
    with fluid.Go(chan):
        with fluid.listen_and_serv() as server:
            msg = server.get_msg()
            chan.push(msg)
    with fluid.While():
        grad = chan.get()
        param = grad_param_mapping.get(grad.name)
        fluid.optimize(param, grad)
        return_chan.push(param)
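A rough runnable analogue of this channel-plus-Go sketch, written with Python's queue and threading modules so the control flow can be tried directly; the names are illustrative only:

import queue
import threading

def listener(chan, incoming):
    # Background thread plays the role of fluid.Go with listen_and_serv.
    for grad_name in incoming:    # stand-in for server.get_msg()
        chan.put(grad_name)
    chan.put(None)                # sentinel: no more gradients

def pserver_loop(chan):
    while True:
        grad_name = chan.get()
        if grad_name is None:
            break
        # A real pserver would look up the parameter here and run the optimizer.
        print("optimize parameter for", grad_name)

chan = queue.Queue()              # plays the role of the Fluid channel
t = threading.Thread(target=listener, args=(chan, ["fc.w@GRAD", "fc.b@GRAD"]))
t.start()
pserver_loop(chan)
t.join()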
Hello, this issue has had no updates for nearly a month, so we will close it today. If you still need to follow up after it is closed, you can reopen it and we will reply within 24 hours. We apologize for any inconvenience caused by the closing, and thank you for your support of PaddlePaddle!