[Dynet-92]. Multi-device support #704
Conversation
… cpu impl temporarily.
… format(--dynet-devices CPU,GPU:0,GPU:2)
In general, this is great: I think multi-device support will be a great feature for DyNet to have. First, I have a high-level comment. In my mind, there are two design decisions here. One is how we specify the "default" device of a graph node when it is not specified explicitly.
The first has the advantage of perhaps being easier to understand, but may result in hidden memory moves where people aren't expecting them. It also adds some code complexity: when some of the inputs are not on the same device, what do you do?
My opinion: I tend to prefer 2./2. respectively, but could be convinced otherwise.
This is really great! I like options 2 and 2 also, but would like to propose a variation of the second. Another proposal (maybe it's already there, I didn't look at the code) is to also allow multiple CPU devices. There, the copying will be no-ops, but we can still run different CPU devices as different threads. How do things look in terms of synchronization in the current implementation?
Very helpful comments!! I also prefer 2. / 2. then, and I think we can support both. Currently, we don't support specifying a CPU id; I will think about that in the near future.
…device interface instead of do memcpy in executor.
The remark of
@xunzhang Yes,
@neubig Right, cool. I will finish this soon.
This pull request is review-ready. It does not affect the existing code or interfaces, and I will split the remaining work into future pull requests. The remaining items include: debug the hang issue when using multiple GPUs, and add more feature tests. The original vs. modified usage, and a quick test for reviewer @neubig, are sketched below.
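A rough sketch of what such a multi-device quick test could look like (not the author's original code): it assumes the xor-style example structure, the --dynet-devices CPU,GPU:0,GPU:2 device-list format from the commit message above, and a cg.change_expr_device call whose exact signature is not shown in this thread.

// Hypothetical multi-device quick test; names and structure are illustrative only.
// Assumed invocation: ./quick_test --dynet-devices CPU,GPU:0,GPU:2
#include "dynet/dynet.h"
#include "dynet/expr.h"
#include "dynet/model.h"
#include <iostream>
#include <vector>
using namespace dynet;

int main(int argc, char** argv) {
  dynet::initialize(argc, argv);   // parses --dynet-devices, among other flags
  Model m;
  Parameter p_W = m.add_parameters({8, 2});
  Parameter p_b = m.add_parameters({8});
  Parameter p_V = m.add_parameters({1, 8});
  Parameter p_a = m.add_parameters({1});

  ComputationGraph cg;
  // cg.change_expr_device(...);   // hypothetical: place the following expressions
                                   // on another device (signature not shown here)
  Expression W = parameter(cg, p_W), b = parameter(cg, p_b);
  Expression V = parameter(cg, p_V), a = parameter(cg, p_a);
  std::vector<real> x_values = {1.f, -1.f};
  Expression x = input(cg, {2}, &x_values);
  Expression y = V * tanh(affine_transform({b, W, x})) + a;   // expression from this thread
  std::cout << as_scalar(cg.forward(y)) << std::endl;
  return 0;
}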
Thanks, this is great! I have a bunch of small comments, but once they're resolved and I can confirm that this works in my environment, I think we can merge. Also, some of my comments might just be oversights, so if there's anything that you don't think needs to be fixed, just tell me.
dynet/exec.cc
Outdated
@@ -122,6 +122,10 @@ const Tensor& SimpleExecutionEngine::incremental_forward(VariableIndex i) {
  }
  node->aux_mem = aux_mem;

  // check consistent device
  for (auto & xs_v : xs) {
    DYNET_ASSERT(xs_v->device == nfxs[num_nodes_evaluated].device, "Attemp to do tensor forward in different devices");
Attemp -> Attempt. Similarly for all below.
dynet/exec.cc
Outdated
@@ -245,15 +254,17 @@ void BatchedExecutionEngine::combine_tensors(std::vector<VariableIndex> batch_id
  const size_t sz = node2size[id];

  float* my_src = batches[node2batch[id]].nfx.v + node2offset[id];
  if (tout.device->type == DeviceType::CPU) {
    memcpy(dest, my_src, sz * sizeof(float));
  } else {
For this "else", I think we should add a tout.device->type == DeviceType::GPU
, then throw an error if we get a device other than the one we're expecting. We might add other device types later, and if we do this logic will break (and it's in a somewhat inconspicuous place, so we should make sure that it's easy to catch through an error).
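A minimal sketch of the suggested structure (a reviewer suggestion, not the merged code; the body of the GPU branch is elided here):

// Sketch only: check explicitly for the device types we know about,
// and fail loudly on anything else.
if (tout.device->type == DeviceType::CPU) {
  memcpy(dest, my_src, sz * sizeof(float));
} else if (tout.device->type == DeviceType::GPU) {
  // ... existing GPU copy path ...
} else {
  throw std::runtime_error("combine_tensors: unexpected device type");
}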
dynet/expr.cc
Outdated
@@ -44,15 +43,17 @@ Expression operator+(const Expression& x, const Expression& y) {
  else if (y.dim().batch_size() == 1)
    return Expression(x.pg, x.pg->add_function<ScalarAdd>({x.i, y.i}));
  else
    return Expression(x.pg, x.pg->add_function<Sum>({x.i, y.i}));
    return Expression(x.pg, x.pg->add_function<Sum>({x.i, y.i}, x.pg->nodes[x.i]->device));
This shouldn't need a "device" argument, right? It can just inherit its device from its inputs.
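A sketch of this suggestion (illustrative only, not the merged code): the expression-building call keeps its original device-free signature, and the graph fills in the node's device from its inputs.

// Expression-building code stays device-free:
return Expression(x.pg, x.pg->add_function<Sum>({x.i, y.i}));
// ... while ComputationGraph::add_function (or the Node constructor) could set
// the device from the first argument, e.g. (hypothetical):
//   new_node->device = nodes[new_node->args.front()]->device;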
dynet/expr.cc
Outdated
@@ -76,7 +77,10 @@ Expression contract3d_1d(const Expression& x, const Expression& y, const Express
Expression sqrt(const Expression& x) { return Expression(x.pg, x.pg->add_function<Sqrt>({x.i})); }
Expression abs(const Expression& x) { return Expression(x.pg, x.pg->add_function<Abs>({x.i})); }
Expression erf(const Expression& x) { return Expression(x.pg, x.pg->add_function<Erf>({x.i})); }
Expression tanh(const Expression& x) { return Expression(x.pg, x.pg->add_function<Tanh>({x.i})); }
Expression tanh(const Expression& x, Device *device) {
Similarly, we should remove the device argument from this and all the others below, except the ones that deal with input or with explicitly changing devices.
dynet/expr.h
Outdated
const bool is_stale() const {return (get_number_of_active_graphs() != 1 || graph_id != get_current_graph_id());}
Expression() : pg(nullptr), i(0), graph_id(0) {}

Expression(Device *device) : pg(nullptr), i(0), graph_id(0) {}
This is identical to the empty constructor, so we should delete it.
dynet/nodes-linalg.h
Outdated
@@ -19,7 +21,7 @@ struct Transpose : public Node {
// y = inv(x)
// x = an invertible matrix
struct MatrixInverse : public Node {
  explicit MatrixInverse(const std::initializer_list<VariableIndex>& a) : Node(a) {}
  explicit MatrixInverse(const std::initializer_list<VariableIndex>& a, Device *device) : Node(a, device) {}
Again, no device here.
dynet/nodes-matrixmultiply.h
Outdated
namespace dynet {

// y = x_1 * x_2
struct MatrixMultiply : public Node {
  explicit MatrixMultiply(const std::initializer_list<VariableIndex>& a) : Node(a) {}
  explicit MatrixMultiply(const std::initializer_list<VariableIndex>& a,
No device here.
dynet/nodes-trig.h
Outdated
namespace dynet {

// y = tanh x_1
struct Tanh : public Node {
  explicit Tanh(const std::initializer_list<VariableIndex>& a) : Node(a) {}
  explicit Tanh(const std::initializer_list<VariableIndex>& a, Device *device) : Node(a, device) {}
No device here.
dynet/tensor.cc
Outdated
@@ -94,11 +95,13 @@ float TensorTools::access_element(const Tensor& v, int index) {
}

float TensorTools::access_element(const Tensor& v, const Dim& index) {
  if (v.device->type == DeviceType::CPU) {
    return (*v)(index[0], index[1]);
  } else {
#if HAVE_CUDA
Similarly here, I think we should check if the device is GPU, and throw an error if we get an unknown device.
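A sketch of the suggested shape for this function (again only a sketch of the review suggestion, not the merged code; the CUDA branch is abbreviated):

float TensorTools::access_element(const Tensor& v, const Dim& index) {
  if (v.device->type == DeviceType::CPU) {
    return (*v)(index[0], index[1]);
  } else if (v.device->type == DeviceType::GPU) {
#if HAVE_CUDA
    // ... copy the single element to host and return it, as in the existing code ...
#endif
    throw std::runtime_error("access_element: not compiled with CUDA support");
  } else {
    throw std::runtime_error("access_element: unknown device type");
  }
}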
dynet/tensor.cc
Outdated
cudaMemcpyAsync(v.v, v_src.v, sizeof(real) * v.d.size(), cudaMemcpyDeviceToHost);
#endif
  }
} else {
Similarly here, check if device is GPU explicitly.
Thanks! I confirmed that this is working as expected, so I merged. This is great to have :)
Is it working actually? I tried the "./examples/train_xor-multidevice" example and got the following error:
I think documentation is not finished yet, but you need to add the --dynet-devices argument (e.g. --dynet-devices CPU,GPU:0,GPU:2) and call cg.change_expr_device before defining an expression such as V * tanh(affine_transform(b, W, x)) + a.
It works. Thanks!