
Python v2 API to new operator framework #3129

Closed
wangkuiyi opened this issue Jul 31, 2017 · 14 comments

@wangkuiyi
Collaborator

wangkuiyi commented Jul 31, 2017

Design Doc: RNNOp

A Plain Network

predict = paddle.layer.fc(
      paddle.layer.data(name="x"),
      output_size = 100)
      
cost = paddle.layer.mse(
    predict,
    paddle.layer.data(name="y"))
    
parameters = paddle.train(cost)
paddle.save_model(cost, paddle.datasets.mnist.train(), parameters, "filename")

p = paddle.load_model(predict, "filename")
paddle.infer(predict, ...)

Layers, Variables, and Default Scope

# in package paddle.layer
def data(name):
  return paddle.cpp.variable(paddle.cpp.default_scope(), name)

def fc(input, output_size):
  output = paddle.cpp.variable(paddle.cpp.default_scope())
  W = paddle.cpp.variable(paddle.cpp.default_scope(), label="parameter")
  b = paddle.cpp.variable(paddle.cpp.default_scope(), label="parameter")
  paddle.cpp.operator("FC", read={input, W, b}, output_size, write={output})
  return output
  
def mse(input1, input2):
  output = paddle.cpp.variable(paddle.cpp.default_scope())
  paddle.cpp.operator("MSE", read={input1, input2}, write={output})
  return output

where

  • paddle.cpp.variable is a Python binding of C++ method Scope::NewVar().
  • paddle.cpp.operator creates an operator and marks it as a reader of some variables and a writer of some others. We will cover this in more detail later.
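
Concretely, the registration inside paddle.cpp.operator might look like the following sketch. NewOperator and OpRegistry::Create are hypothetical names, not part of this design; the point is only that creating an operator appends it to the readers_ and writers_ lists of the involved Variables:

// A hypothetical C++ factory behind paddle.cpp.operator.
OperatorBase* NewOperator(const std::string& type,
                          const std::vector<Variable*>& reads,
                          const std::vector<Variable*>& writes) {
  OperatorBase* op = OpRegistry::Create(type);            // assumed op registry
  for (Variable* v : reads) v->readers_.push_back(op);    // op reads v
  for (Variable* v : writes) v->writers_.push_back(op);   // op writes v
  return op;
}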

paddle::operator::Net

paddle.train receives a variable created by paddle.layer.mse and needs to trace all related operators and sort them in topological order.

Please be aware that all operators are derived from class OperatorBase, which refers to Variables by their names:

class OperatorBase {
  vector<string> inputs_;
  vector<string> outputs_;
};

and Variables don't have names unless they are in a Scope.

Also, each Variable maintains:

class Variable {
  list<Operator*> readers_;
  list<Operator*> writers_;
};

Please be aware that the trace from an operator to its input variables depends on the default scope. The tracing is done in C++ space, so paddle.cpp.default_scope is a binding to C++ code.

class Net {
 public:
  static Net* TraceAndBuild(Variable* output, Scope* scope) {
    std::list<std::pair<Operator*, int /*distance to output*/> > dists;
    std::list<std::pair<Variable*, int /*distance to output*/> > frontier;
    frontier.push_back(std::make_pair(output, 0));

    while (frontier.size() > 0) {
      Variable* v = frontier.front().first;
      int dist = frontier.front().second;
      frontier.pop_front();

      for (Operator* o : v->writers_) {
        dists.push_back(std::make_pair(o, dist));
        for (const std::string& s : o->inputs_) {
          frontier.push_back(std::make_pair(scope->FindVar(s), dist + 1));
        }
      }
    }

    dists.sort(/*by the descending order of dist*/);

    return new Net(dists);
  }
};
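
TraceAndBuild is the only member spelled out above; the rest of Net might look like this minimal sketch, assuming each operator exposes a Run(Scope*) method:

// The remaining members of Net (a sketch, not part of the design above).
class Net {
 public:
  explicit Net(const std::list<std::pair<Operator*, int> >& dists) {
    for (const auto& d : dists) ops_.push_back(d.first);  // dists is sorted already
  }
  void Run(Scope* scope) {
    for (Operator* op : ops_) op->Run(scope);  // run in topological order
  }
 private:
  std::vector<Operator*> ops_;  // operators in execution order
};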

We can call

Net::TraceAndBuild(output_variable, DefaultScope())->Run(DefaultScope());

to extract the network using the default scope and run it.

Scope Hierarchy

An RNN operator may have three kinds of variables:

  1. global variable -- in outer scope
  2. memory variable -- in RNNOp-local scope
  3. local variable -- in step-local scope
   outer scope
      /|\
       |
   RNNOp scope 
(the memory over steps)
  /|\  /|\   /|\
   |    |     |
step-0 step-1 step-2
scope  scope  scope

Just like what a programming language compiler/interpreter would do, there is a step-local scope for each step, but only one copy of compiled code (binary code), or the step-net in our case.

The above three tiers can be simplified to two by moving the memory variables into the outer scope, but this is not necessary.

outer scope (including all memory variables of an RNNOp)
  /|\  /|\   /|\
   |    |     |
step-0 step-1 step-2
scope  scope  scope
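
A minimal sketch of such a hierarchical Scope, assuming the NewScope, NewVar, and FindVar methods used elsewhere in this document; name lookup falls back to the parent scope, so a step-local scope sees the RNNOp-scope memories as well as the outer-scope globals:

class Scope {
 public:
  explicit Scope(Scope* parent = nullptr) : parent_(parent) {}
  Scope* NewScope() { kids_.push_back(new Scope(this)); return kids_.back(); }
  Variable* NewVar(const std::string& name) { return vars_[name] = new Variable; }
  Variable* FindVar(const std::string& name) {
    auto it = vars_.find(name);
    if (it != vars_.end()) return it->second;
    return parent_ == nullptr ? nullptr : parent_->FindVar(name);  // fall back to parent
  }
 private:
  Scope* parent_;                          // null for the outer scope
  std::map<std::string, Variable*> vars_;
  std::vector<Scope*> kids_;
};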

A Recurrent Network

x = paddle.layer.data(name="features")
y = paddle.layer.data(name="labels")

accum = paddle.framework.tensor()

cost = paddle.layer.mse(
  paddle.layer.fc(
    paddle.layer.rnn(
      input = paddle.layer.fc(x),
      step_net = paddle.layer.fc(
                   paddle.layer.add_to(accum, NULL), 
                   output_size=100),
      concat_output=True)),
  y)

paddle.train(cost, ...)

Here we use NULL as the placeholder of the input of the step net.

Please notice that we don't have to consume the output of an RNNOp. For example, we can use the memory as the RNNOp's output:

x = paddle.layer.data(name="features")
y = paddle.layer.data(name="labels")

memory = paddle.framework.tensor()

paddle.layer.rnn(
  input = paddle.layer.fc(x),
  step_net = paddle.layer.fc(
               paddle.layer.add_to(memory, NULL), 
               output_size=100),
  concat_output=True)

cost = paddle.layer.mse(paddle.layer.fc(memory), y)

paddle.train(cost, ...)

Step-Net

The above example shows that the step_net parameter of paddle.layer.rnn accepts a variable returned by paddle.layer.fc. We need to trace the step-net from this variable, which can be done by calling the aforementioned paddle::operator::Net::TraceAndBuild:

namespace paddle {
namespace operator {

class RNN {
 public:
  void Run(Scope* scope) {
    RNNInput* whole_input = inputs_[0]->Get<RNNInput>();
    int sequence_len = whole_input->Len(0);
    for (int i = 0; i < sequence_len; ++i) {
      Scope* step_scope = scope->NewScope();
      step_scope->NewVar("step_input")->GetMutable<Tensor>()->Slice(whole_input, i);
      Net* net = Net::TraceAndBuild(GetAttr<Variable*>("step_net"), step_scope);
      net->Run(step_scope);
    }
  }
};

}  // namespace operator
}  // namespace paddle
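
Given the scope hierarchy above, the memory variables could simply live in the RNNOp-local scope, so every step resolves the same name through parent lookup. A hypothetical sketch (the name "memory" and the InitMemory helper are illustrative only):

// Create the memory once in the RNNOp-local scope; each step scope
// then reaches it via Scope::FindVar falling back to the parent.
void InitMemory(Scope* rnn_scope) {
  rnn_scope->NewVar("memory")->GetMutable<Tensor>();
}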
@wangkuiyi
Collaborator Author

wangkuiyi commented Aug 1, 2017

How to serialize/deserialize network topology -- @reyoung

It seems that we can write a function Serialize to serialize the network from TraceAndBuild -- @wangkuiyi
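
A hypothetical sketch of such a Serialize, assuming Net exposes its sorted operator list and each operator a Type() accessor (neither is specified above). Since operators refer to Variables by name, recording each operator's type plus its input/output names captures the topology:

std::string Serialize(const Net* net) {
  std::ostringstream os;
  for (OperatorBase* op : net->ops_) {  // assumed accessor to the sorted ops
    os << op->Type();                   // assumed type accessor
    for (const std::string& in : op->inputs_) os << " in:" << in;
    for (const std::string& out : op->outputs_) os << " out:" << out;
    os << "\n";
  }
  return os.str();
}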

@wangkuiyi
Collaborator Author

Make sure that the traced-and-built network in this design supports graph visualization -- @zchen0211

I am sure it supports. -- @wangkuiyi

@wangkuiyi
Collaborator Author

Change RNN step-net input placeholder from NULL to paddle.cpp.variable(name=a_special_name). -- @reyoung

@wangkuiyi
Collaborator Author

The parameter concat_output of paddle.layer.rnn could be a boolean array if the step-net has multiple outputs. -- @Superjom

@wangkuiyi
Collaborator Author

To support memory in RNN, we might need Placeholder as a special type. -- @zchen0211 @reyoung

@wangkuiyi
Collaborator Author

Need to consider the initialization of parameters. -- @wangkuiyi

@wangkuiyi
Collaborator Author

Need to consider parameter sharing and enabling/disabling parameter updates, so that we could implement GANs. -- @wangkuiyi

@reyoung
Collaborator

reyoung commented Aug 1, 2017

@wangkuiyi

I think making fc_layer and data_layer return a Variable, and using Variable::readers/writers to TraceAndBuild a NetOp, is a little bit complex. We could implement the SAME API more simply by returning a Python object, Expression.

struct OperatorBase {
  vector<string> inputs_;
  vector<string> outputs_;
};

struct Expression {
  // All operators needed to calculate this expression, so op could be a NetOp.
  OperatorBase* op;  // `operator` is a C++ keyword, so name the field `op`
  size_t output_index;
};

Each function in Python should return an Expression, as in programming languages (https://en.wikipedia.org/wiki/Expression_(computer_science)). The layer functions could be implemented like this:

def data_layer(data_type, reader):
  op = paddle.cpp.new_op("data", reader=reader, data_type=data_type)
  return Expression(op, 0)

def fc_layer(input, size, ...):
  w = paddle.cpp.new_var("w", ...)
  b = paddle.cpp.new_var("b", ...)
  out = paddle.cpp.new_var("out", ...)
  net = paddle.cpp.new_net()
  net.add_op(input.op)  # prepend previous operators here. 

  op = paddle.cpp.new_op("fc", input=[
        input.op.outputs[input.output_index],  # could extract a helper function here.
        w,
        b,
    ], output=[out])
  net.add_op(op)

  return Expression(net, 0)

To get the whole network behind an expression, one could just write:

data = data_layer(...)
hidden = fc_layer(data, ...)  # hidden.op contains data->fc
cost = fc_layer(hidden, ...)  # cost.op contains data->fc->cost

cost.op  # cost's network.

Also, we could add some helper functions to Expression, like

class Expression {
 public:
  Variable* calculate_var(Scope* scope) {
    op->Run(scope);
    return to_var(scope);
  }

  Variable* to_var(Scope* scope) {
    return scope->FindVar(op->outputs_[output_index]);
  }

  OperatorBase* op;
  size_t output_index;
};

@reyoung
Collaborator

reyoung commented Aug 1, 2017

@wangkuiyi says that introducing an Expression concept is not necessary.

@reyoung thinks that we could use Operator as the layers' return type, not a new concept Expression.

@wangkuiyi
Collaborator Author

@reyoung This design in #3129 (comment) introduces class Expression in addition to Variable and Operator.

Also, it requires those researchers who program layers to provide connection information explicitly, instead of using TraceAndBuild to hide the complexity from users.

For the above two reasons, I don't think it is better than #3129 (comment).

@Superjomn
Contributor

Maybe Expression can be hidden from users. It is an implementation detail.

The user views an operator's inputs and outputs as arguments, so we can wrap the details and expose a corresponding concept.

@reyoung @wangkuiyi

@wangkuiyi
Collaborator Author

It would be great if we can have a simple enough way to hide the details from users. Let us write the code so as to present a comparison. @Superjom

@Superjomn
Contributor

Both the TraceAndBuild and Expression methods need new C++ code, new concepts, or modifications to current data structures.

The Python wrapper is much cheaper to modify, so I tried to move the code above from C++ to Python, and to show that:

  • this change does no harm to the user's learning cost, because the API acts exactly like what they want and hides the C++ concepts from them
    • it is cheap to add wrappers in Python to make the core concepts more user-friendly
    • Variable, Operator, and Scope will have their own Python wrappers above the Pybind details, and it is easier to improve user-friendliness in these wrappers
  • it supports subnet (subgraph) execution like MXNet's Symbol and the Expression method: if we want to execute a var-related subnet, we backtrace it and store the subnet in that Variable (Python wrapper) as a cache
  • the net can be serialized from Python to support a C SDK: each Variable (Python wrapper) stores its subnet in a NetOp, so we just serialize the NetOp

If it is not a good idea to move this code from C++ to Python, ignore the following code ...

I implemented the TraceAndBuild method below; the Expression method is similar: use Python wrappers, which are free to hide details, so users will notice nothing new.
@wangkuiyi @reyoung

# original Variable
import paddle.cpp.Variable as CPPVar

class Variable:
    '''
    A wrapper for the Variable defined in C++; keeps the net trace
    or the expression in Python.
    '''
    def __init__(self, cpp_var):
        '''
        @cpp_var: CPPVar
        '''
        self._cpp_var = cpp_var
        self._read_ops = set()
        self._write_ops = set()

        # a net op which stores the subnets.
        self.subnet = None

    def cpp_var(self):
        return self._cpp_var

    def add_read_op(self, cpp_op):
        self._read_ops.add(cpp_op)

    def add_write_op(self, cpp_op):
        self._write_ops.add(cpp_op)

    def __repr__(self):
        '''
        acts exactly like a Variable
        '''
        return repr(self._cpp_var)

class Scope:
    def __init__(self):
        self._cpp_scope = paddle.cpp.new_scope()
        # store Variable map
        self._var_map = {}

    def new_var(self, label):
        cpp_var = self._cpp_scope.new_var()
        v = Variable(cpp_var)
        self._var_map[label] = v
        return v

    def get_var(self, label):
        return self._var_map[label]

def new_op(op_type, reads, writes, **attrs):
    '''
    record variable dependencies between operators.
    '''
    cpp_op = paddle.cpp.operator(op_type, inputs=[v.cpp_var() for v in reads],
                        outputs=[v.cpp_var() for v in writes], **attrs)
    for v in reads:
        v.add_read_op(cpp_op)
    for v in writes:
        v.add_write_op(cpp_op)
    return cpp_op

default_scope = Scope()

def fc(input, output_size):
    '''
    @input: Variable
    @output: Variable
    '''
    output = default_scope.new_var("output")
    W = default_scope.new_var("W")  # parameter
    b = default_scope.new_var("b")  # parameter
    new_op("FC", reads=[input, W, b], writes=[output], output_size=output_size)
    return output


def backtrace(var):
    '''
    the implementation by @wangkuiyi in C++ above; we can move it to Python.
    Backtrace the variable and create a subnet (NetOp).
    '''
    # much details here
    pass


def paddle_run(var):
    '''
    @var: Variable

    run the subnet whose end point is var; if the network contains a cost
    var, then backward will be called, otherwise just forward.
    '''
    if not var.subnet:
        var.subnet = backtrace(var)
    var.subnet.Run()


def serialize(var):
    '''
    @var: Variable
    '''
    if not var.subnet:
        var.subnet = backtrace(var)
    return var.subnet.Serialize()

@wangkuiyi
Collaborator Author

wangkuiyi commented Aug 1, 2017

@Superjom If I understand this correctly, it would be much less code to add readers_ and writers_ to the C++ class Variable, with no need to add the Python classes Variable and Scope.

Let us focus on more important topics, like parameter sharing, that are not yet settled in this design. It doesn't look like we can have a much easier way to trace the topology than the first proposal. However, the first proposal has not been verified to work with model sharing yet.

@wangkuiyi wangkuiyi changed the title RNNOp and its Python API Python v2 API to new operator framework Aug 2, 2017
@wangkuiyi wangkuiyi self-assigned this Aug 2, 2017
@reyoung reyoung mentioned this issue Aug 7, 2017