Complete Fluid -- Function Definition and Invocation #10244

Closed
wangkuiyi opened this issue Apr 26, 2018 · 5 comments

@wangkuiyi
Collaborator

wangkuiyi commented Apr 26, 2018

The Requirement of Functions

Motivation 1. Inference Engine Calls Fluid Functions

The inference/production system is not supposed to be created by the PaddlePaddle team, and often not even by PaddlePaddle users. Such systems are usually server programs in C++/Java/Go, or mobile inference engines in C/Objective-C. The PaddlePaddle project should provide a library such that

  1. users can use whatever host language they like to create the production system, and
  2. they can call Fluid functions that describe the inference algorithms from their host languages.

Motivation 2. Canonicalize IfElse/While Implementations (arguably)

The current implementation of IfElseOp and WhileOp is hacky: values are passed into and out of the blocks through ad hoc concepts such as InLink and OutLink. These should have been arguments and return values, implemented using FuncCallOp, which follows a canonical calling convention that is explained in the rest of this document.

Technical Viability

One objection is that Fluid is a differentiable language, so is FuncCallOp differentiable?

I think the answer is yes, because a function call basically inserts the sequence of operations of a block (the callee's body) into the caller's body. As long as sequential execution is differentiable, the function call should be differentiable too, with the help of the scope hierarchy.
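To make the "insert the callee's operations into the caller's body" argument concrete, here is a minimal sketch in plain Python. It is not Fluid code: the `Op` record, the `func_call` op type, and `inline_call` are hypothetical names used only for illustration, and variables are modelled as plain strings.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Op:
    """Hypothetical stand-in for an operator description."""
    type: str
    inputs: List[str]
    outputs: List[str]


def inline_call(caller_ops: List[Op], call_index: int,
                callee_ops: List[Op], arg_map: Dict[str, str]) -> List[Op]:
    """Replace the call op at call_index with the callee's ops.

    arg_map renames the callee's parameter and return variables to the
    caller's variables; all other names are left unchanged.
    """
    renamed = [
        Op(op.type,
           [arg_map.get(v, v) for v in op.inputs],
           [arg_map.get(v, v) for v in op.outputs])
        for op in callee_ops
    ]
    return caller_ops[:call_index] + renamed + caller_ops[call_index + 1:]


caller = [
    Op("mul", ["x", "w"], ["h"]),
    Op("func_call", ["h"], ["y"]),  # hypothetical call op
    Op("mean", ["y"], ["loss"]),
]
callee = [Op("relu", ["a"], ["b"])]  # body with parameter a, return value b
flat = inline_call(caller, 1, callee, {"a": "h", "b": "y"})
```

After inlining, `flat` is an ordinary sequential list of operators, which is the case that backward-pass construction is already assumed to handle.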

Function Definition

A Function

A function is composed of the following parts:

  1. function signature
    1. function name
    2. inputs, or arguments,
    3. outputs, or return values
  2. function body

Currently, Fluid has the framework.BlockDesc message that can be used to represent the function body, but not the signature.

A Program

A program (imagine a C program) is composed of one or more functions, where one of them (with the name main) is the default entry-point, which might call other functions.

Other functions might be entry-points too -- consider a C program that is built into an .so file. In the case of Fluid, we need a similar feature, e.g., to allow a C++ function (in the inference engine program) to call a Fluid function (which implements the inference algorithm).

Currently, a Fluid program (a ProgramDesc message) cannot have function definitions, so instead, it looks something like a Python script, or a block-hierarchy, where the root block is the entry-point of the "Python script".

@helinwang
Contributor

helinwang commented Apr 26, 2018

Agree. I want to point out that the new API design is fully compatible with the solution of this issue:

class Word2Vec(fluid.Program):
  @network("firstw", "secondw", "thirdw", "forthw", "nextw")
  def train_step(self):
    # ...
  @network("firstw", "secondw", "thirdw", "forthw")
  def infer(self):
    # ...

word2vec = Word2Vec().Compile()
avg_cost = word2vec.train_step(data[0], data[1], data[2], data[3], data[4])
next_word = word2vec.infer(1,2,3,4)

The Compile function in word2vec = Word2Vec().Compile() can put the method names train_step and infer, as well as their arguments, into the program desc.

@wangkuiyi
Collaborator Author

wangkuiyi commented Apr 26, 2018

Fluid Programs and Functions

This proposal is based on the following facts:

  • A program is a collection of function definitions, where one of them is the default entry-point.
  • Each function has a signature and a body (block).
  • Each block includes some
    • operator calls, which are like invocations of built-in or standard functions,
    • Fluid function calls, which are like invocations of user-defined functions.
  • These function calls can access
    • local variables,
    • function parameters and return values (if the block is a function body),
    • variables defined in parent blocks.
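The variable-access rule in the list above (locals first, then variables in parent blocks) can be sketched as a walk up the block hierarchy. This is an illustrative sketch only: the `Block` class and its `find_var` method are hypothetical and are not the Fluid API.

```python
class Block:
    """Hypothetical block: a set of local variables plus a parent link."""

    def __init__(self, idx, parent=None):
        self.idx = idx
        self.parent = parent
        self.vars = {}

    def find_var(self, name):
        """Look up name in this block, then in each ancestor block."""
        block = self
        while block is not None:
            if name in block.vars:
                return block.vars[name]
            block = block.parent
        raise KeyError(name)


global_block = Block(0)
global_block.vars["w"] = "global parameter"

body = Block(1, parent=global_block)  # a function body block
body.vars["x"] = "function parameter"

inner = Block(2, parent=body)         # e.g. the block of a while loop
inner.vars["tmp"] = "local variable"
```

With this structure, `inner.find_var("w")` resolves to the global variable only because the global block is an ancestor of `inner`.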

An Introductory But Not-Good-Enough Approach

An intuitive reaction to this is to describe functions by adding a FunctionDesc message, which includes a BlockDesc that describes the function body.

  1. change our ProgramDesc from defining a hierarchy of blocks to a collection of functions:

    message ProgramDesc {
      required BlockDesc global = 1;  // Where global variables and their initializing operators reside.
      repeated FunctionDesc functions = 2; // the first function is the entry-point, 
                                           // or the main function.
    }
  2. add FunctionDesc and move the hierarchy of blocks from ProgramDesc into FunctionDesc:

    message FunctionDesc {
      message Parameter {
        required int32 var_idx = 1;
        required string name = 2;
      }
      required string name = 1;
      repeated Parameter args = 2; // The args[i].var_idx-th variable in the function body block
                                   // is the i-th parameter.
      repeated int32 rets = 3; // Tells FuncCallOp that the rets[i]-th variable in the
                               // caller's body block holds the i-th return value
                               // of this function.
      repeated BlockDesc blocks = 4;
    }

Please be aware that:

  1. Each parameter of a function is an index referring to a variable defined in the function's body block.
  2. Each return value of a function is an index referring to a variable defined in the caller function's body block.

The problem with this approach is that each function definition (body) has its own hierarchy of blocks, and blocks in different hierarchies (functions) do not overlap. This implies that the global block, which is necessary for holding global variables, is not the parent block of all body blocks, which would prevent other blocks from accessing global variables.

An Alternative Approach with Unique Block Hierarchy

To make sure that all blocks are in a unique hierarchy, function definitions need to refer to blocks as their bodies rather than contain them.

  1. Add functions to ProgramDesc:

    message ProgramDesc {
      repeated BlockDesc blocks = 1; // All blocks are in the unique hierarchy.
      repeated FunctionDesc functions = 2; // Function definitions refer to blocks in the hierarchy.
    }
  2. Make FunctionDesc refer to blocks in the unique hierarchy:

    message FunctionDesc {
      message Parameter {
        required int32 var_idx = 1;
        required string name = 2;
      }
      required string name = 1;
      repeated Parameter args = 2; // The args[i].var_idx-th variable in the function body block
                                   // is the i-th parameter.
      repeated int32 rets = 3; // Tells FuncCallOp that the rets[i]-th variable in the
                               // caller's body block holds the i-th return value
                               // of this function.
      repeated int32 blocks = 4; // Indices to blocks defined in ProgramDesc.
    }
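As a rough illustration of why the single hierarchy matters, the sketch below mirrors the two messages above with Python dataclasses and checks whether a function body can reach a global variable. The dataclasses, the `idx`/`parent_idx` fields, the use of -1 for "no parent", and the helpers `find_function` and `is_visible` are assumptions made for this example and are not generated protobuf code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BlockDesc:
    idx: int
    parent_idx: int                      # -1 marks the root (global) block
    vars: List[str] = field(default_factory=list)


@dataclass
class FunctionDesc:
    name: str
    blocks: List[int]                    # indices into ProgramDesc.blocks


@dataclass
class ProgramDesc:
    blocks: List[BlockDesc]
    functions: List[FunctionDesc]


def find_function(program: ProgramDesc, name: str) -> FunctionDesc:
    """Return the function definition with the given name."""
    for fn in program.functions:
        if fn.name == name:
            return fn
    raise KeyError(name)


def is_visible(program: ProgramDesc, block_idx: int, var: str) -> bool:
    """True if var is defined in the block or in any of its ancestors."""
    while block_idx != -1:
        block = program.blocks[block_idx]
        if var in block.vars:
            return True
        block_idx = block.parent_idx
    return False


program = ProgramDesc(
    blocks=[
        BlockDesc(idx=0, parent_idx=-1, vars=["embedding_table"]),  # global block
        BlockDesc(idx=1, parent_idx=0, vars=["firstw", "predict"]),  # body of infer
    ],
    functions=[FunctionDesc(name="infer", blocks=[1])],
)

body_idx = find_function(program, "infer").blocks[0]
```

Because the body block of `infer` has the global block as its parent, `is_visible(program, body_idx, "embedding_table")` holds, which is the property the first approach could not provide.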

Restrictions

This proposal takes C as a reference, thus is a minimalist proposal; it

  1. doesn't allow nested function definition,
  2. doesn't allow first-class function type,
  3. doesn't allow lambda (construct a function).

@helinwang
Contributor

helinwang commented Apr 26, 2018

Agree. In addition:

Inline Function Compilation

In C++, there is a keyword inline, which suggests that the compiler include the callee function's body in the caller function's body. Based on the new ProgramDesc above, we can do such inline function compilation using the host language's (Python's) function call mechanism.

To make the explanation concrete, let us imagine that users write three Python functions:

class Word2Vec(fluid.Program):
  def _predict(self):
    # ...
  @fluid("firstw", "secondw", "thirdw", "forthw", "nextw")
  def train_step(self):
    predict = self._predict()
    label = ...
    loss = fluid.mse(predict, label)
    # minimize loss
  @fluid("firstw", "secondw", "thirdw", "forthw")
  def infer(self):
    return self._predict()

where both infer and train_step call _predict.

The Python function decorator @fluid is a marker stating that only decorated functions are going to be compiled into FunctionDesc messages.

Because the compilation process is exactly the execution of the above Python program, when infer calls _predict, all statements in these two functions go into the same FunctionDesc message. Similarly, statements in train_step and _predict go into a second FunctionDesc message.

@putcn
Contributor

putcn commented Apr 26, 2018

Each return value of a function is an index referring to a variable defined in the caller function's body block.

Does that mean the function caller is tightly bound to the function definition?

@shanyi15
Collaborator

Hello, this issue has not been updated in the past month, so we will close it today for the sake of other users' experience. If you still need to follow up on this question after it is closed, please feel free to reopen it, and we will get back to you within 24 hours. We apologize for the inconvenience caused by the closure, and thank you for your support of PaddlePaddle!
