
Add design/api.md #1088

Closed · wants to merge 2 commits into from
Conversation

wangkuiyi (Collaborator)

No description provided.


for mb, pass_end in rd.read():
    gm.feed(mb)
    ud.update(gm)
@jacquesqiao (Member) commented on Jan 6, 2017:

I think we need to add the following here:

gm.forward()
gm.backward()
gm.update()

@wangkuiyi (Collaborator, Author) commented on Jan 6, 2017:

Oh, I forgot to make this clear: gm.feed calls forward and backward. I have added a note about it.

What does gm.update mean? Does it update the model parameters? I thought ud.update was supposed to do that.

Reply (Member):

You are right, it should be ud.update(gm) rather than gm.update. My mistake.

@wangkuiyi requested a review from emailweixu, January 6, 2017 19:38
input_dis = paddle.layer.input(...)
hidden_dis = paddle.layer.fc(input_dis, ...)
output_dis = paddle.layer.softmax(hidden_dis, ...)

Comment (Collaborator):

For GAN, gm_dis and gm_gen update different subsets of parameters. We need a mechanism to specify this. In the current GAN example, this is achieved by setting is_static according to is_discriminator_training; that is possible there because the configs of gm_gen and gm_dis are actually generated separately.

@wangkuiyi (Collaborator, Author) commented on Jan 8, 2017:

I understand the requirement. In this design it is handled as follows (though the text may not have made it clear):

  1. Each "part" is described as a "network".

  2. Each "network" is referred to by its output layer.

     For example, the text has output_dis and output_gen. This can be implemented by recording, in every layer, all of its input layers; given an output layer, we can then trace every layer of the network.

  3. The update information of each "network" is kept in a gradient machine.

     In the example there are two networks but three gradient machines: gm_dis corresponds to output_dis, gm_gen to output_gen, and gm_comp to the composition of output_gen and output_dis.

  4. Updating a network is done through the updater.

     The updater takes a gradient machine as input, because the layer outputs (activations) and gradients computed by each forward/backward call are stored in the gradient machine, and through the gradient machine we can also find the information of its corresponding network.

     In the example below,

     ud.update(gm_dis)
     

     uses gm_dis to update its corresponding output_dis network; since no part to update is explicitly specified, the whole network corresponding to gm_dis is updated, whereas

     ud.update(gm_comp, output_gen)
     

     uses gm_comp to update the output_gen sub-network of its corresponding network; here output_gen is the explicitly specified part to be updated.
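Putting these four points together, a rough end-to-end sketch; the gradient_machine constructors and the `fake` accessor below are assumptions, while the feed/update calls are the ones from the example:

```python
# Sketch only; constructors are assumed, based on the rule that a network is
# referred to by its output layer.
gm_dis  = paddle.gradient_machine.new(output_dis)               # the dis network
gm_comp = paddle.gradient_machine.new(output_dis, output_gen)   # assumed form of the gen+dis composition

for mb, pass_end in rd.read():
    fake = ...                         # generator output for mb; the accessor is not specified in this thread
    gm_dis.feed([fake, fake_label])    # feed runs forward and backward
    gm_dis.feed([mb, real_label])
    ud.update(gm_dis)                  # no sub-network given, so all of dis is updated

    gm_comp.feed([mb, real_label])
    ud.update(gm_comp, output_gen)     # update only the output_gen sub-network
```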


### Model

For deep learning, a model includes two parts: the topology and parameters. Currently, the concept *model* in Paddle contains only the topology, and parameters are in another concept *gradient machine*. This differs from the intuition and makes it difficult to save/load models. In this design, we should keep both topology and parameters in a *model*.
@helinwang (Contributor) commented on Jan 6, 2017:

Here is how TensorFlow does it, just for reference:
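For instance, in TF 1.x the graph definition and the variable values can be exported separately (an illustrative sketch only, not necessarily what the original reference showed):

```python
import tensorflow as tf

w = tf.Variable(tf.zeros([784, 10]), name="w")   # part of the graph definition
saver = tf.train.Saver()                          # handles parameter values only

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    tf.train.write_graph(sess.graph_def, "/tmp", "graph.pbtxt")  # topology goes to one file
    saver.save(sess, "/tmp/model.ckpt")                          # parameters go to a checkpoint
```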

@wangkuiyi (Collaborator, Author) commented on Jan 8, 2017:

I don't quite understand this point: why should the graph (topology) and the weights be kept separate? Under the notion that a model is the graph plus the parameters (weights), it seems more natural to keep both in a single Model class, doesn't it?

rd = paddle.data.amazon.product_review.new()
mt = paddle.metric.new_acc()

for mb, pass_end in rd.read():
Comment (Contributor):

Maybe we need a way to specify the minibatch size, either in reader.new(int batch_size) or reader.read(int batch_size)?

Reply (Collaborator, Author):

Indeed. The minibatch size could be specified via the read function, or when the reader is created, e.g.

rd = paddle.data.amazon.product_review.new(minibatch_size=100)

fake_label = paddle.input.new(False, ...)
real_label = paddle.input.new(True, ...)
gm_dis.feed([fake, fake_label])
gm_dis.feed([mb, real_label])
Comment (Contributor):

Maybe this question is naive: will the gradients computed by the second feed override the gradients from the first feed?

Reply (Collaborator, Author):

The design assumes the following: the layer outputs produced by the second feed call overwrite those produced by the previous feed call, and likewise the gradients produced by the second call overwrite those from the previous one.
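Annotated against the snippet above (assuming the two feed calls run back to back as shown):

```python
gm_dis.feed([fake, fake_label])   # activations and gradients from this call are overwritten...
gm_dis.feed([mb, real_label])     # ...by this call; only this call's results remain in gm_dis afterwards
```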

gm_dis.feed([mb, real_label])
ud.update(gm_dis)

gm_comp.feed([mb, real_label])
Comment (Contributor):

Since the updater only needs to update the gradients of output_gen's predecessors, one possible optimization here is to let the gradient machine know this, so it does not need to save any unrelated activations or gradients. E.g.,

gm_comp.feed([mb, real_label], output_gen)
ud.update(gm_comp, output_gen)

Maybe I don't have enough context, but at first glance I feel it might be easier for the gradient machine to be stateless (the current design saves gradients inside the gradient machine). E.g.,

// pseudo code
type gradientMachine
type gradient
type updater

var gm_comp gradientMachine
var ud updater
var g gradient = gm_comp.feed([mb, real_label], output_gen)
ud.update(g)

@wangkuiyi (Collaborator, Author) commented on Jan 8, 2017:

This may be my failure to explain clearly, hence the confusion. The gradient machine gm_comp corresponds to the composition of the gen and dis networks. The input of that composition is the input of gen, so its input can only be mb.

Besides, the gradient machine is intentionally designed to be stateful: in the GAN example we have two "networks", dis and gen, but we need to keep three sets of activations and gradients: dis, gen, and comp.

ud.update(gm_comp, output_gen) # updates only the model whose output layer is output_gen.
```

A key point here is that we use the output layer to indicate a model. I think we can achieve this as long as each layer knows its predecessors, so that we can trace from the output layer upward to the input layers. Please be aware that we didn't compose two models in the above example code; instead, we only created a gradient machine that covers both `model_gen` and `model_dis`.
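A minimal sketch of the tracing idea (class and field names here are illustrative, not part of the proposed API): each layer records its input layers, so the whole network is reachable from its output layer.

```python
class Layer:
    def __init__(self, inputs=()):
        self.inputs = list(inputs)   # predecessor layers

def collect_layers(output):
    """Return every layer reachable upward from `output`."""
    seen, stack = set(), [output]
    while stack:
        layer = stack.pop()
        if layer not in seen:
            seen.add(layer)
            stack.extend(layer.inputs)
    return seen
```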
@helinwang (Contributor) commented on Jan 6, 2017:

Just for reference: if I understand correctly, TensorFlow does it by stating which sub-graph needs updating:

output_gen_min = tf.train.AdamOptimizer(1e-2).minimize(output_gen)
output_gen_min.run(feed_dict={x: input, y_: label})

Another use case is when we only want to update some weights inside a subgraph, e.g. update the fc layer weights of output_gen but not the weights of hidden_gen (a predecessor of output_gen). (I think people call it fine-tuning.)

TensorFlow allows explicitly stating the weights that need to be updated:

fine_tune_step = tf.train.AdamOptimizer(1e-2).minimize(cross_entropy, var_list=[weights_of_output_gen])
fine_tune_step.run(feed_dict={x: input, y_: label})

Reply (Collaborator, Author):

Understood. This design has a similar mechanism:

ud.update(gm_dis)

uses gm_dis to update its corresponding output_dis network; since no part to update is explicitly specified, the whole network corresponding to gm_dis is updated, whereas

ud.update(gm_comp, output_gen)

uses gm_comp to update the output_gen sub-network of its corresponding network; here output_gen is the explicitly specified part to be updated.


1. *updater*, which encapsulates the updating algorithm.

It seems that *cost function* is a property of *gradient machine*?
Comment (Collaborator):

The cost function should be a property of network topology.

@wangkuiyi (Collaborator, Author) commented on Jan 9, 2017:

The cost function is independent of the model; it is only related to the training method, so it should not be part of the network.

For example, a generative model can be trained with the naive "output equals input" criterion or with the GAN criterion: the costs differ but the network is the same. Similarly, a sequence-to-sequence model can use minimum error as the cost, or CTC as the cost.
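For illustration, the same topology could then be paired with different costs when the gradient machine is created (the `cost=` argument and the `paddle.cost.*` names below are assumptions, not part of the design text):

```python
# Hypothetical: one topology, two training criteria.
gm_plain = paddle.gradient_machine.new(output_gen, cost=paddle.cost.mse())            # "output equals input"
gm_gan   = paddle.gradient_machine.new(output_gen, cost=paddle.cost.cross_entropy())  # GAN-style criterion
```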


1. *Data*

Models are trained using data sets. We hope to provide a set of utility data sets encapsulated in Python packages like `paddle.data.amazon.product_review` and `paddle.data.wikipedia.articles`. A reasonable idea might be that each of these packages provides a `new` function that returns a reader object or a Python iterator, and the *reader* has a `read` method which, once called, returns a minibatch and a flag indicating whether it has reached the end of the data set. For online learning, the flag would always be False. Therefore, a training loop might look like:
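For instance, using the reader and gradient machine from the snippets above (a sketch; `gm` and `ud` are the gradient machine and updater created earlier):

```python
rd = paddle.data.amazon.product_review.new()
mt = paddle.metric.new_acc()

for mb, pass_end in rd.read():
    gm.feed(mb)      # runs forward and backward
    ud.update(gm)
```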
Comment (Collaborator):

Returning a standard Python iterator is a good idea. There is no need to provide a read method, and the end-of-data-set flag is not necessary either, because a standard Python iterator will throw a StopIteration exception.

Reply (Collaborator, Author):

The API design should be language-independent. It should rely only on features that all languages have, rather than depending on Python-specific syntax.

@reyoung (Collaborator) commented on Jan 9, 2017:

A Python iterator is just a calling convention for a class. It is essentially an object that implements a next method; calling next returns a batch of data, and when the iterator is exhausted it throws a StopIteration exception. This is much the same as the reader concept, with the next function playing the role of the read function. Python's iterator is as natural as the List interface in Java.

If what we are wrapping is a "Python" API, then as long as it does not increase the users' mental burden, the interface should follow "Python" conventions as much as possible.
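For concreteness, a minimal sketch (hypothetical class name) showing that a reader's read method and the Python iterator protocol are the same calling convention:

```python
class MinibatchReader:
    def __init__(self, minibatches):
        self._minibatches = iter(minibatches)

    def read(self):
        # reader-style: return the next minibatch, raise when exhausted
        return next(self._minibatches)

    def __iter__(self):
        return self

    def __next__(self):
        # iterator-style: same behaviour; Python raises StopIteration for us
        return self.read()
```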

### A Simple Network

```python
gm = paddle.gradient_machine.new(model) # gm uses default cost function.
Comment (Collaborator):

There is one thing we haven't discussed yet: how does the user describe a neural network?

The most convenient way to implement this is to use a Python function for a network. For example:

@network(input={'x': dense_vector(123), 'label': integer_value(10)})
def sample_network(x, label):
    hidden = fc_layer(input=x, size=100)
    prediction = fc_layer(input=hidden, size=label.size, act=SoftmaxActivation())
    return classification_cost(input=prediction, label=label)

# Create the model.
model = sample_network()
model.randomParams()

model.save/load() ...

Another way to define the network topology is to pass the final value to a model creator. For example:

x = data_layer(type=dense_vector(123))
hidden = fc_layer(input=x, size=100)
...
loss = classification_cost(input=prediction, label=label)

model = paddle.ModelCreator(loss)
...

It is hard to implement the latter in Paddle.

@wangkuiyi (Collaborator, Author) commented on Jan 9, 2017:

I did not emphasize this in the main text, but I did in my replies to Xu and Helin above:

  1. There is no longer a separate model concept; a model is just a network, so there should be neither a model nor a model-creator concept.
  2. Each network is referred to by its output layer. See Add design/api.md #1088 (comment).

Also, I think it is better not to describe the API design in Python, because that easily introduces dependence on Python-specific syntax, such as the @network decorator. The API should remain clear even when described in "C with objects", so that it can support various client languages.

Comment (Contributor):

Agreed. The decorator causes users a great deal of confusion.

@reyoung (Collaborator) commented on Jan 9, 2017:

The current points of disagreement include:

Does the neural network model include the loss function?

One side holds that the model does not include the loss function, because:

  • The loss function is not used at prediction time. For a classification problem, for example, one may train with cross-entropy or with another loss such as Huber loss, but prediction does not need to know which loss was used.
  • At training time one can simply write model.fit(output, label, cost="cross-entropy").

The other side holds that the loss function should be part of the neural network model, because:

How should a Paddle model configuration be defined?

  1. As a function:
def network(pixel):
  hidden = fc_layer(input=pixel, size=200)
  pred = fc_layer(input=hidden, size=10, act=SoftmaxActivation())
  return pred

model = create_model(network, input={'pixel': dense_vector(784)})
  2. As the final variable of the network:
pixel = data_layer(name='pixel', type=dense_vector(784))
hidden = fc_layer(input=pixel, size=200)
pred = fc_layer(input=hidden, size=10, act=SoftmaxActivation())  # pred stores the whole network topology

model = create_model(pred)

The problem is that the second form (the variable form) is hard to implement, because it would essentially require rewriting Paddle's current network parsing.

Does the network representation provide enough flexibility to describe problems such as GAN?

The disagreement here is whether a network like GAN needs flexibility at network-description time or at training time. The essential problem in GAN is choosing which parameters to update at update time. That really belongs to the updater's update function, which would need to accept a predicate specifying what to update; it seems unrelated to the network description.
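A hypothetical sketch of that last point (all names below are assumptions, not part of the design): the updater's update call could accept a predicate deciding which parameters to update.

```python
# Hypothetical only: ud.update takes a predicate that decides, per parameter,
# whether it should be updated in this call.
def only_generator_params(param_name):
    return param_name.startswith("gen_")   # assumed naming convention

ud.update(gm_comp, should_update=only_generator_params)
```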

This was referenced Jan 22, 2017
@wangkuiyi closed this Feb 13, 2017
@wangkuiyi deleted the api.md branch February 13, 2017 04:57
wangxicoding pushed a commit to wangxicoding/Paddle that referenced this pull request Dec 9, 2021