Image caption network #1613

Closed
dylanliuli opened this issue Mar 14, 2017 · 6 comments

dylanliuli commented Mar 14, 2017

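I want to implement the network structure from the Show and Tell paper, and I have been using the TensorFlow code as a reference: https://github.com/tensorflow/models/tree/master/im2txt/im2txt
However, I don't quite understand how to do this in Paddle: the input at step 0 is the image feature, and the inputs at later steps are one-hot vectors. Which API should I call?
Does Paddle have sample code for benchmark models of this kind?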

@helinwang
Contributor

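You linked to the whole repo. Linking to a specific line of a file would make it easier for everyone to understand what you mean.

My guess is that you need to feed in a one-hot vector sequence. Please refer to this tutorial: http://book.paddlepaddle.org/machine_translation/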

 src_word_id = paddle.layer.data(
     name='source_language_word',
     type=paddle.data_type.integer_value_sequence(source_dict_dim))

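The code above defines the input layer for a one-hot vector sequence. (Strictly speaking it is no longer a sequence of one-hot vectors: you only pass the sequence of one-hot indices, which is a sequence of integers.) In the code, source_dict_dim is the range of each integer in the sequence: [0, source_dict_dim).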
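To make the relationship between the integer sequence and the one-hot sequence concrete, here is a minimal sketch in plain Python. It does not call Paddle; `one_hot`, the toy value of `source_dict_dim`, and `word_ids` are made up for illustration.

```python
def one_hot(index, dict_dim):
    """Expand one integer id into the one-hot vector it stands for."""
    if not 0 <= index < dict_dim:
        raise ValueError("index %d is outside [0, %d)" % (index, dict_dim))
    vec = [0] * dict_dim
    vec[index] = 1
    return vec


source_dict_dim = 5    # toy vocabulary size, chosen for illustration
word_ids = [3, 0, 4]   # the integer sequence that would actually be fed in

# The one-hot sequence those integers represent.
one_hot_sequence = [one_hot(i, source_dict_dim) for i in word_ids]
```

Each integer carries the same information as its one-hot vector, so passing only the indices loses nothing and avoids building vectors of length source_dict_dim.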

Author

dylanliuli commented Mar 20, 2017

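Thanks @helinwang
What I want to ask is how to do the following in Paddle: feed the embedded image feature as the input at step 0 (the TensorFlow implementation is code 1 below), and then combine it with the word embeddings (the TensorFlow implementation is code 2 below).
code 1: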

    lstm_cell = tf.contrib.rnn.BasicLSTMCell(
        num_units=self.config.num_lstm_units, state_is_tuple=True)
    if self.mode == "train":
      lstm_cell = tf.contrib.rnn.DropoutWrapper(
          lstm_cell,
          input_keep_prob=self.config.lstm_dropout_keep_prob,
          output_keep_prob=self.config.lstm_dropout_keep_prob)

    with tf.variable_scope("lstm", initializer=self.initializer) as lstm_scope:
      # Feed the image embeddings to set the initial LSTM state.
      zero_state = lstm_cell.zero_state(
          batch_size=self.image_embeddings.get_shape()[0], dtype=tf.float32)
      _, initial_state = lstm_cell(self.image_embeddings, zero_state)

...
code 2:

    # Run the batch of sequence embeddings through the LSTM.
    sequence_length = tf.reduce_sum(self.input_mask, 1)
    lstm_outputs, _ = tf.nn.dynamic_rnn(cell=lstm_cell,
                                        inputs=self.seq_embeddings,
                                        sequence_length=sequence_length,
                                        initial_state=initial_state,
                                        dtype=tf.float32,
                                        scope=lstm_scope)

To summarize, I want to implement this network architecture in Paddle: https://github.com/tensorflow/models/blob/master/im2txt/g3doc/show_and_tell_architecture.png
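One way to read code 1 and code 2 together: running the cell once on the image embedding to get the initial state, then running it over the word embeddings, gives the same hidden states as running it over a single sequence with the image embedding prepended as time step 0 and discarding that step's output. The sketch below checks this with a toy scalar tanh cell standing in for the LSTM. The cell, its weights, and the input values are all assumptions for illustration, and dropout is ignored. The second form only works if the image embedding and the word embeddings have the same width.

```python
import math


def cell(x, h, w_x=0.5, w_h=0.8):
    """Toy scalar recurrent cell standing in for the LSTM cell."""
    return math.tanh(w_x * x + w_h * h)


def run(inputs, h0):
    """Unroll the cell over inputs and return the hidden state at every step."""
    states, h = [], h0
    for x in inputs:
        h = cell(x, h)
        states.append(h)
    return states


image_embedding = 0.7               # illustrative value
seq_embeddings = [0.1, -0.3, 0.9]   # illustrative values

# im2txt style: one step on the image yields the initial state,
# then the word embeddings are run from that state.
initial_state = cell(image_embedding, 0.0)
outputs_a = run(seq_embeddings, initial_state)

# Alternative: prepend the image as time step 0, start from the zero state,
# and drop the output of that first step.
outputs_b = run([image_embedding] + seq_embeddings, 0.0)[1:]
```

If the framework offers no way to set the initial state directly, the second form suggests building one input sequence whose first element is the image embedding.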

Author

dylanliuli commented Mar 20, 2017

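I could not find a way to hack zero_state, so I tried to implement it with concat_layer:
[image]

This produces the error below. If I change the config so that only trg_word_embedding is the LSTM input (the commented-out part), the error does not occur. I suspect the input is still being handled incorrectly. Any advice would be appreciated.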

I0320 20:38:16.725687 18466 GradientMachine.cpp:86] Initing parameters..
I0320 20:38:20.448544 18466 GradientMachine.cpp:93] Init parameters done.
F0320 20:38:20.715349 18913 LstmLayer.cpp:155] Check failed: input.sequenceStartPositions
*** Check failure stack trace: ***
[The same "Check failed: input.sequenceStartPositions" message from LstmLayer.cpp:155 repeats here, interleaved across several threads.]

Contributor

luotao1 commented Mar 21, 2017

Check failed: input.sequenceStartPositions

This means the input must be sequence data.
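As a rough illustration of what "sequence data" means here: a common layout for a batch of variable-length sequences is to concatenate all time steps into one flat list of rows and keep a separate list of start offsets. The sketch below shows that layout in plain Python. It is an assumption about the general idea behind the name sequenceStartPositions in the log, not a copy of Paddle's internals, and `flatten_with_start_positions` is a hypothetical helper.

```python
def flatten_with_start_positions(sequences):
    """Concatenate sequences into flat rows and record where each one starts."""
    rows, starts = [], [0]
    for seq in sequences:
        rows.extend(seq)
        starts.append(len(rows))   # end of this sequence, start of the next
    return rows, starts


# Two sequences of 2-dimensional time steps: lengths 3 and 1.
batch = [
    [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    [[0.7, 0.8]],
]
rows, starts = flatten_with_start_positions(batch)
```

An input that is a plain batch of vectors has rows but no such offsets, which would be consistent with a sequence layer rejecting it.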

Contributor

qingqing01 commented Mar 21, 2017

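Feed the embedded image feature as the input at step 0

For the data you can use the dense_vector_sequence type. For the data format, see: http://www.paddlepaddle.org/doc_cn/ui/data_provider/pydataprovider2.html#input-types . In the DataProvider, one yielded sample has the format [[f, ...], [f, ...], ...], where each [f, ...] is one time step.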
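The sample format described above can be sketched with a plain Python generator. This is not the PyDataProvider2 API: `caption_samples` and `records` are hypothetical, and the per-word vectors are assumed to be precomputed purely for illustration (in the setup discussed in this thread the word embeddings are learned inside the network).

```python
def caption_samples(records):
    """Yield one sample per record as [[f, ...], [f, ...], ...].

    Time step 0 is the image feature; later time steps are per-word vectors.
    """
    for image_feature, word_vectors in records:
        dim = len(image_feature)
        if any(len(v) != dim for v in word_vectors):
            raise ValueError("every time step must have the same width")
        yield [list(image_feature)] + [list(v) for v in word_vectors]


# One toy record: a 2-dimensional image feature and two word vectors.
records = [([0.5, 0.1], [[1.0, 0.0], [0.0, 1.0]])]
samples = list(caption_samples(records))
```

Every time step in a sample has the same width, which is why the image feature has to be projected to the word-vector size before it can sit at step 0.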

Member

livc commented Jun 30, 2017

refer to #2641 pls.

livc closed this as completed Jun 30, 2017