Further update README.md and add function doc for mt_with_external_memory model.
xinghai-sun committed Jul 13, 2017
1 parent 617bc4e commit 69b4bda
Showing 3 changed files with 236 additions and 34 deletions.
190 changes: 174 additions & 16 deletions mt_with_external_memory/README.md
@@ -222,7 +222,7 @@ class ExternalMemory(object):
- Input argument `read_key`: the output of a preceding layer, whose content provides the addressing signal for the read head.
- Returns: the information read out (it can be fed directly into other layers as input). A minimal usage sketch follows.
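
A minimal usage sketch within one decoder step (the `external_memory` instance and the `decoder_state` variable are assumed names for illustration):

```
# write first, then read (see the note on operation ordering further below)
external_memory.write(decoder_state)
context_vector = external_memory.read(decoder_state)
# context_vector can now be fed into subsequent layers of the decoder
```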

Some key implementation details:

- The NTM "external memory matrix" is implemented with `paddle.layer.memory` in sequence form (`is_seq=True`): the length of the sequence is the number of memory slots, and the sequence `size` is the size of each memory slot (vector). The sequence is initialized by an external boot layer, so the number of memory slots is determined by the length of that layer's output sequence. The class can therefore implement not only bounded memory but also unbounded memory (i.e. a variable number of memory slots). A minimal sketch of this declaration follows.
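
A minimal sketch of how this memory state might be declared inside `ExternalMemory.__init__` (attribute and argument names are assumptions for illustration, not necessarily the exact ones used in `external_memory.py`):

```
self.external_memory = paddle.layer.memory(
    name=self.name,             # shared name ties reads/writes to the same state
    size=self.mem_slot_size,    # width of each memory slot (vector)
    is_seq=True,                # sequence form: sequence length = number of slots
    boot_layer=boot_layer)      # external layer providing the initial content
```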

@@ -244,21 +244,130 @@ class ExternalMemory(object):
Three main functions are involved:

```
def bidirectional_gru_encoder(input, size, word_vec_dim):
    """Bidirectional GRU encoder.

    :params size: Hidden cell number in decoder rnn.
    :type size: int
    :params word_vec_dim: Word embedding size.
    :type word_vec_dim: int
    :return: Tuple of 1. concatenated forward and backward hidden sequence.
             2. last state of backward rnn.
    :rtype: tuple of LayerOutput
    """
    pass


def memory_enhanced_decoder(input, target, initial_state, source_context, size,
                            word_vec_dim, dict_size, is_generating, beam_size):
    """GRU sequence decoder enhanced with external memory.

    The "external memory" refers to two types of memories.
    - Unbounded memory: i.e. attention mechanism in Seq2Seq.
    - Bounded memory: i.e. external memory in NTM.
    Both types of external memories can be implemented with
    ExternalMemory class, and are both exploited in this enhanced RNN decoder.

    The vanilla RNN/LSTM/GRU also has a narrow memory mechanism, namely the
    hidden state vector (or cell state in LSTM) carrying information through
    a span of sequence time, which is a successful design enriching the model
    with the capability to "remember" things in the long run. However, such a
    vector state is somewhat limited to a very narrow memory bandwidth. External
    memory introduced here could easily increase the memory capacity with linear
    complexity cost (rather than quadratic for vector state).

    This enhanced decoder expands its "memory passage" through two
    ExternalMemory objects:
    - Bounded memory for handling long-term information exchange within decoder
      itself. A direct expansion of traditional "vector" state.
    - Unbounded memory for handling source language's token-wise information.
      Exactly the attention mechanism over Seq2Seq.

    Notice that we take the attention mechanism as a particular form of external
    memory, with read-only memory bank initialized with encoder states, and a
    read head with content-based addressing (attention). From this view point,
    we arrive at a better understanding of attention mechanism itself and other
    external memory, and a concise and unified implementation for them.

    For more details about external memory, please refer to
    `Neural Turing Machines <https://arxiv.org/abs/1410.5401>`_.

    For more details about this memory-enhanced decoder, please
    refer to `Memory-enhanced Decoder for Neural Machine Translation
    <https://arxiv.org/abs/1606.02003>`_. This implementation is highly
    correlated to this paper, but with minor differences (e.g. put "write"
    before "read" to bypass a potential bug in V2 APIs. See
    (`issue <https://github.com/PaddlePaddle/Paddle/issues/2061>`_).
    """
    pass


def memory_enhanced_seq2seq(encoder_input, decoder_input, decoder_target,
                            hidden_size, word_vec_dim, dict_size, is_generating,
                            beam_size):
    """Seq2Seq Model enhanced with external memory.

    The "external memory" refers to two types of memories.
    - Unbounded memory: i.e. attention mechanism in Seq2Seq.
    - Bounded memory: i.e. external memory in NTM.
    Both types of external memories can be implemented with
    ExternalMemory class, and are both exploited in this Seq2Seq model.

    :params encoder_input: Encoder input.
    :type encoder_input: LayerOutput
    :params decoder_input: Decoder input.
    :type decoder_input: LayerOutput
    :params decoder_target: Decoder target.
    :type decoder_target: LayerOutput
    :params hidden_size: Hidden cell number, both in encoder and decoder rnn.
    :type hidden_size: int
    :params word_vec_dim: Word embedding size.
    :type word_vec_dim: int
    :param dict_size: Vocabulary size.
    :type dict_size: int
    :params is_generating: Whether for beam search inferencing (True) or
                           for training (False).
    :type is_generating: bool
    :params beam_size: Beam search width.
    :type beam_size: int
    :return: Cost layer if is_generating=False; Beam search layer if
             is_generating = True.
    :rtype: LayerOutput
    """
    pass
```

`memory_enhanced_seq2seq` defines the complete sequence-to-sequence model with external memory and is the top-level function of the model definition. It first calls `bidirectional_gru_encoder` to encode the source language, and then decodes with `memory_enhanced_decoder`.
- `bidirectional_gru_encoder` implements a single-layer bidirectional GRU (Gated Recurrent Unit) encoder. It returns two results: a sequence of token-level encoding vectors (forward and backward concatenated), and a sentence-level encoding vector for the whole source sentence (backward only). The former is used to initialize the memory matrix of the decoder's attention mechanism, the latter to initialize the decoder's state vector.

- `memory_enhanced_decoder` implements the GRU decoder enhanced with external memory. It uses one and the same `ExternalMemory` class to build two kinds of external memory modules:

- Unbounded external memory, i.e. the conventional attention mechanism. `ExternalMemory` is used with the read-only switch turned on and interpolation-based addressing turned off, and the first output of the encoder (the token-level encoding sequence) is used to initialize the memory matrix (`boot_layer`). The number of memory slots is therefore dynamic and depends on the number of tokens in the source sentence:

```
unbounded_memory = ExternalMemory(
name="unbounded_memory",
mem_slot_size=size * 2,
boot_layer=unbounded_memory_init,
readonly=True,
enable_interpolation=False)
```
- Bounded external memory. `ExternalMemory` is used with the read-only switch turned off and interpolation-based addressing turned on. The first output of the encoder is mean-pooled, expanded to the specified sequence length, and overlaid with random noise (kept identical between training and inference) to serve as the initialization (`boot_layer`) of the memory matrix; a sketch of this initialization follows the code below. The number of memory slots is therefore fixed. In code:

```
bounded_memory = ExternalMemory(
name="bounded_memory",
mem_slot_size=size,
boot_layer=bounded_memory_init,
readonly=False,
enable_interpolation=True)
```
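
The `boot_layer` above can be constructed roughly as follows. This is a sketch only: the layer and variable names (`source_context`, `bounded_memory_perturbation`, `bounded_memory_init`) are assumptions for illustration, and the authoritative construction lives in `memory_enhanced_decoder` in `model.py`.

```
# mean-pool the encoder output sequence into a single slot prototype
bounded_memory_slot_init = paddle.layer.fc(
    input=paddle.layer.pooling(
        input=source_context, pooling_type=paddle.pooling.Avg()),
    size=size,
    act=paddle.activation.Sigmoid())
# random perturbation fed in as data, so it stays identical in training and inference;
# its sequence length determines the (fixed) number of memory slots
bounded_memory_perturbation = paddle.layer.data(
    name="bounded_memory_perturbation",
    type=paddle.data_type.dense_vector_sequence(size))
# expand the prototype to the number of memory slots and add the noise
bounded_memory_init = paddle.layer.addto(
    input=[
        paddle.layer.expand(
            input=bounded_memory_slot_init,
            expand_as=bounded_memory_perturbation),
        bounded_memory_perturbation
    ],
    act=paddle.activation.Linear())
```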

Note that in our implementation, the attention mechanism (i.e. the unbounded external memory) and the Neural Turing Machine (i.e. the bounded external memory) are realized with the same `ExternalMemory` class: the former is **read-only**, while the latter is **readable and writable**. This choice is made purely to unify our view of the "attention mechanism" and the "memory mechanism", and to keep the implementation concise and uniform. The attention mechanism could also be implemented with `paddle.networks.simple_attention`, roughly as sketched below.
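
For reference, a rough equivalent based on the built-in attention network might look like the following (the argument wiring and variable names are assumptions for illustration; see the PaddlePaddle machine-translation examples for exact usage):

```
context_vector = paddle.networks.simple_attention(
    encoded_sequence=encoder_outputs,       # token-level encoder states
    encoded_proj=encoder_outputs_proj,      # projected encoder states
    decoder_state=decoder_state)            # current decoder hidden state
```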


In addition, in this implementation the `write` operation of `ExternalMemory` is moved ahead of `read` to work around a potential limitation in topology connections; see this [Issue](https://github.com/PaddlePaddle/Paddle/issues/2061) for details. As can be seen, the two orderings are essentially equivalent. An end-to-end wiring sketch of the model definition follows.
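
To tie the pieces together, the following sketch shows how `memory_enhanced_seq2seq` might be wired into a training cost. The data-layer names and hyper-parameter values here are illustrative assumptions; the actual wiring is done by the training script.

```
dict_size = 30000

# word-id sequences for the source sentence, the shifted target input and the target labels
source_words = paddle.layer.data(
    name="source_words",
    type=paddle.data_type.integer_value_sequence(dict_size))
target_words = paddle.layer.data(
    name="target_words",
    type=paddle.data_type.integer_value_sequence(dict_size))
target_next_words = paddle.layer.data(
    name="target_next_words",
    type=paddle.data_type.integer_value_sequence(dict_size))

# cost layer for training (is_generating=False); beam search layer when True
cost = memory_enhanced_seq2seq(
    encoder_input=source_words,
    decoder_input=target_words,
    decoder_target=target_next_words,
    hidden_size=1024,
    word_vec_dim=512,
    dict_size=dict_size,
    is_generating=False,
    beam_size=3)
```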

@@ -278,24 +387,73 @@ def reader():

Users need to tokenize the text themselves and build dictionaries to convert tokens into IDs.

The PaddlePaddle interface [paddle.dataset.wmt14](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/dataset/wmt14.py) provides, by default, a preprocessed, smaller-scale [subset of the WMT14 English-French translation dataset](http://paddlepaddle.bj.bcebos.com/demo/wmt_shrinked_data/wmt14.tgz) (193319 training sentence pairs, 6003 test sentence pairs, dictionary size 30000), together with two reader creator functions:

```
paddle.dataset.wmt14.train(dict_size)
paddle.dataset.wmt14.test(dict_size)
```

When called, these two functions return the corresponding `reader()` functions for use with `paddle.trainer.SGD.train`. To use other data, refer to [paddle.dataset.wmt14](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/dataset/wmt14.py) to construct a corresponding data creator, and replace `paddle.dataset.wmt14.train` and `paddle.dataset.wmt14.test` with the new function names. A minimal sketch of such a custom reader creator is given below.
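
The following is a minimal sketch of such a custom reader creator. The file-path arguments, the whitespace tokenization and the `<unk>`/`<s>`/`<e>` dictionary conventions are assumptions for illustration and must match the dictionaries actually used.

```
def custom_reader_creator(src_path, trg_path, src_dict, trg_dict):
    """Create a reader() yielding (source_ids, target_ids, target_next_ids),
    mirroring the sample format produced by paddle.dataset.wmt14.train."""

    def reader():
        with open(src_path) as fs, open(trg_path) as ft:
            for src_line, trg_line in zip(fs, ft):
                # map tokens to ids; unknown tokens fall back to <unk>
                src_ids = [src_dict.get(w, src_dict["<unk>"])
                           for w in src_line.strip().split()]
                trg_ids = [trg_dict.get(w, trg_dict["<unk>"])
                           for w in trg_line.strip().split()]
                # decoder input starts with <s>; decoder target ends with <e>
                yield (src_ids,
                       [trg_dict["<s>"]] + trg_ids,
                       trg_ids + [trg_dict["<e>"]])

    return reader
```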

### Training

Enter at the command line:

```
python train.py
```
Or customize some of the arguments, for example:

```
CUDA_VISIBLE_DEVICES=8,9,10,11 python train.py \
--dict_size 30000 \
--word_vec_dim 512 \
--hidden_size 1024 \
--memory_slot_num 8 \
--use_gpu True \
--trainer_count 4 \
--num_passes 100 \
--batch_size 128 \
--memory_perturb_stddev 0.1
```

This runs the training script; checkpoints of the trained model are saved periodically under the local `./checkpoints` directory. The meaning of each argument can be displayed with

```
python train.py --help
```


### Decoding

Enter at the command line:

```
python infer.py
```
Or customize some of the arguments, for example:

```
CUDA_VISIBLE_DEVICES=8,9,10,11 python infer.py \
--dict_size 30000 \
--word_vec_dim 512 \
--hidden_size 1024 \
--memory_slot_num 8 \
--use_gpu True \
--trainer_count 4 \
--memory_perturb_stddev 0.1 \
--infer_num_data 10 \
--model_filepath checkpoints/params.latest.tar.gz \
--beam_size 3
```

This runs the decoding script and produces sample translation results. The meaning of each argument can be displayed with:

```
python infer.py --help
```


## Further Discussion

18 changes: 6 additions & 12 deletions mt_with_external_memory/external_memory.py
@@ -5,8 +5,7 @@


class ExternalMemory(object):
"""
External neural memory class.
"""External neural memory class.
A simplified Neural Turing Machines (NTM) with only content-based
addressing (including content addressing and interpolation, but excluding
@@ -76,8 +75,7 @@ def __init__(self,
size=self.mem_slot_size)

def _content_addressing(self, key_vector):
"""
Get write/read head's addressing weights via content-based addressing.
"""Get write/read head's addressing weights via content-based addressing.
"""
# content-based addressing: a=tanh(W*M + U*key)
key_projection = paddle.layer.fc(
@@ -104,8 +102,7 @@ def _content_addressing(self, key_vector):
return addressing_weight

def _interpolation(self, head_name, key_vector, addressing_weight):
"""
Interpolate between previous and current addressing weights.
"""Interpolate between previous and current addressing weights.
"""
# prepare interpolation scalar gate: g=sigmoid(W*key)
gate = paddle.layer.fc(
@@ -126,8 +123,7 @@ def _interpolation(self, head_name, key_vector, addressing_weight):
return interpolated_weight

def _get_addressing_weight(self, head_name, key_vector):
"""
Get final addressing weights for read/write heads, including content
"""Get final addressing weights for read/write heads, including content
addressing and interpolation.
"""
# current content-based addressing
@@ -139,8 +135,7 @@ def _get_addressing_weight(self, head_name, key_vector):
return addressing_weight

def write(self, write_key):
"""
Write onto the external memory.
"""Write onto the external memory.
It cannot be called if "readonly" set True.
:param write_key: Key vector for write heads to generate writing
@@ -183,8 +178,7 @@ def write(self, write_key):
name=self.name)

def read(self, read_key):
"""
Read from the external memory.
"""Read from the external memory.
:param read_key: Key vector for read head to generate addressing
signals.
62 changes: 56 additions & 6 deletions mt_with_external_memory/model.py
@@ -20,8 +20,15 @@


def bidirectional_gru_encoder(input, size, word_vec_dim):
"""
Bidirectional GRU encoder.
"""Bidirectional GRU encoder.
:params size: Hidden cell number in decoder rnn.
:type size: int
:params word_vec_dim: Word embedding size.
:type word_vec_dim: int
:return: Tuple of 1. concatenated forward and backward hidden sequence.
2. last state of backward rnn.
:rtype: tuple of LayerOutput
"""
# token embedding
embeddings = paddle.layer.embedding(input=input, size=word_vec_dim)
@@ -38,8 +45,7 @@ def bidirectional_gru_encoder(input, size, word_vec_dim):

def memory_enhanced_decoder(input, target, initial_state, source_context, size,
word_vec_dim, dict_size, is_generating, beam_size):
"""
GRU sequence decoder enhanced with external memory.
"""GRU sequence decoder enhanced with external memory.
The "external memory" refers to two types of memories.
- Unbounded memory: i.e. attention mechanism in Seq2Seq.
@@ -77,6 +83,30 @@ def memory_enhanced_decoder(input, target, initial_state, source_context, size,
correlated to this paper, but with minor differences (e.g. put "write"
before "read" to bypass a potential bug in V2 APIs. See
(`issue <https://github.com/PaddlePaddle/Paddle/issues/2061>`_).
:params input: Decoder input.
:type input: LayerOutput
:params target: Decoder target.
:type target: LayerOutput
:params initial_state: Initial hidden state.
:type initial_state: LayerOutput
:params source_context: Group of context hidden states for each token in the
source sentence, for attention mechanism.
:type source_context: LayerOutput
:params size: Hidden cell number in decoder rnn.
:type size: int
:params word_vec_dim: Word embedding size.
:type word_vec_dim: int
:param dict_size: Vocabulary size.
:type dict_size: int
:params is_generating: Whether for beam search inferencing (True) or
for training (False).
:type is_generating: bool
:params beam_size: Beam search width.
:type beam_size: int
:return: Cost layer if is_generating=False; Beam search layer if
is_generating = True.
:rtype: LayerOutput
"""
# prepare initial bounded and unbounded memory
bounded_memory_slot_init = paddle.layer.fc(
@@ -172,8 +202,7 @@ def recurrent_decoder_step(cur_embedding):
def memory_enhanced_seq2seq(encoder_input, decoder_input, decoder_target,
hidden_size, word_vec_dim, dict_size, is_generating,
beam_size):
"""
Seq2Seq Model enhanced with external memory.
"""Seq2Seq Model enhanced with external memory.
The "external memory" refers to two types of memories.
- Unbounded memory: i.e. attention mechanism in Seq2Seq.
@@ -189,6 +218,27 @@ def memory_enhanced_seq2seq(encoder_input, decoder_input, decoder_target,
For more details about this memory-enhanced Seq2Seq, please
refer to `Memory-enhanced Decoder for Neural Machine Translation
<https://arxiv.org/abs/1606.02003>`_.
:params encoder_input: Encoder input.
:type encoder_input: LayerOutput
:params decoder_input: Decoder input.
:type decoder_input: LayerOutput
:params decoder_target: Decoder target.
:type decoder_target: LayerOutput
:params hidden_size: Hidden cell number, both in encoder and decoder rnn.
:type hidden_size: int
:params word_vec_dim: Word embedding size.
:type word_vec_dim: int
:param dict_size: Vocabulary size.
:type dict_size: int
:params is_generating: Whether for beam search inferencing (True) or
for training (False).
:type is_generating: bool
:params beam_size: Beam search width.
:type beam_size: int
:return: Cost layer if is_generating=False; Beam search layer if
is_generating = True.
:rtype: LayerOutput
"""
# encoder
context_encodings, sequence_encoding = bidirectional_gru_encoder(
