[docs] Fix bug (PaddlePaddle#133)
* Modified the comments.

* Modified docs.
xiemoyuan authored Mar 15, 2021
1 parent 0ceb9f5 commit fd928c3
Showing 2 changed files with 6 additions and 7 deletions.
11 changes: 5 additions & 6 deletions docs/data.md
@@ -24,7 +24,7 @@

`paddlenlp.data.Stack`, `paddlenlp.data.Pad`, `paddlenlp.data.Tuple`, and `paddlenlp.data.Dict` are used to build the `collate_fn` function that assembles mini-batches (a minimal usage sketch follows this file's diff).

-### Building the `dataset`
+### Data preprocessing

#### `paddlenlp.data.Vocab`

@@ -191,8 +191,7 @@ label: [[1], [0], [1]]

```python
from paddlenlp.data import Vocab, JiebaTokenizer, Stack, Pad, Tuple, SamplerHelper
-from paddlenlp.datasets import ChnSentiCorp
-from paddlenlp.datasets import MapDataset
+from paddlenlp.datasets import load_dataset
from paddle.io import DataLoader

# Path to the vocabulary file; download the vocab file before running this example
@@ -207,13 +206,13 @@ vocab = Vocab.load_vocabulary(
tokenizer = JiebaTokenizer(vocab)

def convert_example(example):
-    text, label = example
+    text, label = example['text'], example['label']
    ids = tokenizer.encode(text)
    label = [label]
    return ids, label

-dataset = ChnSentiCorp('train')
-dataset = MapDataset(dataset).map(convert_example, lazy=True)
+dataset = load_dataset('chnsenticorp', splits='train')
+dataset = dataset.map(convert_example, lazy=True)

pad_id = vocab.token_to_idx[vocab.pad_token]
batchify_fn = Tuple(
```
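The diff breaks off just as `batchify_fn` is being constructed. As referenced above, here is a minimal, self-contained sketch of how `Tuple`, `Pad`, and `Stack` compose into a `collate_fn`; the pad value, dtype, and toy samples are illustrative assumptions, not taken from the diff:

```python
from paddlenlp.data import Pad, Stack, Tuple

# Each sample is (token_ids, label), matching what convert_example returns.
samples = [
    ([2, 7, 5, 9], [1]),
    ([3, 4], [0]),
]

batchify_fn = Tuple(
    Pad(axis=0, pad_val=0),  # pad token ids to the longest sequence in the batch
    Stack(dtype="int64"),    # stack the single-element label lists into one array
)

ids, labels = batchify_fn(samples)
print(ids.shape, labels.shape)  # (2, 4) (2, 1)
```

`Dict` plays the same role when each sample is a dict rather than a tuple, selecting fields by key instead of by position.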
2 changes: 1 addition & 1 deletion paddlenlp/transformers/unified_transformer/tokenizer.py
@@ -247,7 +247,7 @@ def build_inputs_with_special_tokens(self, token_ids_0, token_ids_1=None):
A UnifiedTransformer sequence has the following format:
::
- single sequence: ``[CLS] X [SEP]``
-- pair of sequences: ``[CLS] A [SEP] [CLS] B [SEP]``
+- pair of sequences: ``[CLS] A [SEP] B [SEP]``
Args:
token_ids_0 (list): List of IDs to which the special tokens will be
added.
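To make the corrected docstring concrete, here is a hypothetical, standalone sketch of the concatenation it describes; `build_pair_inputs`, `cls_id`, and `sep_id` are illustrative names and placeholder values, not the tokenizer's real API or vocabulary ids:

```python
def build_pair_inputs(token_ids_0, token_ids_1=None, cls_id=1, sep_id=3):
    # Single sequence:   [CLS] X [SEP]
    # Pair of sequences: [CLS] A [SEP] B [SEP]  (no second [CLS]),
    # which is exactly what the docstring fix above reflects.
    output = [cls_id] + token_ids_0 + [sep_id]
    if token_ids_1 is not None:
        output = output + token_ids_1 + [sep_id]
    return output

print(build_pair_inputs([11, 12], [21, 22]))  # -> [1, 11, 12, 3, 21, 22, 3]
```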
