Commit
Merge pull request #989 from huggingface/mishig25-patch-2
Update pipeline.mdx
mishig25 authored Apr 25, 2022
2 parents 0bd4976 + 00132ba commit 6533bf0
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions docs/source-doc-builder/pipeline.mdx
@@ -520,7 +520,7 @@ On top of encoding the input texts, a `Tokenizer` also has an API for decoding,
generated by your model back to a text. This is done by the methods
`Tokenizer.decode` (for one predicted text) and `Tokenizer.decode_batch` (for a batch of predictions).

- The [decoder]{.title-ref} will first convert the IDs back to tokens
+ The `decoder` will first convert the IDs back to tokens
(using the tokenizer's vocabulary) and remove all special tokens, then
join those tokens with spaces:

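The decode step described above can be sketched in plain Python. This is an illustrative mock of what `Tokenizer.decode` and `Tokenizer.decode_batch` do, not the library's actual implementation; the vocabulary and special tokens below are made up for the example:

```python
# Illustrative sketch: map IDs back to tokens via the vocabulary,
# drop special tokens, then join the remaining tokens with spaces.
# (Hypothetical toy vocabulary, not a real tokenizer's.)
vocab = {0: "[CLS]", 1: "[SEP]", 2: "hello", 3: "world"}
special_tokens = {"[CLS]", "[SEP]"}

def decode(ids):
    tokens = [vocab[i] for i in ids]
    tokens = [t for t in tokens if t not in special_tokens]
    return " ".join(tokens)

def decode_batch(batch):
    # decode_batch is just decode applied to each prediction in the batch
    return [decode(ids) for ids in batch]

print(decode([0, 2, 3, 1]))       # -> "hello world"
print(decode_batch([[0, 2, 1]]))  # -> ["hello"]
```

In the real library you would call these methods on a trained `Tokenizer` instance rather than hand-rolling the vocabulary lookup.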
@@ -556,7 +556,7 @@ join those tokens with spaces:

If you used a model that added special characters to represent subtokens
of a given "word" (like the `"##"` in
- WordPiece) you will need to customize the [decoder]{.title-ref} to treat
+ WordPiece) you will need to customize the `decoder` to treat
them properly. If we take our previous `bert_tokenizer` for instance the
default decoding will give:

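The WordPiece customization mentioned above can be sketched as follows. This is a minimal illustration of the idea, assuming `"##"` marks subword continuations as in BERT's WordPiece; it is not the decoder class the library actually ships:

```python
# Hedged sketch: a default space-join would render subtokens as
# "wel ##co ##me"; a WordPiece-aware decoder instead glues each
# "##"-prefixed piece onto the previous token before joining.
def wordpiece_decode(tokens, prefix="##"):
    out = []
    for tok in tokens:
        if tok.startswith(prefix) and out:
            out[-1] += tok[len(prefix):]  # merge continuation piece
        else:
            out.append(tok)
    return " ".join(out)

print(wordpiece_decode(["wel", "##co", "##me", "to", "NYC"]))  # -> "welcome to NYC"
```

With the real library you would attach the appropriate decoder to the tokenizer (so `Tokenizer.decode` applies this merging automatically) rather than post-processing tokens yourself.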

0 comments on commit 6533bf0
