Update pipeline.mdx

Fix conversion errors
Mishig Davaadorj
2022-04-25 21:03:31 +02:00
committed by GitHub
parent 0bd4976dba
commit 00132ba836


```diff
@@ -520,7 +520,7 @@ On top of encoding the input texts, a `Tokenizer` also has an API for decoding,
 generated by your model back to a text. This is done by the methods
 `Tokenizer.decode` (for one predicted text) and `Tokenizer.decode_batch` (for a batch of predictions).
-The [decoder]{.title-ref} will first convert the IDs back to tokens
+The `decoder` will first convert the IDs back to tokens
 (using the tokenizer's vocabulary) and remove all special tokens, then
 join those tokens with spaces:
```
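The decode step described in this hunk (IDs mapped back to tokens via the vocabulary, special tokens dropped, tokens joined with spaces) can be sketched in plain Python. This is only an illustration of the behavior, not the library's implementation; the vocabulary and special-token set below are hypothetical toy examples:

```python
# Toy sketch of what `Tokenizer.decode` does conceptually.
# `id_to_token` and `special_tokens` are made-up examples, not a real vocabulary.
id_to_token = {0: "[CLS]", 1: "[SEP]", 2: "hello", 3: "world", 4: "!"}
special_tokens = {"[CLS]", "[SEP]"}

def decode(ids):
    # 1. Map each ID back to its token using the vocabulary.
    tokens = [id_to_token[i] for i in ids]
    # 2. Remove all special tokens.
    tokens = [t for t in tokens if t not in special_tokens]
    # 3. Join the remaining tokens with spaces.
    return " ".join(tokens)

def decode_batch(batch):
    # `Tokenizer.decode_batch` applies the same logic to each sequence.
    return [decode(ids) for ids in batch]

print(decode([0, 2, 3, 4, 1]))  # hello world !
```

In the real library the same result comes from `tokenizer.decode(ids)` on a trained `Tokenizer`.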
```diff
@@ -556,7 +556,7 @@ join those tokens with spaces:
 If you used a model that added special characters to represent subtokens
 of a given "word" (like the `"##"` in
-WordPiece) you will need to customize the [decoder]{.title-ref} to treat
+WordPiece) you will need to customize the `decoder` to treat
 them properly. If we take our previous `bert_tokenizer` for instance the
 default decoing will give:
```
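A WordPiece-aware decoder has to merge `"##"`-prefixed subtokens back into the preceding word instead of joining every token with a space. A minimal sketch of that merge rule, again with a hypothetical token list:

```python
def wordpiece_decode(tokens):
    # Merge "##"-prefixed subtokens into the preceding word;
    # all other tokens start a new space-separated word.
    words = []
    for token in tokens:
        if token.startswith("##") and words:
            words[-1] += token[2:]
        else:
            words.append(token)
    return " ".join(words)

# "welcome" split by WordPiece into "wel" + "##come":
print(wordpiece_decode(["wel", "##come", "to", "nyc"]))  # welcome to nyc
```

With the library itself, the equivalent customization is setting `tokenizer.decoder = decoders.WordPiece()` (from `tokenizers.decoders`) before calling `decode`.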