mirror of
https://github.com/mii443/tokenizers.git
synced 2025-12-03 19:28:20 +00:00
Fix one char super tiny typo (#1137)
* Update pipeline.mdx
* Update pipeline.rst
pipeline.mdx:

@@ -558,7 +558,7 @@ If you used a model that added special characters to represent subtokens
 of a given "word" (like the `"##"` in
 WordPiece) you will need to customize the `decoder` to treat
 them properly. If we take our previous `bert_tokenizer` for instance the
-default decoing will give:
+default decoding will give:

 <tokenizerslangcontent>
 <python>
pipeline.rst:

@@ -497,7 +497,7 @@ remove all special tokens, then join those tokens with spaces:

 If you used a model that added special characters to represent subtokens of a given "word" (like
 the :obj:`"##"` in WordPiece) you will need to customize the `decoder` to treat them properly. If we
-take our previous :entity:`bert_tokenizer` for instance the default decoing will give:
+take our previous :entity:`bert_tokenizer` for instance the default decoding will give:

 .. only:: python
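The passage being fixed describes why a WordPiece-style tokenizer needs a custom `decoder`: by default, decoding joins tokens with spaces, leaving the `"##"` subtoken markers in the output. A minimal sketch of that behavior with the `tokenizers` library, using a tiny hypothetical vocabulary (not the real `bert_tokenizer` from the docs):

```python
from tokenizers import Tokenizer, decoders
from tokenizers.models import WordPiece

# Tiny hypothetical WordPiece vocab, just large enough to split "tokenizers".
vocab = {"[UNK]": 0, "tok": 1, "##eniz": 2, "##ers": 3}
tokenizer = Tokenizer(WordPiece(vocab, unk_token="[UNK]"))

ids = tokenizer.encode("tokenizers").ids  # tokens: ["tok", "##eniz", "##ers"]

# Default decoding just joins tokens with spaces, leaving the "##" markers.
print(tokenizer.decode(ids))  # "tok ##eniz ##ers"

# Attaching a WordPiece decoder merges the subtokens back into whole words.
tokenizer.decoder = decoders.WordPiece(prefix="##")
print(tokenizer.decode(ids))  # "tokenizers"
```

This is the customization the corrected sentence refers to: without the `decoders.WordPiece` decoder, the "default decoding" shown in both files keeps the subword prefixes visible.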