Doc - Fix some typos
Co-Authored-By: Taufiquzzaman Peyash <taufiquzzaman.peyash@northsouth.edu>
@@ -24,7 +24,7 @@ copyright = "2020, huggingface"
 author = "huggingface"
 
 # The full version, including alpha/beta/rc tags
-release = "0.9.0"
+release = ""
 
 # -- Custom information ------------------------------------------------------
 
@@ -435,7 +435,7 @@ Post-processing
 
 We might want our tokenizer to automatically add special tokens, like :obj:`"[CLS]"` or
 :obj:`"[SEP]"`. To do this, we use a post-processor. :entity:`TemplateProcessing` is the
-most commonly used, you just have so specify a template for the processing of single sentences and
+most commonly used, you just have to specify a template for the processing of single sentences and
 pairs of sentences, along with the special tokens and their IDs.
 
 When we built our tokenizer, we set :obj:`"[CLS]"` and :obj:`"[SEP]"` in positions 1 and 2 of our
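For reference, the :entity:`TemplateProcessing` setup this passage describes looks roughly like the sketch below. The ids 1 and 2 for :obj:`"[CLS]"` and :obj:`"[SEP]"` come from the surrounding guide, not from this commit, and the empty :entity:`WordPiece` model is a hypothetical stand-in for the tokenizer built earlier in the doc, so the snippet runs on its own.

.. code-block:: python

    from tokenizers import Tokenizer
    from tokenizers.models import WordPiece
    from tokenizers.processors import TemplateProcessing

    # Hypothetical stand-in for the tokenizer the guide builds earlier.
    tokenizer = Tokenizer(WordPiece(unk_token="[UNK]"))

    # One template for single sentences, one for pairs; the second
    # sentence of a pair gets type id 1.
    tokenizer.post_processor = TemplateProcessing(
        single="[CLS] $A [SEP]",
        pair="[CLS] $A [SEP] $B:1 [SEP]:1",
        special_tokens=[("[CLS]", 1), ("[SEP]", 2)],  # ids assumed per the prose above
    )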
@@ -741,7 +741,7 @@ In this case, the `attention mask` generated by the tokenizer takes the padding
 
 .. code-block:: python
 
-    from tokenizers import ByteLevelBPETokenizer
+    from tokenizers import BertWordPieceTokenizer
 
     tokenizer = BertWordPieceTokenizer("bert-base-uncased-vocab.txt", lowercase=True)
 
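The corrected import pairs with the padding behavior the surrounding passage describes: once padding is enabled, the attention mask zeroes out the padded positions. A minimal sketch, assuming the `bert-base-uncased-vocab.txt` file from the diff is available locally and using made-up example sentences:

.. code-block:: python

    from tokenizers import BertWordPieceTokenizer

    tokenizer = BertWordPieceTokenizer("bert-base-uncased-vocab.txt", lowercase=True)
    tokenizer.enable_padding(pad_id=0, pad_token="[PAD]")

    # The shorter sentence is padded to the length of the longer one.
    output = tokenizer.encode_batch(["Hello, y'all!", "How are you?"])

    # Padded positions get 0 in the attention mask, real tokens get 1.
    print(output[1].attention_mask)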