mirror of https://github.com/mii443/tokenizers.git (synced 2025-08-22 16:25:30 +00:00)
Doc - Update Model part of the Pipeline page
@@ -34,6 +34,14 @@
         :class:`~tokenizers.pre_tokenizers.Whitespace`
     PreTokenizer
         :class:`~tokenizers.pre_tokenizers.PreTokenizer`
+    models.BPE
+        :class:`~tokenizers.models.BPE`
+    models.Unigram
+        :class:`~tokenizers.models.Unigram`
+    models.WordLevel
+        :class:`~tokenizers.models.WordLevel`
+    models.WordPiece
+        :class:`~tokenizers.models.WordPiece`

 .. entities:: rust

@@ -71,6 +79,14 @@
         :rust:struct:`~tokenizers::normalizers::whitespace::Whitespace`
     PreTokenizer
         :rust:trait:`~tokenizers::tokenizer::PreTokenizer`
+    models.BPE
+        :rust:struct:`~tokenizers::models::bpe::BPE`
+    models.Unigram
+        :rust:struct:`~tokenizers::models::unigram::Unigram`
+    models.WordLevel
+        :rust:struct:`~tokenizers::models::wordlevel::WordLevel`
+    models.WordPiece
+        :rust:struct:`~tokenizers::models::wordpiece::WordPiece`

 .. entities:: node

@@ -108,3 +124,11 @@
         :obj:`Whitespace`
     PreTokenizer
         :obj:`PreTokenizer`
+    models.BPE
+        :obj:`BPE`
+    models.Unigram
+        :obj:`Unigram`
+    models.WordLevel
+        :obj:`WordLevel`
+    models.WordPiece
+        :obj:`WordPiece`
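The three hunks above register the same four model entities for each language binding (python, rust, node), so the pipeline page can refer to them through a single :entity: role. As a rough illustration of what these entities resolve to in the Python bindings, here is a minimal sketch that instantiates each supported model; the unk_token values are illustrative choices, not requirements:

    from tokenizers.models import BPE, Unigram, WordLevel, WordPiece

    # Each model can be built empty and trained later, or loaded from
    # an existing vocabulary.
    bpe = BPE(unk_token="[UNK]")                # byte-pair encoding merges
    unigram = Unigram()                         # probabilistic subword model
    word_level = WordLevel(unk_token="[UNK]")   # plain word-to-id lookup
    word_piece = WordPiece(unk_token="[UNK]")   # WordPiece, as used by BERT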
@@ -246,20 +246,20 @@ scratch afterward.

 The Model
 ----------------------------------------------------------------------------------------------------

-Once the input texts are normalized and pre-tokenized, we can apply the model on the pre-tokens.
-This is the part of the pipeline that needs training on your corpus (or that has been trained if you
-are using a pretrained tokenizer).
+Once the input texts are normalized and pre-tokenized, the :entity:`Tokenizer` applies the model on
+the pre-tokens. This is the part of the pipeline that needs training on your corpus (or that has
+been trained if you are using a pretrained tokenizer).

 The role of the model is to split your "words" into tokens, using the rules it has learned. It's
 also responsible for mapping those tokens to their corresponding IDs in the vocabulary of the model.

-This model is passed along when initializing the :class:`~tokenizers.Tokenizer` so you already know
+This model is passed along when initializing the :entity:`Tokenizer` so you already know
 how to customize this part. Currently, the 🤗 Tokenizers library supports:

-- :class:`~tokenizers.models.BPE`
-- :class:`~tokenizers.models.Unigram`
-- :class:`~tokenizers.models.WordLevel`
-- :class:`~tokenizers.models.WordPiece`
+- :entity:`models.BPE`
+- :entity:`models.Unigram`
+- :entity:`models.WordLevel`
+- :entity:`models.WordPiece`

 For more details about each model and its behavior, you can check `here <components.html#models>`__
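Since the rewritten paragraph stresses that the model is passed along when initializing the :entity:`Tokenizer`, a short end-to-end sketch of that flow in the Python bindings may help. This uses the public tokenizers API; "corpus.txt" is a placeholder for your own training files:

    from tokenizers import Tokenizer
    from tokenizers.models import BPE
    from tokenizers.pre_tokenizers import Whitespace
    from tokenizers.trainers import BpeTrainer

    # The model is handed to the Tokenizer at initialization.
    tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
    tokenizer.pre_tokenizer = Whitespace()

    # Train the model part of the pipeline on your corpus.
    trainer = BpeTrainer(special_tokens=["[UNK]"])
    tokenizer.train(["corpus.txt"], trainer)

    # The trained model splits pre-tokens and maps them to vocabulary ids.
    print(tokenizer.encode("Hello world").tokens)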