mirror of https://github.com/mii443/tokenizers.git (synced 2025-08-22 16:25:30 +00:00)

Doc - Update components page
@@ -6,6 +6,52 @@ to customize its behavior. This page lists most provided components.
 
 .. _normalizers:
 
+.. entities:: python
+
+    BertNormalizer.clean_text
+        clean_text
+    BertNormalizer.handle_chinese_chars
+        handle_chinese_chars
+    BertNormalizer.strip_accents
+        strip_accents
+    BertNormalizer.lowercase
+        lowercase
+    Normalizer.Sequence
+        ``Sequence([NFKC(), Lowercase()])``
+    PreTokenizer.Sequence
+        ``Sequence([Punctuation(), WhitespaceSplit()])``
+
+.. entities:: rust
+
+    BertNormalizer.clean_text
+        clean_text
+    BertNormalizer.handle_chinese_chars
+        handle_chinese_chars
+    BertNormalizer.strip_accents
+        strip_accents
+    BertNormalizer.lowercase
+        lowercase
+    Normalizer.Sequence
+        ``Sequence::new(vec![NFKC, Lowercase])``
+    PreTokenizer.Sequence
+        ``Sequence::new(vec![Punctuation, WhitespaceSplit])``
+
+.. entities:: node
+
+    BertNormalizer.clean_text
+        cleanText
+    BertNormalizer.handle_chinese_chars
+        handleChineseChars
+    BertNormalizer.strip_accents
+        stripAccents
+    BertNormalizer.lowercase
+        lowercase
+    Normalizer.Sequence
+        ..
+    PreTokenizer.Sequence
+        ..
+
 Normalizers
 ----------------------------------------------------------------------------------------------------
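The ``Normalizer.Sequence`` and ``PreTokenizer.Sequence`` entities defined above correspond to composing components in code. A minimal sketch, assuming the ``tokenizers`` Python package is installed and using its ``normalize_str`` inspection helper:

```python
from tokenizers import normalizers
from tokenizers.normalizers import NFKC, Lowercase

# Normalizers in a Sequence run in the order they are listed:
# Unicode NFKC normalization first, then lowercasing.
norm = normalizers.Sequence([NFKC(), Lowercase()])
result = norm.normalize_str("Héllo There")
print(result)  # "héllo there"
```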
@@ -65,11 +111,20 @@ The ``Normalizer`` is optional.
 
        Input: ``"banana"``
 
        Output: ``"benene"``
 
+   * - BertNormalizer
+     - Provides an implementation of the Normalizer used in the original BERT. Options
+       that can be set are:
+
+       - :entity:`BertNormalizer.clean_text`
+       - :entity:`BertNormalizer.handle_chinese_chars`
+       - :entity:`BertNormalizer.strip_accents`
+       - :entity:`BertNormalizer.lowercase`
+
+     -
    * - Sequence
      - Composes multiple normalizers that will run in the provided order
-     - Example::
-
-         Sequence([Nmt(), NFKC()])
+     - :entity:`Normalizer.Sequence`
 
 .. _pre-tokenizers:
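The BertNormalizer row added in this hunk can be exercised directly. A sketch assuming the ``tokenizers`` Python package; the four keyword arguments are the options listed in the table:

```python
from tokenizers.normalizers import BertNormalizer

# The four options from the table, set explicitly.
norm = BertNormalizer(
    clean_text=True,
    handle_chinese_chars=True,
    strip_accents=True,
    lowercase=True,
)
cleaned = norm.normalize_str("Héllo")
print(cleaned)  # "hello": accent stripped, then lowercased
```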
@@ -142,9 +197,15 @@ the ByteLevel)
 
        Output: ``"Hello", "there"``
 
+   * - Digits
+     - Splits the numbers from any other characters.
+     - Input: ``"Hello123there"``
+
+       Output: ``"Hello", "123", "there"``
+
    * - Sequence
      - Lets you compose multiple ``PreTokenizer`` that will be run in the given order
-     - ``Sequence([Punctuation(), WhitespaceSplit()])``
+     - :entity:`PreTokenizer.Sequence`
 
 .. _models:
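The Digits and Sequence pre-tokenizers added in this hunk can be checked with the ``pre_tokenize_str`` helper. A sketch assuming the ``tokenizers`` Python package:

```python
from tokenizers import pre_tokenizers
from tokenizers.pre_tokenizers import Digits, Punctuation, WhitespaceSplit

# Digits splits numbers from any other characters, keeping character offsets.
digits = Digits(individual_digits=False)
splits = digits.pre_tokenize_str("Hello123there")
print(splits)  # [('Hello', (0, 5)), ('123', (5, 8)), ('there', (8, 13))]

# Sequence applies each pre-tokenizer in the given order:
# punctuation is isolated first, then the result is split on whitespace.
seq = pre_tokenizers.Sequence([Punctuation(), WhitespaceSplit()])
print(seq.pre_tokenize_str("Hello, world!"))
```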
@@ -214,7 +275,7 @@ is the component doing just that.
    * - TemplateProcessing
      - Lets you easily template the post processing, adding special tokens, and specifying
        the ``type_id`` for each sequence/special token. The template is given two strings
        representing the single sequence and the pair of sequences, as well as a set of
        special tokens to use.
      - Example, when specifying a template with these values:
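For TemplateProcessing, a sketch assuming the ``tokenizers`` Python package; the ``[CLS]``/``[SEP]`` token ids used here are illustrative and would normally be looked up in the tokenizer's vocabulary:

```python
from tokenizers.processors import TemplateProcessing

# Templates describe a single sequence and a pair; the ":1" suffix sets the
# type_id for that token or sequence.
processor = TemplateProcessing(
    single="[CLS] $A [SEP]",
    pair="[CLS] $A [SEP] $B:1 [SEP]:1",
    special_tokens=[("[CLS]", 1), ("[SEP]", 2)],  # (token, id); ids are assumed
)

# The single-sequence template adds two special tokens, the pair template three.
print(processor.num_special_tokens_to_add(False))  # 2
print(processor.num_special_tokens_to_add(True))   # 3
```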