diff --git a/docs/source-doc-builder/pipeline.mdx b/docs/source-doc-builder/pipeline.mdx
index d40029a0..30f92bf1 100644
--- a/docs/source-doc-builder/pipeline.mdx
+++ b/docs/source-doc-builder/pipeline.mdx
@@ -300,7 +300,7 @@ customize this part. Currently, the 🤗 Tokenizers library supports:
 - `models.WordPiece`
 
 For more details about each model and its behavior, you can check
-[here](components.html#models)
+[here](components#models)
 
 ## Post-Processing
 
diff --git a/docs/source-doc-builder/quicktour.mdx b/docs/source-doc-builder/quicktour.mdx
index 85deb193..b7cd1400 100644
--- a/docs/source-doc-builder/quicktour.mdx
+++ b/docs/source-doc-builder/quicktour.mdx
@@ -287,7 +287,7 @@ with the `Tokenizer.encode` method:
 
 This applied the full pipeline of the tokenizer on the text,
 returning an `Encoding` object. To learn more
-about this pipeline, and how to apply (or customize) parts of it, check out `this page <pipeline.html>`.
+about this pipeline, and how to apply (or customize) parts of it, check out [this page](pipeline).
 
 This `Encoding` object then has all the attributes you need for your
 deep learning model (or other). The
diff --git a/docs/source-doc-builder/training_from_memory.mdx b/docs/source-doc-builder/training_from_memory.mdx
index aea2a6b9..121af3d9 100644
--- a/docs/source-doc-builder/training_from_memory.mdx
+++ b/docs/source-doc-builder/training_from_memory.mdx
@@ -1,6 +1,6 @@
 # Training from memory
 
-In the [Quicktour](quicktour.html), we saw how to build and train a
+In the [Quicktour](quicktour), we saw how to build and train a
 tokenizer using text files, but we can actually use any Python
 Iterator. In this section we'll see a few different ways of training
 our tokenizer.
@@ -22,7 +22,7 @@ takes care of normalizing the input using the NFKC Unicode normalization
 method, and uses a [`~tokenizers.pre_tokenizers.ByteLevel`]
 pre-tokenizer with the corresponding decoder.
 For more information on the components used here, you can check
-[here](components.html)
+[here](components).
 
 ## The most basic way
 
diff --git a/docs/source/pipeline.rst b/docs/source/pipeline.rst
index d843ee57..e5ab8fe9 100644
--- a/docs/source/pipeline.rst
+++ b/docs/source/pipeline.rst
@@ -261,7 +261,7 @@ how to customize this part. Currently, the 🤗 Tokenizers library supports:
 - :entity:`models.WordLevel`
 - :entity:`models.WordPiece`
 
-For more details about each model and its behavior, you can check `here <components.html#models>`__
+For more details about each model and its behavior, you can check `here <components#models>`__
 
 .. _post-processing:
 
diff --git a/docs/source/tutorials/python/training_from_memory.rst b/docs/source/tutorials/python/training_from_memory.rst
index ef15813e..3fe92495 100644
--- a/docs/source/tutorials/python/training_from_memory.rst
+++ b/docs/source/tutorials/python/training_from_memory.rst
@@ -1,7 +1,7 @@
 Training from memory
 ----------------------------------------------------------------------------------------------------
 
-In the `Quicktour <quicktour.html>`__, we saw how to build and train a tokenizer using text files,
+In the `Quicktour <quicktour>`__, we saw how to build and train a tokenizer using text files,
 but we can actually use any Python Iterator. In this section we'll see a few different ways of
 training our tokenizer.
 
@@ -18,7 +18,7 @@ This tokenizer is based on the :class:`~tokenizers.models.Unigram` model. It tak
 normalizing the input using the NFKC Unicode normalization method, and uses a
 :class:`~tokenizers.pre_tokenizers.ByteLevel` pre-tokenizer with the corresponding decoder.
 
-For more information on the components used here, you can check `here <components.html>`__
+For more information on the components used here, you can check `here <components>`__
 
 The most basic way
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
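For context on the pages touched above, here is a minimal sketch of the setup the training-from-memory docs describe: a Unigram model with NFKC normalization and a ByteLevel pre-tokenizer/decoder pair, trained from an in-memory iterator rather than text files. This is illustrative only and not part of the diff; the `data` list, vocabulary size, and special tokens are placeholders.

```python
# Illustrative sketch only (not part of this diff): the configuration described
# in training_from_memory -- Unigram model, NFKC normalizer, ByteLevel
# pre-tokenizer with its matching decoder, trained from an in-memory iterator.
from tokenizers import Tokenizer, decoders, normalizers, pre_tokenizers, trainers
from tokenizers.models import Unigram

tokenizer = Tokenizer(Unigram())
tokenizer.normalizer = normalizers.NFKC()
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel()
tokenizer.decoder = decoders.ByteLevel()

# Any Python iterator of strings works, not just text files.
data = [
    "Beautiful is better than ugly.",
    "Explicit is better than implicit.",
    "Simple is better than complex.",
]
trainer = trainers.UnigramTrainer(vocab_size=100, special_tokens=["<unk>"])
tokenizer.train_from_iterator(data, trainer=trainer)

print(tokenizer.encode("Simple is better than complex.").tokens)
```

The same `train_from_iterator` call accepts generators or any other iterable of strings, which is the point the training-from-memory page makes.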