mirror of
https://github.com/mii443/tokenizers.git
synced 2025-08-22 16:25:30 +00:00
Fix broken links in docs (#1133)
@@ -300,7 +300,7 @@ customize this part. Currently, the 🤗 Tokenizers library supports:
 - `models.WordPiece`
 
 For more details about each model and its behavior, you can check
-[here](components.html#models)
+[here](components#models)
 
 ## Post-Processing
 
@@ -287,7 +287,7 @@ with the `Tokenizer.encode` method:
 
 This applied the full pipeline of the tokenizer on the text, returning
 an `Encoding` object. To learn more
-about this pipeline, and how to apply (or customize) parts of it, check out `this page <pipeline>`.
+about this pipeline, and how to apply (or customize) parts of it, check out [this page](pipeline).
 
 This `Encoding` object then has all the
 attributes you need for your deep learning model (or other). The
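The context lines of this hunk describe `Tokenizer.encode` running the full pipeline and returning an `Encoding` object. A minimal sketch of that behavior, assuming a tiny hand-built `WordLevel` vocabulary (the vocabulary and pre-tokenizer choices here are illustrative, not taken from the docs being patched):

```python
from tokenizers import Tokenizer, models, pre_tokenizers

# Illustrative two-word vocabulary; a real tokenizer would be trained or loaded.
vocab = {"[UNK]": 0, "hello": 1, "world": 2}
tokenizer = Tokenizer(models.WordLevel(vocab, unk_token="[UNK]"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()

# encode() applies the full pipeline and returns an Encoding object
# carrying the tokens, their ids, offsets, attention mask, etc.
encoding = tokenizer.encode("hello world")
print(encoding.tokens)  # the tokens produced by the pipeline
print(encoding.ids)     # the corresponding vocabulary ids
```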
@@ -1,6 +1,6 @@
 # Training from memory
 
-In the [Quicktour](quicktour.html), we saw how to build and train a
+In the [Quicktour](quicktour), we saw how to build and train a
 tokenizer using text files, but we can actually use any Python Iterator.
 In this section we'll see a few different ways of training our
 tokenizer.
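The "Training from memory" page patched here notes that any Python iterator can feed training, not just text files. A hedged sketch of that idea (the in-memory corpus, model choice, and trainer settings below are made up for illustration):

```python
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# Any Python iterator of strings works; this tiny in-memory corpus
# stands in for real training data.
corpus = [
    "tokenizers can train from text files",
    "but any python iterator works as well",
]

tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()
trainer = trainers.BpeTrainer(vocab_size=100, special_tokens=["[UNK]"])

# train_from_iterator consumes the iterator instead of reading files.
tokenizer.train_from_iterator(corpus, trainer=trainer)
encoding = tokenizer.encode("any iterator works")
print(encoding.tokens)
```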
@@ -22,7 +22,7 @@ takes care of normalizing the input using the NFKC Unicode normalization
 method, and uses a [`~tokenizers.pre_tokenizers.ByteLevel`] pre-tokenizer with the corresponding decoder.
 
 For more information on the components used here, you can check
-[here](components.html)
+[here](components).
 
 ## The most basic way
 
@@ -261,7 +261,7 @@ how to customize this part. Currently, the 🤗 Tokenizers library supports:
 - :entity:`models.WordLevel`
 - :entity:`models.WordPiece`
 
-For more details about each model and its behavior, you can check `here <components.html#models>`__
+For more details about each model and its behavior, you can check `here <components#models>`__
 
 
 .. _post-processing:
@@ -1,7 +1,7 @@
 Training from memory
 ----------------------------------------------------------------------------------------------------
 
-In the `Quicktour <quicktour.html>`__, we saw how to build and train a tokenizer using text files,
+In the `Quicktour <quicktour>`__, we saw how to build and train a tokenizer using text files,
 but we can actually use any Python Iterator. In this section we'll see a few different ways of
 training our tokenizer.
 
@@ -18,7 +18,7 @@ This tokenizer is based on the :class:`~tokenizers.models.Unigram` model. It tak
 normalizing the input using the NFKC Unicode normalization method, and uses a
 :class:`~tokenizers.pre_tokenizers.ByteLevel` pre-tokenizer with the corresponding decoder.
 
-For more information on the components used here, you can check `here <components.html>`__
+For more information on the components used here, you can check `here <components>`__
 
 The most basic way
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
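The context of this last hunk describes a tokenizer built from a `Unigram` model, NFKC normalization, and a `ByteLevel` pre-tokenizer with matching decoder. A sketch of assembling that pipeline (the training corpus and trainer settings are invented for the example, not part of the patched docs):

```python
from tokenizers import Tokenizer, decoders, models, normalizers, pre_tokenizers, trainers

# Assemble the pipeline the RST describes: Unigram model, NFKC
# normalization, ByteLevel pre-tokenizer with the corresponding decoder.
tokenizer = Tokenizer(models.Unigram())
tokenizer.normalizer = normalizers.NFKC()
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel()
tokenizer.decoder = decoders.ByteLevel()

# Tiny illustrative corpus and trainer settings.
trainer = trainers.UnigramTrainer(
    vocab_size=80, special_tokens=["[UNK]"], unk_token="[UNK]"
)
tokenizer.train_from_iterator(["byte level unigram example text"], trainer=trainer)
print(tokenizer.get_vocab_size())
```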