Fix broken links in docs (#1133)
@@ -300,7 +300,7 @@ customize this part. Currently, the 🤗 Tokenizers library supports:
 - `models.WordPiece`
 
 For more details about each model and its behavior, you can check
-[here](components.html#models)
+[here](components#models)
 
 ## Post-Processing
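For context on the model choice this hunk documents: a minimal sketch of building a tokenizer around `models.WordPiece` with the `tokenizers` Python API. The unknown token, vocabulary size, special tokens, and corpus path below are illustrative placeholders, not values taken from the docs.

```python
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# WordPiece is one of the models listed above; it needs an unknown token.
tokenizer = Tokenizer(models.WordPiece(unk_token="[UNK]"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()

# Illustrative trainer settings; tune vocab_size/special_tokens for your corpus.
trainer = trainers.WordPieceTrainer(
    vocab_size=30000,
    special_tokens=["[UNK]", "[CLS]", "[SEP]", "[PAD]", "[MASK]"],
)

# Train from one or more text files (path is a placeholder).
tokenizer.train(["path/to/corpus.txt"], trainer=trainer)
```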
@@ -287,7 +287,7 @@ with the `Tokenizer.encode` method:
 
 This applied the full pipeline of the tokenizer on the text, returning
 an `Encoding` object. To learn more
-about this pipeline, and how to apply (or customize) parts of it, check out `this page <pipeline>`.
+about this pipeline, and how to apply (or customize) parts of it, check out [this page](pipeline).
 
 This `Encoding` object then has all the
 attributes you need for your deep learning model (or other). The
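A short sketch of the `Tokenizer.encode` call this hunk refers to, assuming a tokenizer has already been trained or loaded (the pretrained identifier below is only an example):

```python
from tokenizers import Tokenizer

# Load an existing tokenizer; any trained Tokenizer behaves the same way.
tokenizer = Tokenizer.from_pretrained("bert-base-uncased")

# encode runs the full pipeline (normalization, pre-tokenization, model,
# post-processing) and returns an Encoding object.
encoding = tokenizer.encode("Hello, y'all! How are you?")

print(encoding.tokens)          # token strings produced by the pipeline
print(encoding.ids)             # the ids to feed to your model
print(encoding.attention_mask)  # one of the other attributes the Encoding carries
```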
@@ -1,6 +1,6 @@
 # Training from memory
 
-In the [Quicktour](quicktour.html), we saw how to build and train a
+In the [Quicktour](quicktour), we saw how to build and train a
 tokenizer using text files, but we can actually use any Python Iterator.
 In this section we'll see a few different ways of training our
 tokenizer.
@@ -22,7 +22,7 @@ takes care of normalizing the input using the NFKC Unicode normalization
 method, and uses a [`~tokenizers.pre_tokenizers.ByteLevel`] pre-tokenizer with the corresponding decoder.
 
 For more information on the components used here, you can check
-[here](components.html)
+[here](components).
 
 ## The most basic way
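As a companion to the two hunks above, a minimal sketch of the Unigram/NFKC/ByteLevel tokenizer they describe, trained from an in-memory iterator rather than text files. The toy sentences and trainer settings are illustrative assumptions, not values from the docs.

```python
from tokenizers import Tokenizer, decoders, models, normalizers, pre_tokenizers, trainers

# Unigram model with NFKC normalization and a ByteLevel pre-tokenizer/decoder,
# as described in the patched documentation.
tokenizer = Tokenizer(models.Unigram())
tokenizer.normalizer = normalizers.NFKC()
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel()
tokenizer.decoder = decoders.ByteLevel()

trainer = trainers.UnigramTrainer(
    vocab_size=1000,  # illustrative; pick a size that fits your data
    initial_alphabet=pre_tokenizers.ByteLevel.alphabet(),
    special_tokens=["<PAD>", "<BOS>", "<EOS>"],
)

# Any Python iterator works as training data, not just files on disk.
data = ["A first sentence.", "Another sentence.", "And a last one."]
tokenizer.train_from_iterator(data, trainer=trainer)

print(tokenizer.encode("A first sentence.").tokens)
```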
@@ -261,7 +261,7 @@ how to customize this part. Currently, the 🤗 Tokenizers library supports:
 - :entity:`models.WordLevel`
 - :entity:`models.WordPiece`
 
-For more details about each model and its behavior, you can check `here <components.html#models>`__
+For more details about each model and its behavior, you can check `here <components#models>`__
 
 
 .. _post-processing:
@@ -1,7 +1,7 @@
 Training from memory
 ----------------------------------------------------------------------------------------------------
 
-In the `Quicktour <quicktour.html>`__, we saw how to build and train a tokenizer using text files,
+In the `Quicktour <quicktour>`__, we saw how to build and train a tokenizer using text files,
 but we can actually use any Python Iterator. In this section we'll see a few different ways of
 training our tokenizer.
@@ -18,7 +18,7 @@ This tokenizer is based on the :class:`~tokenizers.models.Unigram` model. It tak
 normalizing the input using the NFKC Unicode normalization method, and uses a
 :class:`~tokenizers.pre_tokenizers.ByteLevel` pre-tokenizer with the corresponding decoder.
 
-For more information on the components used here, you can check `here <components.html>`__
+For more information on the components used here, you can check `here <components>`__
 
 The most basic way
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~