Update docs for from_pretrained

Anthony Moi
2021-08-19 16:50:17 +02:00
committed by Anthony MOI
parent 528c9a532e
commit a4d0f3dd18
2 changed files with 16 additions and 8 deletions


@@ -3,6 +3,7 @@ from setuptools_rust import Binding, RustExtension
extras = {}
extras["testing"] = ["pytest", "requests", "numpy", "datasets"]
extras["docs"] = ["sphinx", "sphinx_rtd_theme", "setuptools_rust"]
setup(
name="tokenizers",


@@ -706,10 +706,22 @@ In this case, the `attention mask` generated by the tokenizer takes the padding
.. only:: python
Using a pretrained tokenizer
----------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------
You can also use a pretrained tokenizer directly in, as long as you have its vocabulary file. For
instance, here is how to get the classic pretrained BERT tokenizer:
You can load any tokenizer from the Hugging Face Hub as long as a `tokenizer.json` file is
available in the repository.
.. code-block:: python
from tokenizers import Tokenizer
tokenizer = Tokenizer.from_pretrained("bert-base-uncased")
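The object returned by `from_pretrained` is a regular `Tokenizer`, so it can be used right away.
A minimal sketch of encoding a sentence with it (the sample sentence is only illustrative):
.. code-block:: python
# Encode a sentence with the tokenizer loaded from the Hub and inspect its tokens
output = tokenizer.encode("Hello, y'all! How are you?")
print(output.tokens)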
Importing a pretrained tokenizer from legacy vocabulary files
------------------------------------------------------------------------------------------------
You can also import a pretrained tokenizer directly, as long as you have its vocabulary file.
For instance, here is how to import the classic pretrained BERT tokenizer:
.. code-block:: python
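# A sketch of the example elided between the two hunks, assuming the legacy
# BertWordPieceTokenizer helper bundled with the library and the vocabulary
# file fetched with the wget command shown below
from tokenizers import BertWordPieceTokenizer
tokenizer = BertWordPieceTokenizer("bert-base-uncased-vocab.txt", lowercase=True)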
@@ -722,8 +734,3 @@ In this case, the `attention mask` generated by the tokenizer takes the padding
.. code-block:: bash
wget https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt
.. note::
Better support for pretrained tokenizers is coming in a next release, so expect this API to
change soon.