diff --git a/docs/source/index.rst b/docs/source/index.rst
index c8aae922..9d61ccc4 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -27,7 +27,11 @@ Components:
 
 .. toctree::
    :maxdepth: 2
+   :caption: Getting Started
 
+   quicktour
+   installation
+   pipeline
    components
 
 Load an existing tokenizer:
diff --git a/docs/source/installation.rst b/docs/source/installation.rst
new file mode 100644
index 00000000..67b7e6a1
--- /dev/null
+++ b/docs/source/installation.rst
@@ -0,0 +1,5 @@
+Installation
+====================================================================================================
+
+- How to install using pip
+- How to build from source
diff --git a/docs/source/pipeline.rst b/docs/source/pipeline.rst
new file mode 100644
index 00000000..55ff00e2
--- /dev/null
+++ b/docs/source/pipeline.rst
@@ -0,0 +1,10 @@
+The tokenization pipeline
+====================================================================================================
+
+TODO: Describe the tokenization pipeline:
+
+- Normalization
+- Pre-tokenization
+- Tokenization
+- Post-processing
+- Decoding
diff --git a/docs/source/quicktour.rst b/docs/source/quicktour.rst
new file mode 100644
index 00000000..8485737b
--- /dev/null
+++ b/docs/source/quicktour.rst
@@ -0,0 +1,4 @@
+Quicktour
+====================================================================================================
+
+- How to use a tokenizer: encode, encode_batch, ``Encoding``, offsets, mappings, ...
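
The five pipeline stages the new pipeline.rst stub lists (normalization, pre-tokenization, tokenization, post-processing, decoding) could eventually be illustrated with a sketch like the following. This is a minimal, hypothetical toy implementation with an invented vocabulary and function names, not the tokenizers library's actual API:

```python
# Hypothetical sketch of the five pipeline stages listed in pipeline.rst.
# The toy vocabulary and function names are illustrative only.

VOCAB = {"hello": 0, "world": 1, "[UNK]": 2, "[CLS]": 3, "[SEP]": 4}
IDS_TO_TOKENS = {i: t for t, i in VOCAB.items()}

def normalize(text: str) -> str:
    """Normalization: lowercase and strip surrounding whitespace."""
    return text.lower().strip()

def pre_tokenize(text: str) -> list:
    """Pre-tokenization: split the normalized text on whitespace."""
    return text.split()

def tokenize(words: list) -> list:
    """Tokenization: map each word to an id, falling back to [UNK]."""
    return [VOCAB.get(w, VOCAB["[UNK]"]) for w in words]

def post_process(ids: list) -> list:
    """Post-processing: wrap the sequence in special tokens."""
    return [VOCAB["[CLS]"]] + ids + [VOCAB["[SEP]"]]

def decode(ids: list) -> str:
    """Decoding: map ids back to text, skipping special tokens."""
    specials = {VOCAB["[CLS]"], VOCAB["[SEP]"]}
    return " ".join(IDS_TO_TOKENS[i] for i in ids if i not in specials)

ids = post_process(tokenize(pre_tokenize(normalize("  Hello World  "))))
print(ids)          # [3, 0, 1, 4]
print(decode(ids))  # hello world
```

In the real library each stage is a configurable component (normalizers, pre-tokenizers, models, post-processors, decoders), which is presumably what the pipeline.rst TODO will document.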