mirror of
https://github.com/mii443/tokenizers.git
synced 2025-08-23 00:35:35 +00:00
Doc - Reorganize API Reference
40
docs/source/api/python.inc
Normal file
@@ -0,0 +1,40 @@
Input sequences
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

These types represent all the different kinds of sequences that can be used as input to a
:class:`~tokenizers.Tokenizer`. Generally speaking, any sequence can be either a string or a list of
strings, depending on the operating mode of the tokenizer: ``raw text`` vs ``pre-tokenized``.

.. autodata:: tokenizers.TextInputSequence

.. autodata:: tokenizers.PreTokenizedInputSequence

.. autodata:: tokenizers.InputSequence

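In spirit, these aliases are plain typing unions over strings and lists of strings. The sketch below is purely illustrative: the names mirror the documented aliases, but the definitions are simplified stand-ins, not the package's actual source (the real aliases may be broader, e.g. also accepting tuples of strings).

```python
from typing import List, Union

# Simplified, illustrative stand-ins for the documented aliases
# (NOT the actual definitions from the tokenizers package):
TextInputSequence = str                # ``raw text`` mode
PreTokenizedInputSequence = List[str]  # ``pre-tokenized`` mode
InputSequence = Union[TextInputSequence, PreTokenizedInputSequence]

# A raw-text sequence is a plain string...
raw: InputSequence = "Hello, world!"
# ...while a pre-tokenized sequence is already split into tokens.
pre_tokenized: InputSequence = ["Hello", ",", "world", "!"]
```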
Encode inputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

These types represent all the different kinds of input that a :class:`~tokenizers.Tokenizer` accepts
when using :meth:`~tokenizers.Tokenizer.encode_batch`.

.. autodata:: tokenizers.TextEncodeInput

.. autodata:: tokenizers.PreTokenizedEncodeInput

.. autodata:: tokenizers.EncodeInput

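Conceptually, an encode input is either a single sequence or a pair of sequences (for example a question and its context). The sketch below again uses simplified stand-in definitions rather than the package's actual aliases, purely to illustrate the accepted shapes:

```python
from typing import List, Tuple, Union

# Simplified, illustrative stand-ins (NOT the package's actual aliases):
TextInputSequence = str
PreTokenizedInputSequence = List[str]

# An encode input is either a single sequence, or a (sequence, pair)
# tuple, as used for tasks such as question answering.
TextEncodeInput = Union[
    TextInputSequence,
    Tuple[TextInputSequence, TextInputSequence],
]
PreTokenizedEncodeInput = Union[
    PreTokenizedInputSequence,
    Tuple[PreTokenizedInputSequence, PreTokenizedInputSequence],
]
EncodeInput = Union[TextEncodeInput, PreTokenizedEncodeInput]

single: EncodeInput = "What is a tokenizer?"
pair: EncodeInput = ("What is a tokenizer?", "It splits text into tokens.")
batch: List[EncodeInput] = [single, pair]  # list shape passed to a batch encode
```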
Tokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: tokenizers.Tokenizer
    :members:
    :undoc-members:

Added Tokens
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: tokenizers.AddedToken
    :members: