mirror of
https://github.com/mii443/tokenizers.git
synced 2025-08-22 16:25:30 +00:00
Doc - Updated API Reference for encode/encode_batch
This commit is contained in:
@ -1,2 +1,32 @@
|
||||
Input sequences
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
These types represent all the different kinds of sequence that can be used as input of a Tokenizer.
|
||||
Globally, any sequence can be either a string or a list of strings, according to the operating
|
||||
mode of the tokenizer: ``raw text`` vs ``pre-tokenized``.
|
||||
|
||||
.. autodata:: tokenizers.TextInputSequence
|
||||
|
||||
.. autodata:: tokenizers.PreTokenizedInputSequence
|
||||
|
||||
.. autodata:: tokenizers.InputSequence
|
||||
|
||||
|
||||
Encode inputs
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
These types represent all the different kinds of input that a :class:`~tokenizers.Tokenizer` accepts
|
||||
when using :meth:`~tokenizers.Tokenizer.encode_batch`.
|
||||
|
||||
.. autodata:: tokenizers.TextEncodeInput
|
||||
|
||||
.. autodata:: tokenizers.PreTokenizedEncodeInput
|
||||
|
||||
.. autodata:: tokenizers.EncodeInput
|
||||
|
||||
|
||||
Tokenizer
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: tokenizers.Tokenizer
|
||||
:members:
|
||||
|
Reference in New Issue
Block a user