Doc - Updated API Reference for encode/encode_batch

This commit is contained in:
Anthony MOI
2020-10-06 17:02:21 -04:00
committed by Anthony MOI
parent f2f3ec51bd
commit 79f02bb7f0
4 changed files with 152 additions and 38 deletions

View File

@ -1,2 +1,32 @@
Input sequences
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
These types represent all the different kinds of sequence that can be used as input of a Tokenizer.
Globally, any sequence can be either a string or a list of strings, according to the operating
mode of the tokenizer: ``raw text`` vs ``pre-tokenized``.
.. autodata:: tokenizers.TextInputSequence
.. autodata:: tokenizers.PreTokenizedInputSequence
.. autodata:: tokenizers.InputSequence
Encode inputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
These types represent all the different kinds of input that a :class:`~tokenizers.Tokenizer` accepts
when using :meth:`~tokenizers.Tokenizer.encode_batch`.
.. autodata:: tokenizers.TextEncodeInput
.. autodata:: tokenizers.PreTokenizedEncodeInput
.. autodata:: tokenizers.EncodeInput
Tokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: tokenizers.Tokenizer
:members: