mirror of
https://github.com/mii443/tokenizers.git
synced 2025-08-22 16:25:30 +00:00
# Encode Inputs

<tokenizerslangcontent>
<python>
These types represent all the different kinds of input that a [`~tokenizers.Tokenizer`] accepts
when using [`~tokenizers.Tokenizer.encode_batch`].

## TextEncodeInput[[[[tokenizers.TextEncodeInput]]]]

<code>tokenizers.TextEncodeInput</code>

Represents a textual input for encoding. Can be either:
- A single sequence: [TextInputSequence](/docs/tokenizers/api/input-sequences#tokenizers.TextInputSequence)
- A pair of sequences:
  - A Tuple of [TextInputSequence](/docs/tokenizers/api/input-sequences#tokenizers.TextInputSequence)
  - Or a List of [TextInputSequence](/docs/tokenizers/api/input-sequences#tokenizers.TextInputSequence) of size 2

alias of `Union[str, Tuple[str, str], List[str]]`.
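The three accepted shapes can be illustrated with plain Python values. The sketch below is not part of the library: it re-declares the alias locally so that it runs without `tokenizers` installed, and `describe_text_input` is a hypothetical helper written only to show which shape counts as a single sequence and which as a pair.

```python
from typing import List, Tuple, Union

# Local re-declaration of the alias documented above, for illustration only.
TextEncodeInput = Union[str, Tuple[str, str], List[str]]


def describe_text_input(value: TextEncodeInput) -> str:
    """Hypothetical helper: report whether `value` is a single sequence or a pair."""
    if isinstance(value, str):
        return "single"
    is_pair = (
        isinstance(value, (tuple, list))
        and len(value) == 2
        and all(isinstance(part, str) for part in value)
    )
    if is_pair:
        return "pair"
    raise TypeError(f"not a valid text input: {value!r}")


single = "Hello, world!"
pair_as_tuple = ("What is a tokenizer?", "It splits text into tokens.")
pair_as_list = ["What is a tokenizer?", "It splits text into tokens."]

print(describe_text_input(single))         # single
print(describe_text_input(pair_as_tuple))  # pair
print(describe_text_input(pair_as_list))   # pair
```

A batch handed to `encode_batch` would then be a list whose elements each take one of these shapes.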

## PreTokenizedEncodeInput[[[[tokenizers.PreTokenizedEncodeInput]]]]

<code>tokenizers.PreTokenizedEncodeInput</code>

Represents a pre-tokenized input for encoding. Can be either:
- A single sequence: [PreTokenizedInputSequence](/docs/tokenizers/api/input-sequences#tokenizers.PreTokenizedInputSequence)
- A pair of sequences:
  - A Tuple of [PreTokenizedInputSequence](/docs/tokenizers/api/input-sequences#tokenizers.PreTokenizedInputSequence)
  - Or a List of [PreTokenizedInputSequence](/docs/tokenizers/api/input-sequences#tokenizers.PreTokenizedInputSequence) of size 2

alias of `Union[List[str], Tuple[str], Tuple[Union[List[str], Tuple[str]], Union[List[str], Tuple[str]]], List[Union[List[str], Tuple[str]]]]`.
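As a rough illustration of the nested alias, the sketch below classifies pre-tokenized values by shape. Both `is_word_sequence` and `describe_pretokenized_input` are hypothetical helpers, not library functions, and the check is an assumption about how the documented shapes could be told apart rather than a description of what the library does internally.

```python
from typing import Any


def is_word_sequence(value: Any) -> bool:
    """True for a list or tuple whose items are all strings (one pre-tokenized sequence)."""
    return isinstance(value, (list, tuple)) and all(isinstance(w, str) for w in value)


def describe_pretokenized_input(value: Any) -> str:
    """Hypothetical helper: report whether `value` is a single sequence or a pair."""
    if is_word_sequence(value):
        return "single"
    is_pair = (
        isinstance(value, (list, tuple))
        and len(value) == 2
        and all(is_word_sequence(part) for part in value)
    )
    if is_pair:
        return "pair"
    raise TypeError(f"not a valid pre-tokenized input: {value!r}")


single = ["Hello", ",", "world", "!"]
pair_as_tuple = (["What", "is", "it", "?"], ["A", "tokenizer", "."])
pair_as_list = [["What", "is", "it", "?"], ("A", "tokenizer", ".")]

print(describe_pretokenized_input(single))         # single
print(describe_pretokenized_input(pair_as_tuple))  # pair
print(describe_pretokenized_input(pair_as_list))   # pair
```

Here a flat list of strings is always read as one sequence of words, so a pair has to be written as two nested sequences.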

## EncodeInput[[[[tokenizers.EncodeInput]]]]

<code>tokenizers.EncodeInput</code>

Represents all the possible types of input for encoding. Can be:
- When `is_pretokenized=False`: [TextEncodeInput](#tokenizers.TextEncodeInput)
- When `is_pretokenized=True`: [PreTokenizedEncodeInput](#tokenizers.PreTokenizedEncodeInput)

alias of `Union[str, Tuple[str, str], List[str], Tuple[str], Tuple[Union[List[str], Tuple[str]], Union[List[str], Tuple[str]]], List[Union[List[str], Tuple[str]]]]`.
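Because the union overlaps, the same Python value can be read in two ways depending on `is_pretokenized`. The sketch below makes that concrete with a hypothetical `interpret` function. It follows the two cases listed above and is an assumption about the intended reading, not the library's own dispatch code.

```python
from typing import Any


def _all_str(value: Any) -> bool:
    """True for a list or tuple containing only strings."""
    return isinstance(value, (list, tuple)) and all(isinstance(v, str) for v in value)


def interpret(value: Any, is_pretokenized: bool = False) -> str:
    """Hypothetical helper: say how `value` would be read for a given flag."""
    if not is_pretokenized:
        # Text mode: a string is one sequence, two strings are a pair.
        if isinstance(value, str):
            return "single text sequence"
        if _all_str(value) and len(value) == 2:
            return "pair of text sequences"
    else:
        # Pre-tokenized mode: a flat sequence of strings is one sequence of words.
        if _all_str(value):
            return "single pre-tokenized sequence"
        if (
            isinstance(value, (list, tuple))
            and len(value) == 2
            and all(_all_str(part) for part in value)
        ):
            return "pair of pre-tokenized sequences"
    raise TypeError(f"invalid input for is_pretokenized={is_pretokenized}: {value!r}")


words = ["Hello", "world"]
print(interpret(words))                        # pair of text sequences
print(interpret(words, is_pretokenized=True))  # single pre-tokenized sequence
```

The flag therefore needs to match how the batch was prepared, since a two-element list of strings is valid in both modes but means different things.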
</python>
<rust>
The Rust API Reference is available directly on the [Docs.rs](https://docs.rs/tokenizers/latest/tokenizers/) website.
</rust>
<node>
The node API has not been documented yet.
</node>
</tokenizerslangcontent>