mirror of
https://github.com/mii443/tokenizers.git
synced 2025-08-22 16:25:30 +00:00
41 lines
1.3 KiB
Plaintext
# Input Sequences
<tokenizerslangcontent>
<python>
These types represent all the different kinds of sequences that can be used as input to a Tokenizer.
In general, any sequence is either a string or a list of strings, depending on the operating
mode of the tokenizer: `raw text` or `pre-tokenized`.
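
As a rough illustration of the two modes, the sketch below writes the same sentence in both forms. It uses only the standard library: the whitespace split merely stands in for whatever pre-tokenization happened upstream, and `operating_mode` is a hypothetical helper, not part of `tokenizers`.

```python
from typing import List, Union


def operating_mode(sequence: Union[str, List[str]]) -> str:
    """Name the mode a sequence would be used with (hypothetical helper)."""
    # A plain string is raw text; a list of strings is already split into words.
    return "raw text" if isinstance(sequence, str) else "pre-tokenized"


raw_text = "Hello, how are you?"
# str.split() is only a stand-in for pre-tokenization done upstream.
pre_tokenized = raw_text.split()

print(operating_mode(raw_text))       # raw text
print(operating_mode(pre_tokenized))  # pre-tokenized
```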
## TextInputSequence[[tokenizers.TextInputSequence]]

<code>tokenizers.TextInputSequence</code>

A `str` that represents an input sequence.
## PreTokenizedInputSequence[[tokenizers.PreTokenizedInputSequence]]

<code>tokenizers.PreTokenizedInputSequence</code>

A pre-tokenized input sequence. Can be one of:
- A `List` of `str`
- A `Tuple` of `str`

alias of `Union[List[str], Tuple[str]]`.
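
Type aliases are not enforced at runtime, so a caller may want to check a value before treating it as pre-tokenized. A minimal sketch using only the standard library: the alias below is a local stand-in that mirrors the documented one rather than importing it, and `is_pre_tokenized` is a hypothetical helper name.

```python
from typing import Any, List, Tuple, Union

# Local stand-in mirroring the documented alias; not imported from tokenizers.
# Static type checkers read Tuple[str] as a one-element tuple; the runtime
# check below accepts lists and tuples of any length.
PreTokenizedInputSequence = Union[List[str], Tuple[str]]


def is_pre_tokenized(value: Any) -> bool:
    """Return True if value is a list or tuple whose items are all str."""
    # A bare str fails the first test, so raw text is rejected here.
    return isinstance(value, (list, tuple)) and all(
        isinstance(item, str) for item in value
    )


print(is_pre_tokenized(["Hello", "world"]))  # True
print(is_pre_tokenized(("Hello", "world")))  # True
print(is_pre_tokenized("Hello world"))       # False
print(is_pre_tokenized(["Hello", 42]))       # False
```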
## InputSequence[[tokenizers.InputSequence]]

<code>tokenizers.InputSequence</code>

Represents all the possible types of input sequences for encoding. Can be:
- When `is_pretokenized=False`: [TextInputSequence](#tokenizers.TextInputSequence)
- When `is_pretokenized=True`: [PreTokenizedInputSequence](#tokenizers.PreTokenizedInputSequence)

alias of `Union[str, List[str], Tuple[str]]`.
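
The pairing between `is_pretokenized` and the expected type can be checked before a sequence is encoded. The sketch below is an illustration under stated assumptions: `check_input_sequence` is a hypothetical helper, the alias is a local stand-in, and the library's own validation and error messages may differ.

```python
from typing import List, Tuple, Union

# Local stand-in mirroring the documented alias; not imported from tokenizers.
InputSequence = Union[str, List[str], Tuple[str]]


def check_input_sequence(
    sequence: InputSequence, is_pretokenized: bool = False
) -> InputSequence:
    """Return sequence unchanged if its type matches the flag, else raise TypeError."""
    if is_pretokenized:
        ok = isinstance(sequence, (list, tuple)) and all(
            isinstance(item, str) for item in sequence
        )
        expected = "a list or tuple of str"
    else:
        ok = isinstance(sequence, str)
        expected = "a str"
    if not ok:
        raise TypeError(f"is_pretokenized={is_pretokenized} expects {expected}")
    return sequence


check_input_sequence("Hello world")                             # passes
check_input_sequence(["Hello", "world"], is_pretokenized=True)  # passes

try:
    check_input_sequence(["Hello", "world"])  # a list given in raw-text mode
except TypeError as err:
    print(err)  # is_pretokenized=False expects a str
```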
</python>
<rust>
The Rust API Reference is available directly on the [Docs.rs](https://docs.rs/tokenizers/latest/tokenizers/) website.
</rust>
<node>
The Node API has not been documented yet.
</node>
</tokenizerslangcontent>