update Split pretokenizer docstrings (#1701)

This commit is contained in:
Dylan-Harden3
2025-01-08 05:35:52 -06:00
committed by GitHub
parent 166edd87c8
commit 6945933829
2 changed files with 4 additions and 4 deletions


@@ -422,10 +422,10 @@ class Split(PreTokenizer):
     Args:
         pattern (:obj:`str` or :class:`~tokenizers.Regex`):
             A pattern used to split the string. Usually a string or a regex built with `tokenizers.Regex`.
-            If you want to use a regex pattern, it has to be wrapped around a `tokenizer.Regex`,
+            If you want to use a regex pattern, it has to be wrapped around a `tokenizers.Regex`,
             otherwise we consider is as a string pattern. For example `pattern="|"`
             means you want to split on `|` (imagine a csv file for example), while
-            `patter=tokenizer.Regex("1|2")` means you split on either '1' or '2'.
+            `pattern=tokenizers.Regex("1|2")` means you split on either '1' or '2'.
         behavior (:class:`~tokenizers.SplitDelimiterBehavior`):
             The behavior to use when splitting.
             Choices: "removed", "isolated", "merged_with_previous", "merged_with_next",


@@ -359,10 +359,10 @@ impl PyWhitespaceSplit {
 /// Args:
 ///     pattern (:obj:`str` or :class:`~tokenizers.Regex`):
 ///         A pattern used to split the string. Usually a string or a regex built with `tokenizers.Regex`.
-///         If you want to use a regex pattern, it has to be wrapped around a `tokenizer.Regex`,
+///         If you want to use a regex pattern, it has to be wrapped around a `tokenizers.Regex`,
 ///         otherwise we consider is as a string pattern. For example `pattern="|"`
 ///         means you want to split on `|` (imagine a csv file for example), while
-///         `patter=tokenizer.Regex("1|2")` means you split on either '1' or '2'.
+///         `pattern=tokenizers.Regex("1|2")` means you split on either '1' or '2'.
 ///     behavior (:class:`~tokenizers.SplitDelimiterBehavior`):
 ///         The behavior to use when splitting.
 ///         Choices: "removed", "isolated", "merged_with_previous", "merged_with_next",
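The `behavior` argument the docstring lists can be sketched as follows — a hypothetical comparison of the four named `SplitDelimiterBehavior` choices on the same input, assuming the `tokenizers` package is installed:

```python
# Compare the listed SplitDelimiterBehavior choices on "a-b-c"
# (assumes the `tokenizers` package is installed).
from tokenizers.pre_tokenizers import Split

results = {}
for behavior in ("removed", "isolated", "merged_with_previous", "merged_with_next"):
    splitter = Split(pattern="-", behavior=behavior)
    # pre_tokenize_str returns (piece, (start, end)) pairs; keep the pieces.
    results[behavior] = [piece for piece, span in splitter.pre_tokenize_str("a-b-c")]
    print(behavior, results[behavior])
```

"removed" drops the delimiter, "isolated" keeps it as its own piece, and the two "merged" variants attach it to the neighboring piece.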