Mirror of https://github.com/mii443/tokenizers.git, synced 2025-08-22 16:25:30 +00:00
Fix readme indentation
18
README.md
@@ -8,15 +8,15 @@ vocabulary, and then process some text either in real time or in advance.
A Tokenizer works as a pipeline taking some raw text as input, going through multiple steps to
finally output a list of `Token`s. The various steps of the pipeline are:

- Some optional `Normalizer`s. An example would be a Unicode normalization step. They take
  raw text as input and output a raw text `String`.
- An optional `PreTokenizer`, which takes care of splitting the raw text as relevant and
  pre-processing tokens if needed. It takes a raw text `String` as input and outputs a `Vec<String>`.
- A `Model` to do the actual tokenization. An example of `Model` would be `BPE`. It takes
  a `Vec<String>` as input and outputs a `Vec<Token>`.
- Some optional `PostProcessor`s. These are in charge of post-processing the list of `Token`s
  in any relevant way, such as truncating it or adding padding.
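
The pipeline above can be sketched in Rust as a chain of trait objects, one per step, using the input and output types the list gives for each step. This is a minimal, self-contained sketch under stated assumptions, not this crate's API: the method names (`normalize`, `pre_tokenize`, `tokenize`, `process`), the `Token` fields, the components `ToLower`, `SplitWhitespace`, `VocabLookup` and `TruncateTo`, and the `demo_tokenizer` helper are all hypothetical. `ToLower` stands in for a Unicode normalization step, and `VocabLookup` is a plain word-to-id lookup used in place of a real `Model` such as `BPE`.

```rust
use std::collections::HashMap;

/// Hypothetical token shape: a vocabulary id plus the piece of text it came from.
#[derive(Debug)]
struct Token {
    id: u32,
    value: String,
}

// One (hypothetical) trait per pipeline step, using the input/output types from the README.
trait Normalizer {
    fn normalize(&self, text: String) -> String;
}
trait PreTokenizer {
    fn pre_tokenize(&self, text: &str) -> Vec<String>;
}
trait Model {
    fn tokenize(&self, words: Vec<String>) -> Vec<Token>;
}
trait PostProcessor {
    fn process(&self, tokens: Vec<Token>) -> Vec<Token>;
}

/// Stand-in for a Unicode normalization step: lowercases the text.
struct ToLower;
impl Normalizer for ToLower {
    fn normalize(&self, text: String) -> String { text.to_lowercase() }
}

/// Splits the normalized text on whitespace.
struct SplitWhitespace;
impl PreTokenizer for SplitWhitespace {
    fn pre_tokenize(&self, text: &str) -> Vec<String> {
        text.split_whitespace().map(String::from).collect()
    }
}

/// Toy model, not BPE: maps each whole word to its vocabulary id, else to `unk_id`.
struct VocabLookup {
    vocab: HashMap<String, u32>,
    unk_id: u32,
}
impl Model for VocabLookup {
    fn tokenize(&self, words: Vec<String>) -> Vec<Token> {
        words.into_iter().map(|w| {
            let id = self.vocab.get(&w).copied().unwrap_or(self.unk_id);
            Token { id, value: w }
        }).collect()
    }
}

/// Post-processor that keeps at most `max_len` tokens (truncation).
struct TruncateTo {
    max_len: usize,
}
impl PostProcessor for TruncateTo {
    fn process(&self, mut tokens: Vec<Token>) -> Vec<Token> {
        tokens.truncate(self.max_len);
        tokens
    }
}

/// Runs the steps in order: normalizers, optional pre-tokenizer, model, post-processors.
struct Tokenizer {
    normalizers: Vec<Box<dyn Normalizer>>,
    pre_tokenizer: Option<Box<dyn PreTokenizer>>,
    model: Box<dyn Model>,
    post_processors: Vec<Box<dyn PostProcessor>>,
}
impl Tokenizer {
    fn encode(&self, text: &str) -> Vec<Token> {
        let normalized = self.normalizers.iter().fold(text.to_string(), |s, n| n.normalize(s));
        // Without a pre-tokenizer, the whole normalized string is passed on as one piece.
        let words = match &self.pre_tokenizer {
            Some(p) => p.pre_tokenize(&normalized),
            None => vec![normalized],
        };
        let tokens = self.model.tokenize(words);
        self.post_processors.iter().fold(tokens, |t, p| p.process(t))
    }
}

/// Hypothetical configuration: two-word vocabulary, unknown id 0, truncation to `max_len`.
fn demo_tokenizer(max_len: usize) -> Tokenizer {
    let vocab: HashMap<String, u32> =
        [("hello", 1u32), ("world", 2u32)].iter().map(|(w, id)| (w.to_string(), *id)).collect();
    Tokenizer {
        normalizers: vec![Box::new(ToLower)],
        pre_tokenizer: Some(Box::new(SplitWhitespace)),
        model: Box::new(VocabLookup { vocab, unk_id: 0 }),
        post_processors: vec![Box::new(TruncateTo { max_len })],
    }
}

fn main() {
    // "again" maps to unk id 0 and is then dropped by truncation to 2 tokens.
    let tokens = demo_tokenizer(2).encode("Hello World again");
    println!("{:?}", tokens);
}
```

Modeling the optional stages as a `Vec` (zero or more normalizers or post-processors) and an `Option` (zero or one pre-tokenizer) mirrors the "some optional" versus "an optional" wording in the list, and lets each step be swapped without touching the others.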

## Try the shell