mirror of
https://github.com/mii443/tokenizers.git
synced 2025-12-07 05:08:24 +00:00
Update CHANGELOGs
@@ -3,6 +3,12 @@
## Changes:
- Keep only one progress bar while reading files during training. This is better for use cases with
  a high number of files, as it avoids having too many progress bars on screen.
- The `add_prefix_space` option of the `ByteLevel` `PreTokenizer` has been moved to a `Normalizer`.

## How to migrate:
- Use the `ByteLevel` `Normalizer` with `add_prefix_space=True` in addition to the `PreTokenizer`.
  The `PreTokenizer` no longer handles this option. This fixes some issues where the offsets
  were wrong when this option was on.

# v0.6.0
@@ -4,6 +4,11 @@
- Keep only one progress bar while reading files during training. This is better for use cases with
  a high number of files, as it avoids having too many progress bars on screen.
- Improve BPE and WordPiece builders.
- `ByteLevel` is now also a `Normalizer` and handles the `add_prefix_space` option at that level.
  This fixes some issues where the offsets were wrong when this option was on.

## How to migrate:
- Use `ByteLevel` as a `Normalizer` if `add_prefix_space` is required.

# v0.8.0