mirror of https://github.com/mii443/tokenizers.git (synced 2025-12-07 13:18:31 +00:00)
Update CHANGELOGs
@@ -3,6 +3,12 @@
 ## Changes:
 - Keep only one progress bar while reading files during training. This is better for use-cases with
 a high number of files as it avoids having too many progress bar on screen.
+- `add_prefix_space` option of the `ByteLevel` `PreTokenizer` has been moved to a `Normalizer`
+
+## How to migrate:
+- Use the `ByteLevel` `Normalizer` with `add_prefix_space=True` in addition to the `PreTokenizer`.
+The `PreTokenizer` does not handle this option anymore. This fixes some issues with the offsets
+being wrong if this option was on.
 
 # v0.6.0
 
@@ -4,6 +4,11 @@
 - Keep only one progress bar while reading files during training. This is better for use-cases with
 a high number of files as it avoids having too many progress bar on screen.
 - Improve BPE and WordPiece builders.
+- `ByteLevel` is also a `Normalizer` and handles the `add_prefix_space` option at this level now.
+This fixes some issues with the offsets being wrong if this option was on.
+
+## How to migrate:
+- Use the `ByteLevel` as a `Normalizer` if `add_prefix_space` is required.
 
 # v0.8.0
 
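Both changelog entries say that handling `add_prefix_space` at the normalization step fixes wrong offsets. The toy sketch below illustrates one plausible reason: if the step that inserts the space also records which characters were inserted, later spans can be mapped back to the original string. It does not use the `tokenizers` library, and every name in it (`normalize_with_prefix_space`, `pre_tokenize`, `to_original_offsets`) is hypothetical and chosen only for illustration.

```python
import re
from typing import List, Tuple

Span = Tuple[int, int]


def normalize_with_prefix_space(text: str) -> Tuple[str, List[int]]:
    """Prepend a space if the text lacks one.

    Returns the normalized text plus an alignment list: for each normalized
    character, the index of the original character, or -1 if it was inserted.
    """
    if text.startswith(" "):
        return text, list(range(len(text)))
    return " " + text, [-1] + list(range(len(text)))


def pre_tokenize(normalized: str) -> List[Tuple[str, Span]]:
    """Simplified splitter: each piece keeps its leading space.

    Offsets refer to the normalized string, not the original one.
    """
    return [(m.group(), (m.start(), m.end())) for m in re.finditer(r" ?\S+", normalized)]


def to_original_offsets(span: Span, alignment: List[int]) -> Span:
    """Map a span over the normalized string back to the original string."""
    start, end = span
    originals = [alignment[i] for i in range(start, end) if alignment[i] != -1]
    if not originals:
        return (0, 0)  # the span covered only inserted characters
    return (originals[0], originals[-1] + 1)


original = "Hello world"
normalized, alignment = normalize_with_prefix_space(original)
for piece, span in pre_tokenize(normalized):
    start, end = to_original_offsets(span, alignment)
    print(repr(piece), span, "->", (start, end), repr(original[start:end]))
```

Without the alignment list, the span `(0, 6)` for `" Hello"` would point at `"Hello "` in the original string, which is the kind of offset error the changelog describes.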