Update CHANGELOGs

Anthony MOI
2020-03-05 17:32:43 -05:00
parent d778ed5e0a
commit 86d2e90ad2
2 changed files with 5 additions and 0 deletions


@@ -4,11 +4,14 @@
- Keep only one progress bar while reading files during training. This is better for use-cases with
a high number of files as it avoids having too many progress bars on screen.
- `add_prefix_space` option of the `ByteLevel` `PreTokenizer` has been moved to a `Normalizer`
- Added the `ByteLevel` `PostProcessor` to take care of fixing the offsets when a Unicode character
gets split up into multiple byte-level characters.
## How to migrate:
- Use the `ByteLevel` `Normalizer` with `add_prefix_space=True` in addition to the `PreTokenizer`.
The `PreTokenizer` does not handle this option anymore. This fixes some issues with the offsets
being wrong if this option was on.
- Add the `ByteLevel` `PostProcessor` to your byte-level BPE tokenizers.
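A minimal, self-contained sketch of the offset problem that the `ByteLevel` `PostProcessor` is meant to handle follows. It does not use the `tokenizers` API. `bytes_to_unicode`, `byte_level` and `fix_offsets` are hypothetical helpers, and the byte-to-character table follows the GPT-2 convention, which is an assumption about how the byte-level alphabet is built. The idea shown: each byte-level character remembers which original character it came from, so an offset span over byte-level characters can be mapped back to the original string.

```python
def bytes_to_unicode() -> dict[int, str]:
    # GPT-2 style reversible table: every byte value maps to one printable character.
    bs = list(range(33, 127)) + list(range(161, 173)) + list(range(174, 256))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            # Bytes without a printable form are shifted above 255.
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


BYTE_ENCODER = bytes_to_unicode()


def byte_level(text: str) -> tuple[str, list[int]]:
    """Map text to byte-level characters and record, for each one, the index
    of the original character whose UTF-8 encoding produced it."""
    chars: list[str] = []
    origin: list[int] = []
    for i, ch in enumerate(text):
        for b in ch.encode("utf-8"):
            chars.append(BYTE_ENCODER[b])
            origin.append(i)
    return "".join(chars), origin


def fix_offsets(start: int, end: int, origin: list[int]) -> tuple[int, int]:
    """Convert a non-empty span [start, end) over byte-level characters into a
    span over the original characters."""
    return origin[start], origin[end - 1] + 1


# "é" is two UTF-8 bytes, so "né" becomes three byte-level characters.
bl, origin = byte_level("né")
print(len(bl), origin, fix_offsets(1, 2, origin))
```

A token that covers only one of the two bytes of `é` still maps back to the whole character `(1, 2)`, rather than to an offset that points into the middle of it.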
# v0.6.0


@@ -6,6 +6,8 @@ a high number of files as it avoids having too many progress bar on screen.
- Improve BPE and WordPiece builders.
- `ByteLevel` is also a `Normalizer` and handles the `add_prefix_space` option at this level now.
This fixes some issues with the offsets being wrong if this option was on.
- `ByteLevel` is also a `PostProcessor` now and handles fixing the offsets when a Unicode
character gets split up into multiple byte-level characters.
## How to migrate:
- Use the `ByteLevel` as a `Normalizer` if `add_prefix_space` is required.
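A minimal sketch of why handling `add_prefix_space` at the normalization step can keep offsets correct follows. It assumes a normalizer that records, for every character it emits, the index of the original character it aligns to. `add_prefix_space` and `to_original` here are hypothetical functions, not the `tokenizers` API, and aligning the inserted space to the first original character is an assumption made for illustration.

```python
def add_prefix_space(text: str) -> tuple[str, list[int]]:
    """Prepend a space unless the text is empty or already starts with one.
    Returns the normalized text and, for each normalized character, the index
    of the original character it aligns to."""
    if not text or text.startswith(" "):
        return text, list(range(len(text)))
    # The inserted space has no counterpart in the original text; align it
    # to the first original character (an assumption for this sketch).
    return " " + text, [0] + list(range(len(text)))


def to_original(start: int, end: int, alignments: list[int]) -> tuple[int, int]:
    """Map a non-empty span [start, end) over the normalized text back to the original."""
    return alignments[start], alignments[end - 1] + 1


original = "hello world"
normalized, alignments = add_prefix_space(original)
# "hello" sits at [1, 6) in the normalized text but at [0, 5) in the original.
start, end = to_original(1, 6, alignments)
print(normalized, (start, end), original[start:end])
```

Reusing the normalized offsets `(1, 6)` directly would select `"ello "` in the original string. Tracking the alignment at the point where the space is inserted avoids that shift.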