Update CHANGELOGs

This commit is contained in:
Anthony MOI
2020-03-05 17:32:43 -05:00
parent d778ed5e0a
commit 86d2e90ad2
2 changed files with 5 additions and 0 deletions

View File

@@ -4,11 +4,14 @@
- Keep only one progress bar while reading files during training. This is better for use-cases with
a high number of files as it avoids having too many progress bar on screen.
- `add_prefix_space` option of the `ByteLevel` `PreTokenizer` has been moved to a `Normalizer`
- Added the `ByteLevel` `PostProcessor` to take care of fixing the offsets when a unicode character
gets split up as multiple byte-level characters.
## How to migrate:
- Use the `ByteLevel` `Normalizer` with `add_prefix_space=True` in addition to the `PreTokenizer`.
The `PreTokenizer` does not handle this option anymore. This fixes some issues with the offsets
being wrong if this option was on.
- Add the `ByteLevel` `PostProcessor` to your byte-level BPE tokenizers.
# v0.6.0