Update CHANGELOGs

Anthony MOI
2020-03-05 17:32:43 -05:00
parent d778ed5e0a
commit 86d2e90ad2
2 changed files with 5 additions and 0 deletions


@@ -4,11 +4,14 @@
- Keep only one progress bar while reading files during training. This is better for use cases with
a high number of files, as it avoids having too many progress bars on screen.
- The `add_prefix_space` option of the `ByteLevel` `PreTokenizer` has been moved to a `Normalizer`.
- Added the `ByteLevel` `PostProcessor`, which fixes the offsets when a Unicode character
gets split into multiple byte-level characters.
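To illustrate the offset problem the `ByteLevel` `PostProcessor` addresses, here is a minimal, hypothetical Python sketch (not the library's implementation). Byte-level BPE works on UTF-8 bytes, so a character such as `é` becomes two byte-level characters, and a token may cover only one of them. The helper name `byte_to_char_offsets` is illustrative, and widening a span that cuts through a character is an assumed policy.

```python
from typing import List, Tuple


def byte_to_char_offsets(text: str, start: int, end: int) -> Tuple[int, int]:
    """Map a [start, end) span over the UTF-8 bytes of `text` to character offsets.

    Hypothetical helper: a span that cuts through a multi-byte character is
    widened to cover the whole character (an assumed policy, not necessarily
    what the library does).
    """
    # byte_owner[i] holds the index of the character that produced byte i.
    byte_owner: List[int] = []
    for char_index, ch in enumerate(text):
        byte_owner.extend([char_index] * len(ch.encode("utf-8")))

    if not 0 <= start < end <= len(byte_owner):
        raise ValueError("expected a non-empty span within the byte string")

    # Start at the character owning the first byte; end just after the
    # character owning the last byte.
    return byte_owner[start], byte_owner[end - 1] + 1


text = "caf\u00e9!"  # 5 characters, 6 UTF-8 bytes (U+00E9 is 2 bytes)
print(byte_to_char_offsets(text, 0, 3))  # bytes of "caf"          -> (0, 3)
print(byte_to_char_offsets(text, 3, 4))  # first byte of the accent -> (3, 4)
print(byte_to_char_offsets(text, 4, 6))  # second byte plus "!"     -> (3, 5)
```

Without such a mapping, byte positions would be reported as if they were character positions, and every offset after `é` would be shifted by one.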
## How to migrate:
- Use the `ByteLevel` `Normalizer` with `add_prefix_space=True` in addition to the `PreTokenizer`.
The `PreTokenizer` no longer handles this option. This fixes offsets that were wrong when this
option was enabled.
- Add the `ByteLevel` `PostProcessor` to your byte-level BPE tokenizers.
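Handling `add_prefix_space` in a normalizer lets the inserted space be tracked against the original text. The following is a minimal, hypothetical Python sketch of that idea, assuming the normalizer records, for each normalized character, the span of the original text it came from. The names `add_prefix_space` and `to_original` are illustrative and are not the library's API; skipping the prefix when the text already starts with a space is also an assumption here.

```python
from typing import List, Tuple

Alignment = List[Tuple[int, int]]


def add_prefix_space(text: str) -> Tuple[str, Alignment]:
    """Hypothetical normalizer: prepend a space unless `text` already starts with one.

    Returns the normalized string plus, for each normalized character, the
    [start, end) span of the original text it came from.
    """
    identity = [(i, i + 1) for i in range(len(text))]
    if text.startswith(" "):
        return text, identity
    # The inserted space has no counterpart in the original text, so it maps
    # to the empty span (0, 0) at the very beginning.
    return " " + text, [(0, 0)] + identity


def to_original(alignments: Alignment, start: int, end: int) -> Tuple[int, int]:
    """Convert a non-empty [start, end) span over the normalized string to original offsets."""
    spans = alignments[start:end]
    if not spans:
        raise ValueError("expected a non-empty span")
    return spans[0][0], spans[-1][1]


normalized, alignments = add_prefix_space("Hello world")
print(repr(normalized))               # ' Hello world'
print(to_original(alignments, 0, 6))  # " Hello" -> (0, 5)
print(to_original(alignments, 6, 12)) # " world" -> (5, 11)
```

Without the alignment step, the span of `" world"` would naively be reported as (6, 12) against the original `"Hello world"`, off by one; this is presumably the kind of offset error the migration note refers to.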
# v0.6.0


@@ -6,6 +6,8 @@ a high number of files as it avoids having too many progress bar on screen.
- Improve BPE and WordPiece builders.
- `ByteLevel` is also a `Normalizer` and now handles the `add_prefix_space` option at this level.
This fixes offsets that were wrong when this option was enabled.
- `ByteLevel` is also a `PostProcessor` now and fixes the offsets when a Unicode
character gets split into multiple byte-level characters.
## How to migrate:
- Use `ByteLevel` as a `Normalizer` if `add_prefix_space` is required.