Update CHANGELOGs

Anthony MOI
2020-06-23 13:32:21 -04:00
parent aa3b39f692
commit f8b1630aa6
2 changed files with 9 additions and 0 deletions

File 1 of 2

@@ -18,6 +18,10 @@ This adds some methods to easily save/load an entire tokenizer (`from_str`, `fro
activation of the Tensor Cores by padding to a multiple of 8. Use, for example,
`enable_padding(pad_to_multiple_of=8)` (see the first sketch after this list).
- [#298]: Ability to get the currently set truncation/padding params
- [#311]: Ability to enable/disable parallelism using the `TOKENIZERS_PARALLELISM` environment
variable. This is especially useful when using `multiprocessing` with the `fork` start method,
which happens to be the default on Linux systems. Without disabling parallelism, the process
deadlocks while encoding (cf. [#187] for more information; a sketch follows this list).
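
A minimal sketch of the new padding option, assuming the `tokenizers` Python package is installed
and using its `BertWordPieceTokenizer`; the vocab file path below is hypothetical:

```python
from tokenizers import BertWordPieceTokenizer

# Hypothetical vocab path; any WordPiece vocab file works here.
tokenizer = BertWordPieceTokenizer("bert-base-uncased-vocab.txt")

# Pad encodings up to the next multiple of 8, which helps activate
# Tensor Cores on recent NVIDIA GPUs.
tokenizer.enable_padding(pad_to_multiple_of=8)

encoding = tokenizer.encode("A short sentence")
print(len(encoding.ids))  # padded length, rounded up to a multiple of 8
```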
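
And a sketch of the parallelism kill switch under the same assumptions (hypothetical vocab path);
the variable has to be set before encoding starts, hence before the fork:

```python
import os

# Disable the Rust-side parallelism before forking, otherwise child
# processes can deadlock while encoding (cf. #187).
os.environ["TOKENIZERS_PARALLELISM"] = "false"

import multiprocessing as mp
from tokenizers import BertWordPieceTokenizer

tokenizer = BertWordPieceTokenizer("bert-base-uncased-vocab.txt")  # hypothetical path

def encode(text):
    return tokenizer.encode(text).ids

if __name__ == "__main__":
    # `fork` is the default start method on Linux.
    with mp.get_context("fork").Pool(4) as pool:
        ids = pool.map(encode, ["first sentence", "second sentence"])
    print(ids)
```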
### Changed
- Improved errors generated during truncation: When the provided max length is too low are
@@ -190,6 +194,7 @@ delimiter (Works like `.split(delimiter)`)
- Fix a bug with the IDs associated with added tokens.
- Fix a bug that was causing crashes in Python 3.5
[#311]: https://github.com/huggingface/tokenizers/pull/311
[#309]: https://github.com/huggingface/tokenizers/pull/309
[#289]: https://github.com/huggingface/tokenizers/pull/289
[#286]: https://github.com/huggingface/tokenizers/pull/286
@@ -207,6 +212,7 @@ delimiter (Works like `.split(delimiter)`)
[#193]: https://github.com/huggingface/tokenizers/pull/193
[#190]: https://github.com/huggingface/tokenizers/pull/190
[#188]: https://github.com/huggingface/tokenizers/pull/188
[#187]: https://github.com/huggingface/tokenizers/issues/187
[#175]: https://github.com/huggingface/tokenizers/issues/175
[#174]: https://github.com/huggingface/tokenizers/issues/174
[#165]: https://github.com/huggingface/tokenizers/pull/165

File 2 of 2

@@ -43,6 +43,8 @@ using serde. It is now easy to save/load an entire tokenizer.
- [#289]: Ability to pad to a multiple of a specified value. This is especially useful to ensure
activation of the Tensor Cores by padding to a multiple of 8.
- [#298]: Ability to get the currently set truncation/padding params
- [#311]: Ability to enable/disable parallelism using the `TOKENIZERS_PARALLELISM` environment
variable (a combined sketch of [#289] and [#311] follows this list).
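
The same two additions, seen from the Python bindings side; a combined sketch with a hypothetical
vocab path:

```python
import os

os.environ["TOKENIZERS_PARALLELISM"] = "false"  # opt out of Rust-side threading

from tokenizers import BertWordPieceTokenizer

tokenizer = BertWordPieceTokenizer("vocab.txt")  # hypothetical path
tokenizer.enable_padding(pad_to_multiple_of=8)

# The batch is padded to its longest member, rounded up to a multiple of 8.
batch = tokenizer.encode_batch(["short", "a slightly longer sentence"])
print({len(e.ids) for e in batch})
```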
### How to migrate
- Replace any `XXX_to_YYY_offsets()` method call with one of the new ones.
@@ -117,6 +119,7 @@ advised, but that's not the question)
split up into multiple bytes
- [#174]: The `LongestFirst` truncation strategy had a bug
[#311]: https://github.com/huggingface/tokenizers/pull/311
[#309]: https://github.com/huggingface/tokenizers/pull/309
[#298]: https://github.com/huggingface/tokenizers/pull/298
[#289]: https://github.com/huggingface/tokenizers/pull/289