Update CHANGELOGs
@@ -18,6 +18,10 @@ This adds some methods to easily save/load an entire tokenizer (`from_str`, `fro
   activation of the Tensor Cores, while ensuring padding to a multiple of 8. Use with
   `enable_padding(pad_to_multiple_of=8)` for example.
 - [#298]: Ability to get the currently set truncation/padding params
+- [#311]: Ability to enable/disable the parallelism using the `TOKENIZERS_PARALLELISM` environment
+  variable. This is especially useful when using `multiprocessing` capabilities, with the `fork`
+  start method, which happens to be the default on Linux systems. Without disabling the parallelism,
+  the process dead-locks while encoding. (Cf [#187] for more information)

 ### Changed
 - Improved errors generated during truncation: When the provided max length is too low are
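A minimal sketch of the usage pattern this new entry describes, assuming the Python bindings; the `tokenizer.json` path is a hypothetical pre-saved tokenizer file:

```python
import os

# [#311]: disable the Rust-side parallelism before any worker is forked.
# With the "fork" start method (the Linux default), a child that inherits
# an active thread pool can dead-lock inside encode(); see [#187].
os.environ["TOKENIZERS_PARALLELISM"] = "false"

import multiprocessing as mp

from tokenizers import Tokenizer

tokenizer = Tokenizer.from_file("tokenizer.json")  # hypothetical path

def encode_ids(text):
    # Workers inherit the module-level tokenizer through fork.
    return tokenizer.encode(text).ids

if __name__ == "__main__":
    with mp.get_context("fork").Pool(processes=2) as pool:
        print(pool.map(encode_ids, ["Hello world!", "How are you?"]))
```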
@@ -190,6 +194,7 @@ delimiter (Works like `.split(delimiter)`)
 - Fix a bug with the IDs associated with added tokens.
 - Fix a bug that was causing crashes in Python 3.5

+[#311]: https://github.com/huggingface/tokenizers/pull/311
 [#309]: https://github.com/huggingface/tokenizers/pull/309
 [#289]: https://github.com/huggingface/tokenizers/pull/289
 [#286]: https://github.com/huggingface/tokenizers/pull/286
@@ -207,6 +212,7 @@ delimiter (Works like `.split(delimiter)`)
 [#193]: https://github.com/huggingface/tokenizers/pull/193
 [#190]: https://github.com/huggingface/tokenizers/pull/190
 [#188]: https://github.com/huggingface/tokenizers/pull/188
+[#187]: https://github.com/huggingface/tokenizers/issues/187
 [#175]: https://github.com/huggingface/tokenizers/issues/175
 [#174]: https://github.com/huggingface/tokenizers/issues/174
 [#165]: https://github.com/huggingface/tokenizers/pull/165
@@ -43,6 +43,8 @@ using serde. It is now easy to save/load an entire tokenizer.
 - [#289]: Ability to pad to a multiple of a specified value. This is especially useful to ensure
   activation of the Tensor Cores, while ensuring padding to a multiple of 8.
 - [#298]: Ability to get the currently set truncation/padding params
+- [#311]: Ability to enable/disable the parallelism using the `TOKENIZERS_PARALLELISM` environment
+  variable.

 ### How to migrate
 - Replace any `XXX_to_YYY_offsets()` method call by any of the new ones.
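To make the [#289] and [#298] entries above concrete, a short sketch against the Python bindings; the `tokenizer.json` path is hypothetical and the printed dict contents are illustrative:

```python
from tokenizers import Tokenizer

tokenizer = Tokenizer.from_file("tokenizer.json")  # hypothetical path

# [#289]: pad each batch up to the next multiple of 8 so sequence lengths
# line up with what the Tensor Cores expect.
tokenizer.enable_padding(pad_token="[PAD]", pad_id=0, pad_to_multiple_of=8)
tokenizer.enable_truncation(max_length=128)

# [#298]: read the currently set params back instead of tracking them manually.
print(tokenizer.padding)     # e.g. {'pad_to_multiple_of': 8, 'pad_token': '[PAD]', ...}
print(tokenizer.truncation)  # e.g. {'max_length': 128, ...}
```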
@@ -117,6 +119,7 @@ advised, but that's not the question)
   split up in multiple bytes
 - [#174]: The `LongestFirst` truncation strategy had a bug

+[#311]: https://github.com/huggingface/tokenizers/pull/311
 [#309]: https://github.com/huggingface/tokenizers/pull/309
 [#298]: https://github.com/huggingface/tokenizers/pull/298
 [#289]: https://github.com/huggingface/tokenizers/pull/289
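And since both hunk headers reference the new serde-based serialization, a round-trip sketch with a trivial untrained model, just to show the shape of the save/load API:

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE

# An empty BPE model is enough to demonstrate the round trip.
tokenizer = Tokenizer(BPE())

# Save to disk and load back ...
tokenizer.save("tokenizer.json")
restored = Tokenizer.from_file("tokenizer.json")

# ... or go through a plain JSON string.
clone = Tokenizer.from_str(restored.to_str())
```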