Update CHANGELOGs
@@ -13,6 +13,7 @@ This adds some methods to easily save/load an entire tokenizer (`from_str`, `from_file`).
### Added
- [#272]: Serialization of the `Tokenizer` and all the parts (`PreTokenizer`, `Normalizer`, ...).
  This adds some methods to easily save/load an entire tokenizer (`from_str`, `from_file`); see the sketch after this list.
- [#273]: `Tokenizer` and its parts are now picklable, as sketched below
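
A minimal sketch of the save/load round-trip enabled by [#272], assuming the Python bindings; `to_str` and `save` are assumed here as the serialization counterparts of the `from_str`/`from_file` constructors named above:

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE

# An empty BPE model keeps the sketch self-contained; a trained
# vocab/merges pair would normally be supplied here.
tokenizer = Tokenizer(BPE())

json_str = tokenizer.to_str(pretty=True)  # serialize the whole pipeline to JSON
tokenizer.save("tokenizer.json")          # or write it to disk

# Restore the entire tokenizer with the counterpart constructors.
restored_a = Tokenizer.from_str(json_str)
restored_b = Tokenizer.from_file("tokenizer.json")
```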
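
And since [#273] makes the whole pipeline picklable, a round-trip through `pickle` (useful, e.g., for multiprocessing workers) becomes a one-liner; a sketch under the same assumptions:

```python
import pickle

from tokenizers import Tokenizer
from tokenizers.models import BPE

tokenizer = Tokenizer(BPE())

# Serialize and restore via pickle; the restored object is an
# independent, fully functional Tokenizer.
restored = pickle.loads(pickle.dumps(tokenizer))
assert restored.to_str() == tokenizer.to_str()
```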
### Changed
- Improved errors generated during truncation: cases where the provided max length is too low are
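
For context, truncation is configured through `enable_truncation` in the Python bindings; a minimal sketch (the exact error behavior for a too-low `max_length` is cut off by this hunk, so it is not shown):

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE

tokenizer = Tokenizer(BPE())

# max_length bounds the total encoded length; values lower than what the
# pipeline needs (e.g. fewer than the special tokens a post-processor
# adds) are the cases whose errors this release improves.
tokenizer.enable_truncation(max_length=2)
```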
@@ -178,6 +179,8 @@ delimiter (Works like `.split(delimiter)`)
- Fix a bug with the IDs associated with added tokens.
- Fix a bug that was causing crashes in Python 3.5
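
The context line on the hunk header above references a delimiter-based pre-tokenizer, which per the surrounding changelog appears to be `CharDelimiterSplit`; assuming the Python bindings and the `pre_tokenize_str` helper, a sketch:

```python
from tokenizers.pre_tokenizers import CharDelimiterSplit

# Splits on the given character, like Python's str.split(delimiter).
pre_tok = CharDelimiterSplit("-")
print(pre_tok.pre_tokenize_str("byte-pair-encoding"))
# [('byte', (0, 4)), ('pair', (5, 9)), ('encoding', (10, 18))]
```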
[#273]: https://github.com/huggingface/tokenizers/pull/273
[#272]: https://github.com/huggingface/tokenizers/pull/272
[#249]: https://github.com/huggingface/tokenizers/pull/249
[#239]: https://github.com/huggingface/tokenizers/pull/239
[#236]: https://github.com/huggingface/tokenizers/pull/236
@@ -105,6 +105,7 @@ advised, but that's not the question)
split up in multiple bytes
- [#174]: The `LongestFirst` truncation strategy had a bug (see the sketch below)
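
A sketch of where this strategy comes into play, assuming the Python bindings' `enable_truncation` API:

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE

tokenizer = Tokenizer(BPE())

# LongestFirst (the default strategy) removes tokens one at a time from
# whichever sequence of a pair is currently longer, until the total
# length fits under max_length.
tokenizer.enable_truncation(max_length=8, strategy="longest_first")
```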
[#272]: https://github.com/huggingface/tokenizers/pull/272
[#249]: https://github.com/huggingface/tokenizers/pull/249
[b770f36]: https://github.com/huggingface/tokenizers/commit/b770f364280af33efeffea8f0003102cda8cf1b7
[#236]: https://github.com/huggingface/tokenizers/pull/236