Update CHANGELOGs
@@ -28,6 +28,8 @@ whitespaces are part of the actual token.
It has been added to `ByteLevelBPETokenizer` but it is off by default (`trim_offsets=False`).
- More alignment mappings on the `Encoding`.
- `post_process` can be called on the `Tokenizer`
- [#208]: Ability to retrieve the vocabulary from the `Tokenizer` with
  `get_vocab(with_added_tokens: bool)`
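To make the offset items above concrete, here is a minimal sketch against the Python bindings of this era. The `vocab.json`/`merges.txt` paths are placeholders, and the hunk does not name the new alignment mappings, so the two helpers used at the end (`char_to_token`, `token_to_chars`) are assumptions based on later releases.

```python
from tokenizers import ByteLevelBPETokenizer

# trim_offsets=True post-processes offsets so that the whitespace
# byte-level BPE folds into a token is excluded from its reported span.
# As noted above, it defaults to False.
tokenizer = ByteLevelBPETokenizer(
    "vocab.json", "merges.txt",  # placeholder vocabulary files
    trim_offsets=True,
)

encoding = tokenizer.encode("Hello world")
print(encoding.tokens)
print(encoding.offsets)            # one (start, end) char span per token

# Alignment mappings on the Encoding (assumed names, see above):
print(encoding.char_to_token(6))   # index of the token covering char 6
print(encoding.token_to_chars(0))  # char span of token 0
```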
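Likewise, a sketch of the two `Tokenizer`-level additions, under the same placeholder-file assumption; the keyword names mirror the changelog entry and the current Python bindings, so treat them as illustrative rather than authoritative.

```python
from tokenizers import ByteLevelBPETokenizer

tokenizer = ByteLevelBPETokenizer("vocab.json", "merges.txt")  # placeholders

# post_process applies the tokenizer's configured post-processor
# (special tokens, truncation, padding) to an existing Encoding,
# instead of only doing so implicitly inside encode().
encoding = tokenizer.encode("Hello world")
encoding = tokenizer.post_process(encoding)

# get_vocab returns the token -> id mapping; with_added_tokens controls
# whether tokens registered via add_tokens()/add_special_tokens() appear.
vocab = tokenizer.get_vocab(with_added_tokens=True)
print(len(vocab))
```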
### Fixed
- [#193]: Fix some issues with the offsets being wrong with the `ByteLevel` BPE:
@@ -148,6 +150,7 @@ delimiter (Works like `.split(delimiter)`)
- Fix a bug with the IDs associated with added tokens.
- Fix a bug that was causing crashes in Python 3.5

[#208]: https://github.com/huggingface/tokenizers/pull/208
[#197]: https://github.com/huggingface/tokenizers/pull/197
[#193]: https://github.com/huggingface/tokenizers/pull/193
[#190]: https://github.com/huggingface/tokenizers/pull/190