Update changelogs and bump version for python release

@@ -4,16 +4,18 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
-## [0.8.0.dev1]
+## [0.8.0.dev2]
 
-### Added
-- [#272]: Serialization of the `Tokenizer` and all the parts (`PreTokenizer`, `Normalizer`, ...).
-  This adds some methods to easily save/load an entire tokenizer (`from_str`, `from_file`).
+### Fixed
+- [#286]: Fix various crash when training a BPE model
 
 ### Added
 - [#272]: Serialization of the `Tokenizer` and all the parts (`PreTokenizer`, `Normalizer`, ...).
   This adds some methods to easily save/load an entire tokenizer (`from_str`, `from_file`).
 - [#273]: `Tokenizer` and its parts are now pickable
+- [#289]: Ability to pad to a multiple of a specified value. This is especially useful to ensure
+  activation of the Tensor Cores, while ensuring padding to a multiple of 8. Use with
+  `enable_padding(pad_to_multiple_of=8)` for example.
 
 ### Changed
 - Improved errors generated during truncation: When the provided max length is too low are
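As a usage illustration for the [#289] entry above (not part of this commit), here is a minimal sketch of the new padding option. It assumes a tokenizer already saved to a placeholder `tokenizer.json` file via the [#272] serialization support; the texts are arbitrary.

```python
from tokenizers import Tokenizer

# Placeholder path: any tokenizer previously saved with the [#272] serialization support.
tokenizer = Tokenizer.from_file("tokenizer.json")

# [#289]: pad each encoded batch up to a length that is a multiple of 8,
# which keeps sequence lengths Tensor Core friendly.
tokenizer.enable_padding(pad_to_multiple_of=8)

encodings = tokenizer.encode_batch(["A short sentence.", "A slightly longer sentence."])
for encoding in encodings:
    # Both encodings share the same padded length, rounded up to a multiple of 8.
    print(len(encoding.ids), encoding.tokens)
```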
@@ -183,6 +185,8 @@ delimiter (Works like `.split(delimiter)`)
 - Fix a bug with the IDs associated with added tokens.
 - Fix a bug that was causing crashes in Python 3.5
 
+[#289]: https://github.com/huggingface/tokenizers/pull/289
+[#286]: https://github.com/huggingface/tokenizers/pull/286
 [#280]: https://github.com/huggingface/tokenizers/pull/280
 [#276]: https://github.com/huggingface/tokenizers/pull/276
 [#273]: https://github.com/huggingface/tokenizers/pull/273

bindings/python/Cargo.lock (generated)
@@ -622,7 +622,7 @@ dependencies = [
 
 [[package]]
 name = "tokenizers-python"
-version = "0.8.0-dev1"
+version = "0.8.0-dev2"
 dependencies = [
  "pyo3 0.9.2 (registry+https://github.com/rust-lang/crates.io-index)",
  "rayon 1.3.0 (registry+https://github.com/rust-lang/crates.io-index)",

@@ -1,6 +1,6 @@
 [package]
 name = "tokenizers-python"
-version = "0.8.0-dev1"
+version = "0.8.0-dev2"
 authors = ["Anthony MOI <m.anthony.moi@gmail.com>"]
 edition = "2018"
 
@@ -6,7 +6,7 @@ extras["testing"] = ["pytest"]
 
 setup(
     name="tokenizers",
-    version="0.8.0.dev1",
+    version="0.8.0.dev2",
     description="Fast and Customizable Tokenizers",
     long_description=open("README.md", "r", encoding="utf-8").read(),
     long_description_content_type="text/markdown",

@@ -1,4 +1,4 @@
-__version__ = "0.8.0.dev1"
+__version__ = "0.8.0.dev2"
 
 from typing import Tuple, Union, Tuple, List
 
@@ -9,6 +9,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Fixed
 - [#236]: Fix a bug with offsets being shifted when there are sub-sequences (Usually with
   special tokens and/or added tokens in the sequence).
+- [#286]: Fix various crash when training a BPE model
 
 ### Changed
 - [#234]: Completely changed the alignement mappings available on `Encoding`. Previous mappings
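The [#286] fix above concerns BPE training. For context (not part of this commit), a minimal training run with the Python bindings looks roughly like the sketch below; `corpus.txt` is a placeholder for any plain-text training file and the hyperparameters are only illustrative.

```python
from tokenizers import ByteLevelBPETokenizer

# "corpus.txt" is a placeholder for any plain-text training file;
# the hyperparameters are only illustrative.
tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=["corpus.txt"],
    vocab_size=5000,
    min_frequency=2,
    special_tokens=["<pad>", "<unk>"],
)

# Quick sanity check on the freshly trained model.
print(tokenizer.encode("Training a small BPE model.").tokens)
```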
@@ -35,6 +36,8 @@ implementation from GPT-2
 on this front.
 - [#272]: Serialization of the `Tokenizer` and all the parts (`PreTokenizer`, `Normalizer`, ...)
   using serde. It is now easy to save/load an entire tokenizer.
+- [#289]: Ability to pad to a multiple of a specified value. This is especially useful to ensure
+  activation of the Tensor Cores, while ensuring padding to a multiple of 8.
 
 ### How to migrate
 - Replace any `XXX_to_YYY_offsets()` method call by any of the new ones.
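To illustrate the [#272] and [#273] entries (again, not part of this commit), here is a minimal sketch of the save/load and pickling round-trips; `tokenizer.json` is a placeholder path for a previously saved tokenizer.

```python
import pickle

from tokenizers import Tokenizer

# Placeholder path: a tokenizer previously saved to disk.
tokenizer = Tokenizer.from_file("tokenizer.json")

# [#272]: the whole pipeline (model, normalizer, pre-tokenizer, ...) round-trips
# through a JSON string via to_str / from_str.
as_json = tokenizer.to_str()
restored = Tokenizer.from_str(as_json)

# [#273]: the Tokenizer and its parts can also go through pickle.
clone = pickle.loads(pickle.dumps(tokenizer))
```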
@@ -109,6 +112,8 @@ advised, but that's not the question)
   split up in multiple bytes
 - [#174]: The `LongestFirst` truncation strategy had a bug
 
+[#289]: https://github.com/huggingface/tokenizers/pull/289
+[#286]: https://github.com/huggingface/tokenizers/pull/286
 [#280]: https://github.com/huggingface/tokenizers/pull/280
 [#276]: https://github.com/huggingface/tokenizers/pull/276
 [#272]: https://github.com/huggingface/tokenizers/pull/272