Python - Bump version for 0.8.0.transformers release

2025-12-03 11:18:29 +00:00 · 2020-06-26 14:37:22 -04:00
parent 6d531a435e
commit 1a08b21329
5 changed files with 20 additions and 5 deletions
--- a/bindings/python/CHANGELOG.md
+++ b/bindings/python/CHANGELOG.md
@@ -4,7 +4,22 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

-## [0.8.0.rc3]
+## [0.8.0]
+
+### Highlights of this release
+- We can now encode both pre-tokenized inputs and raw strings. This is especially useful when
+processing datasets that are already pre-tokenized, such as for NER (Named Entity Recognition), and
+helps when applying labels to each word.
+- Full tokenizer serialization. It is now easy to save a tokenizer to a single JSON file, to later
+load it back with just one line of code. That's what sharing a Tokenizer means now: 1 line of code.
+- With serialization comes compatibility with `pickle`! The Tokenizer, all of its components,
+Encodings, everything can be pickled!
+- Training a tokenizer is now even faster (up to 5-10x) than before!
+- Compatibility with `multiprocessing`, even when using the `fork` start method. Since this library
+makes heavy use of the multithreading capacities of our computers to allow very fast tokenization,
+this led to problems (deadlocks) when used with `multiprocessing`. This version now allows
+disabling parallelism, and will warn you if this is necessary.
+- And a lot of other improvements and fixes.

 ### Fixed
 - [#286]: Fix various crashes when training a BPE model
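
The `multiprocessing` highlight in the changelog hunk above says parallelism can be disabled, but does not show how. A minimal sketch follows, assuming the switch is the `TOKENIZERS_PARALLELISM` environment variable (the name does not appear in this diff, so treat it as an assumption) and that it must be set before any worker process is forked. The helper name `disable_tokenizers_parallelism` is hypothetical and uses only the standard library.

```python
import os

# Assumption: the library reads this environment variable to decide whether
# to use its internal thread pool. The name is not stated in the diff above.
ENV_VAR = "TOKENIZERS_PARALLELISM"


def disable_tokenizers_parallelism(env=None):
    """Set the (assumed) switch to "false" and return its previous value.

    `env` defaults to os.environ; any mutable mapping can be passed instead,
    which makes the helper easy to exercise without touching the real
    environment.
    """
    if env is None:
        env = os.environ
    previous = env.get(ENV_VAR)
    env[ENV_VAR] = "false"
    return previous


if __name__ == "__main__":
    # Call this before creating a multiprocessing.Pool (or otherwise forking),
    # so that child processes inherit the setting.
    old_value = disable_tokenizers_parallelism()
    print(f"{ENV_VAR} was {old_value!r}, now {os.environ[ENV_VAR]!r}")
```

Returning the previous value lets a caller restore it afterwards if only part of a program forks workers.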
--- a/bindings/python/Cargo.lock
+++ b/bindings/python/Cargo.lock
@@ -641,7 +641,7 @@ dependencies = [

 [[package]]
 name = "tokenizers-python"
-version = "0.8.0-rc3"
+version = "0.8.0"
 dependencies = [
 "libc 0.2.68 (registry+https://github.com/rust-lang/crates.io-index)",
 "pyo3 0.9.2 (registry+https://github.com/rust-lang/crates.io-index)",
--- a/bindings/python/Cargo.toml
+++ b/bindings/python/Cargo.toml
@@ -1,6 +1,6 @@
 [package]
 name = "tokenizers-python"
-version = "0.8.0-rc3"
+version = "0.8.0"
 authors = ["Anthony MOI <m.anthony.moi@gmail.com>"]
 edition = "2018"

--- a/bindings/python/setup.py
+++ b/bindings/python/setup.py
@@ -6,7 +6,7 @@ extras["testing"] = ["pytest"]

 setup(
    name="tokenizers",
-    version="0.8.0.rc3",
+    version="0.8.0.transformers",
    description="Fast and Customizable Tokenizers",
    long_description=open("README.md", "r", encoding="utf-8").read(),
    long_description_content_type="text/markdown",
--- a/bindings/python/tokenizers/__init__.py
+++ b/bindings/python/tokenizers/__init__.py
@@ -1,4 +1,4 @@
-__version__ = "0.8.0.rc3"
+__version__ = "0.8.0"

 from typing import Tuple, Union, Tuple, List