Fix LongestFirst truncation strategy

Anthony MOI
2020-02-29 16:26:13 -05:00
parent 2f85ba21e6
commit f8f0702d98
4 changed files with 13 additions and 11 deletions

@@ -4,6 +4,7 @@ Fixes:
 - Some default tokens were missing from `BertWordPieceTokenizer` (cf [#160](https://github.com/huggingface/tokenizers/issues/160))
 - There was a bug in ByteLevel PreTokenizer that caused offsets to be wrong if a char got split up
 into multiple bytes. (cf [#156](https://github.com/huggingface/tokenizers/pull/156))
+- The `longest_first` truncation strategy had a bug ([#174](https://github.com/huggingface/tokenizers/issues/174))
 
 # v0.5.2
 - Do not open all files directly while training ([#163](https://github.com/huggingface/tokenizers/issues/163))
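
For context on the entry this commit adds: the idea behind `longest_first` truncation is to trim a pair of sequences down to a length budget by repeatedly removing one token from whichever sequence is currently longer, so both are truncated as evenly as possible. Below is a minimal, standalone Rust sketch of that idea; the function name `truncate_longest_first` is invented for illustration and this is not the crate's actual implementation.

```rust
// Hypothetical sketch of the `longest_first` truncation idea: drop one
// token at a time from whichever sequence is currently longer, until the
// combined length fits the budget. Both sequences end up trimmed as
// evenly as possible.
fn truncate_longest_first(mut a: Vec<u32>, mut b: Vec<u32>, max_len: usize) -> (Vec<u32>, Vec<u32>) {
    while a.len() + b.len() > max_len {
        if a.len() > b.len() {
            a.pop();
        } else {
            b.pop();
        }
    }
    (a, b)
}

fn main() {
    let a: Vec<u32> = (0..10).collect();
    let b: Vec<u32> = (0..4).collect();
    let (a, b) = truncate_longest_first(a, b, 8);
    // The longer sequence absorbs all the trimming until the lengths meet.
    assert_eq!((a.len(), b.len()), (4, 4));
    println!("a={:?} b={:?}", a, b);
}
```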