* [BREAKING CHANGE] Ignore added_tokens (both special and not) in the decoder. `ByteLevel` was mangling some `AddedTokens` whose text falls in the utf-8 range used by the byte-level mapping; this commit tests the extent of the impact of skipping the decoder for those tokens (see the sketch after this list).
* Format.
* Installing cargo audit.
* Minor fix.
* Fixing "bug" in node/python.
* Autoformat.
* Clippy.
* Only prefix space when there's no decoder.
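
A minimal sketch (not the repository's actual test suite) of why added tokens should bypass the `ByteLevel` decoder: the token text below ("Ġspecial") is a hypothetical added token containing "Ġ", a character the byte-level mapping uses as an alias for the space byte, so feeding it through the decoder re-interprets it instead of returning it verbatim. The tokenizer configuration is illustrative, assuming the Python `tokenizers` bindings.

```python
# Sketch only: illustrates the AddedToken / ByteLevel decoder interaction
# described in the commit message, not the commit's own test.
from tokenizers import Tokenizer, AddedToken
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import ByteLevel as ByteLevelPreTokenizer
from tokenizers.decoders import ByteLevel as ByteLevelDecoder

tokenizer = Tokenizer(BPE())
tokenizer.pre_tokenizer = ByteLevelPreTokenizer(add_prefix_space=False)
tokenizer.decoder = ByteLevelDecoder()

# Hypothetical added token whose text contains "Ġ", a character that the
# byte-level mapping re-uses as a printable alias for the space byte.
tokenizer.add_tokens([AddedToken("Ġspecial", normalized=False)])

ids = tokenizer.encode("Ġspecial").ids
# If the added token's text is run through the ByteLevel decoder, "Ġ" is
# mapped back to a raw space byte and the token no longer round-trips.
# With added tokens ignored by the decoder, the text comes back verbatim.
print(tokenizer.decode(ids, skip_special_tokens=False))
```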