Commit Graph

  • dc78c46c39 fix esaxx main mii443 2025-06-12 15:32:38 +09:00
  • 909fdde2a4 Upgrade onig, to get it compiling with GCC 15 (#1771) Owen Shepherd 2025-05-27 16:19:15 +01:00
  • b4d8dfc3b2 Use ApiBuilder::from_env() in from_pretrained function (#1737) benshi 2025-05-27 18:20:17 +08:00
  • e5d781d5b9 update pyo3 and rust-numpy depends for no-gil/free-threading compat (#1774) Qubitium-ModelCloud 2025-05-27 17:31:58 +08:00
  • 01f8bc834c clippy (#1781) Arthur 2025-05-27 11:30:32 +02:00
  • 23e7e42adf Fix data path in test_continuing_prefix_trainer_mismatch (#1747) Gaétan Lepage 2025-05-27 08:48:27 +02:00
  • fd1b361b76 Bump http-proxy-middleware in /tokenizers/examples/unstable_wasm/www (#1762) dependabot[bot] 2025-05-27 08:29:50 +02:00
  • cc01186fd7 Fix type notation of merges in BPE Python binding (#1766) Kokū 2025-05-27 15:23:58 +09:00
  • f1faec1756 Fix typos in strings and comments (#1770) co63oc 2025-05-27 14:17:36 +08:00
  • 67db0cd1dd Fix no-onig no-wasm builds (#1772) Owen Shepherd 2025-05-27 06:44:20 +01:00
  • 759d7aa77a replace lazy_static with stabilized std::sync::LazyLock in 1.80 (#1739) sftse 2025-03-18 17:33:44 +01:00
  • 4383a25787 Update the release builds following 0.21.1. (#1746) Nicolas Patry 2025-03-13 13:01:41 +01:00
  • 4f1a810aa2 Add rustls-tls feature (#1732) Victoria Terenina 2025-02-11 09:57:05 +00:00
  • fbe3365a13 Update metadata as Python3.7 and Python3.8 support was dropped (#1724) Nighthawk 2025-02-11 17:52:59 +08:00
  • c45aebd102 🚨 Support updating template processors (#1652) Arthur 2025-01-28 14:58:35 +01:00
  • e7ed39de3c Fixing NormalizedString append when normalized is empty. (#1717) Nicolas Patry 2025-01-09 17:41:32 +01:00
  • 0ff2ab0f64 Fixing the stream by removing the read_index altogether. (#1716) Nicolas Patry 2025-01-09 17:41:15 +01:00
  • 862d1a346a Fix panic in DecodeStream::step due to incorrect index usage (#1699) Sungyoon Jeong 2025-01-09 21:24:04 +09:00
  • c04b97aab1 Update documentation of Rust feature (#1711) sondalex 2025-01-09 12:08:45 +01:00
  • bdfc38b78d Fix typos (#1715) tinyboxvk 2025-01-09 06:53:20 -04:00
  • 6945933829 update Split pretokenizer docstrings (#1701) Dylan-Harden3 2025-01-08 05:35:52 -06:00
  • 166edd87c8 Fixing the README. (#1714) Nicolas Patry 2025-01-08 12:31:17 +01:00
  • 3a6504d274 Upgrade to PyO3 0.23 (#1708) Nicolas Patry 2024-12-31 18:36:01 +01:00
  • 555d44c47a Add feature flag hint to README.md, fixes #1633 (#1709) sftse 2024-12-30 17:01:53 +01:00
  • 24d29f498d Update dev version and pyproject.toml (#1693) Arthur 2024-11-27 16:01:48 +01:00
  • 1bf2a66b80 v0.20.4-dev0 Arthur Zucker 2024-11-27 10:07:49 +01:00
  • eb4cc86d4e Bump cross-spawn from 6.0.5 to 6.0.6 in /bindings/node (#1687) dependabot[bot] 2024-11-25 10:04:06 +01:00
  • ac34660e44 Fix encode_batch and encode_batch_fast to accept ndarrays again (#1679) Dimitris Iliopoulos 2024-11-21 05:55:11 -05:00
  • f0c48bd89a Update README.md with install from source Arthur 2024-11-15 21:51:39 +01:00
  • cc5fb01a2f Decode stream python (#1678) Nicolas Patry 2024-11-15 19:06:22 +08:00
  • 500db282a8 Adding an API for decode streaming. (#1677) Nicolas Patry 2024-11-15 13:02:38 +08:00
  • f4c9fd7f40 Testing ABI3 wheels to reduce number of wheels (#1674) Nicolas Patry 2024-11-15 13:02:22 +08:00
  • 5aa9f6cff0 Disable caching for long strings. (#1676) Nicolas Patry 2024-11-07 21:36:27 +08:00
  • c6b5c3eab7 More cache options. (#1675) Nicolas Patry 2024-11-06 18:12:09 +08:00
  • 1740bff7a6 Revert "Upgrade python versions." Nicolas Patry 2024-11-06 13:18:03 +08:00
  • b81ec467a6 Upgrade python versions. Nicolas Patry 2024-11-06 13:17:22 +08:00
  • 57884ebaa2 [MINOR:TYPO] Fix docstrings (#1653) Christopher Akiki 2024-11-05 16:25:06 +01:00
  • 5e223ceb48 fix pylist (#1673) Arthur 2024-11-05 16:24:23 +01:00
  • 0f3a3f957e update workflow Arthur Zucker 2024-11-04 18:38:32 +01:00
  • 7c36735389 v0.20.2-dev.0 version Arthur Zucker 2024-11-04 18:36:40 +01:00
  • 6c15458868 Bump actions versions (#1669) tinyboxvk 2024-11-01 06:19:35 -03:00
  • 6ade8c2d21 PyO3 0.22 (#1665) Dimitris Iliopoulos 2024-11-01 05:17:23 -04:00
  • 41e0eaa561 Bump actions/checkout to v4 (#1667) tinyboxvk 2024-10-29 10:32:07 -03:00
  • 5512a424bf Add safety comments (#1651) Manish Goregaokar 2024-10-29 01:44:06 -07:00
  • 6ea758872d Unsound call of set_var (#1664) sftse 2024-10-25 15:44:30 +02:00
  • a8738a95d1 Arg name correction: auth_token -> token (#1621) rravenel 2024-10-24 07:32:09 -07:00
  • 9b77c054ef Fix off-by-one error in tokenizer::normalizer::Range::len (#1638) Ryan Landay 2024-10-14 02:40:17 -04:00
  • bce68a60cb Bump cookie and express in /tokenizers/examples/unstable_wasm/www (#1648) dependabot[bot] 2024-10-10 15:30:24 +02:00
  • 51826532d4 push new dev version Arthur Zucker 2024-10-10 12:00:16 +02:00
  • 557fde76d8 style: simplify string formatting for readability (#1632) Hamir Mahal 2024-10-04 04:11:50 -07:00
  • 3d51a1695f Fix documentation build (#1642) Arthur 2024-10-01 14:48:02 +02:00
  • 294ab86fe0 Bump webpack in /tokenizers/examples/unstable_wasm/www (#1641) dependabot[bot] 2024-10-01 14:17:23 +02:00
  • 2b97a5e49e Bump send and express in /tokenizers/examples/unstable_wasm/www (#1631) dependabot[bot] 2024-10-01 14:17:09 +02:00
  • 077678d1d1 Bump serve-static and express in /tokenizers/examples/unstable_wasm/www (#1630) dependabot[bot] 2024-10-01 14:16:53 +02:00
  • 2204066e78 Bump body-parser and express in /tokenizers/examples/unstable_wasm/www (#1629) dependabot[bot] 2024-10-01 14:16:41 +02:00
  • 3fb1371c1c [ignore_merges] Fix offsets (#1640) Arthur 2024-10-01 09:22:20 +02:00
  • b4a38c4f63 Bump actions/download-artifact from 3 to 4.1.7 in /.github/workflows (#1626) dependabot[bot] 2024-09-30 16:38:28 +02:00
  • 14a07b06e4 fix filelink (#1610) 152334H 2024-08-12 05:35:33 +00:00
  • 75aef5b75b Update README.md (#1608) Arthur 2024-08-09 10:40:21 +02:00
  • 81c471cf17 update dev version 0.20.0 Arthur Zucker 2024-08-08 18:10:55 +02:00
  • 85cc05a32f Fix CI (#1607) Nicolas Patry 2024-08-08 17:09:30 +02:00
  • bfd9cdeefb Perf improvement 16% by removing offsets. (#1587) Nicolas Patry 2024-08-08 14:56:13 +02:00
  • bd27fa56d6 add deserialize for pre tokenizers (#1603) Arthur 2024-08-08 08:38:09 +02:00
  • 56c9c70440 Tests + Deserialization improvement for normalizers. (#1604) Nicolas Patry 2024-08-08 08:38:02 +02:00
  • 49dafd707e Fix strip python type (#1602) Arthur 2024-08-07 15:36:28 +02:00
  • bded212356 Support None to reset pre_tokenizers and normalizers, and index sequences (#1590) Arthur 2024-08-07 12:52:35 +02:00
  • eea8e1ae6f Fix doc about split (#1591) Arthur 2024-08-07 12:35:01 +02:00
  • 6a5fce9fa0 Merges cannot handle tokens containing spaces. (#909) Nicolas Patry 2024-08-07 12:34:53 +02:00
  • ab9c7ded8b Using serde (serde_pyo3) to get __str__ and __repr__ easily. (#1588) Nicolas Patry 2024-08-07 12:08:29 +02:00
  • 7a30bca2f3 Updating error messages. (#1599) Nicolas Patry 2024-08-06 16:42:56 +02:00
  • 8f2cc90249 Add test normalizers (#1600) Arthur 2024-08-06 16:08:18 +02:00
  • fe41687ca8 Better serialization error (#1595) Nicolas Patry 2024-08-06 13:39:11 +02:00
  • 2d27761f60 Adding a few tests for decoder deserialization. Nicolas Patry 2024-08-06 12:03:21 +02:00
  • adc82cb49a Add-legacy-tests (#1597) Arthur 2024-08-06 13:08:12 +02:00
  • 99a48dcb46 Clippy. Nicolas Patry 2024-08-06 10:08:35 +02:00
  • 5fb8a2320c Legacy test. Nicolas Patry 2024-08-06 09:58:33 +02:00
  • 388014fd6b Adding some serialization testing around the wrapper. Nicolas Patry 2024-08-06 09:55:01 +02:00
  • 7b80359dd2 Fixing release CI strict (taken from safetensors). Nicolas Patry 2024-08-05 17:24:01 +02:00
  • a010f6b75c Revert "Using serde (serde_pyo3) to get __str__ and __repr__ easily." Nicolas Patry 2024-08-02 18:42:57 +02:00
  • 86138337fc Using serde (serde_pyo3) to get __str__ and __repr__ easily. Nicolas Patry 2024-08-02 18:41:54 +02:00
  • 7415e28536 Enabling the option to use fancy_regex instead of onig. Nicolas Patry 2024-08-01 12:10:16 +02:00
  • 9e0c791f2b Small performance fixup (negligible but obviously better). Nicolas Patry 2024-08-01 11:34:02 +02:00
  • 1df498a186 Fixing benchmark2. Nicolas Patry 2024-08-01 11:31:31 +02:00
  • c6f2c0b057 Fixing the benchmark. (#1583) Nicolas Patry 2024-08-01 10:36:53 +02:00
  • 35f338a7b8 Add benchmark vs tiktoken (#1582) Nicolas Patry 2024-07-31 17:09:23 +02:00
  • aface7a968 dump spm_precompiled to 0.1.3 (#1571) Mike 2024-07-31 13:38:04 +00:00
  • a3ad85b3e8 Fix clippy + feature test management. (#1580) Nicolas Patry 2024-07-26 12:16:30 +02:00
  • 4ea2f235b0 Add bytelevel normalizer to fix decode when adding tokens to BPE (#1555) Arthur 2024-07-15 12:12:03 +02:00
  • f2a44dc5d1 Revert "[BREAKING CHANGE] Ignore added_tokens (both special and not) … (#1569) Arthur 2024-07-12 07:29:40 +02:00
  • fdd26ba9a3 Enable dropout = 0.0 as an equivalent to none in BPE (#1550) Marco 2024-06-24 19:36:11 +09:00
  • 9441f7e8f7 make sure we don't warn on empty tokens (#1554) Arthur 2024-06-20 14:33:21 +02:00
  • 3e736bbccb Fix clippy Arthur Zucker 2024-06-20 09:39:19 +02:00
  • 1ff56c0c70 Fix 'dictionnary' typo (#1511) Nathan 2024-06-11 06:43:47 -07:00
  • 88f51fe7d2 Switch from cached_download to hf_hub_download in tests (#1547) Lucain 2024-06-11 15:26:58 +02:00
  • 418c35c09e feat(ci): add trufflehog secrets detection (#1551) Luc Georges 2024-06-10 16:10:23 +02:00
  • 8d28dbefd1 Fixing for clippy 1.78 (#1548) Nicolas Patry 2024-06-06 13:18:59 +02:00
  • bfefcf676d Make USED_PARALLELISM atomic (#1532) nathaniel-daniel 2024-06-06 04:02:26 -07:00
  • 25aee8b88c [BREAKING CHANGE] Ignore added_tokens (both special and not) in the decoder (#1513) Nicolas Patry 2024-05-06 11:49:38 +02:00
  • f2ec3b239b remove enforcement of non special when adding tokens (#1521) Arthur 2024-04-30 15:53:47 +02:00
  • 71c2a8d01a update dev version so 0.19.1 Arthur Zucker 2024-04-17 23:17:12 +02:00