Commit Graph

  • 7733bc25d6 add serialization for ignore_merges (#1504) Arthur 2024-04-17 21:56:48 +02:00
  • 91393ef75e Fixing doc. (#1499) Nicolas Patry 2024-04-17 09:32:40 +02:00
  • 949d9e3e0e Bumping all versions 3 times (ty transformers :) ) (#1498) Nicolas Patry 2024-04-16 15:58:36 +02:00
  • e0defa7355 Remove 3.13 (potential undefined behavior.) (#1497) Nicolas Patry 2024-04-16 15:56:47 +02:00
  • d5a8cc7a49 PyO3 0.21. (#1494) Nicolas Patry 2024-04-16 13:49:52 +02:00
  • 914576f7ed Add more support for tiktoken based tokenizers (#1493) Arthur 2024-04-15 17:26:36 +02:00
  • 6e58f838b3 version = "0.16.0-dev.0" Arthur Zucker 2024-04-02 09:51:14 +02:00
  • 09069717e9 Refactor metaspace (#1476) Arthur 2024-03-30 10:27:24 +01:00
  • 6153126b22 Added ability to inspect a 'Sequence' decoder and the AddedVocabulary. (#1443) Anthony Platanios 2024-03-29 16:29:54 -07:00
  • d8c4388166 Bump ip from 2.0.0 to 2.0.1 in /bindings/node (#1456) dependabot[bot] 2024-03-25 11:29:36 +01:00
  • 29fef1e7aa [remove black] And use ruff (#1436) Arthur 2024-03-12 21:24:21 +11:00
  • 72a1973cd1 chore: Remove CLI - this was originally intended for local development (#1442) Bryant Biggs 2024-02-12 22:05:43 -05:00
  • 7f49f20ab0 version = "0.15.3-dev-0" Arthur Zucker 2024-02-12 09:48:00 +09:00
  • c893204c45 Efficient Replace normalizer (#1413) Rasmus Larsen 2024-02-06 14:36:44 +01:00
  • 4a8105c366 Convert word counts to u64 (#1433) Stephen Roller 2024-02-05 21:39:12 -05:00
  • 67fe59c88d chore: Update dependencies to latest supported versions (#1441) Bryant Biggs 2024-01-22 11:54:37 -05:00
  • 8f73fe9515 update dev version to 0.15.2-dev.0 Arthur Zucker 2024-01-22 15:34:57 +01:00
  • accd0650b8 Update release for python3.12 windows (#1438) Arthur 2024-01-19 15:56:47 +01:00
  • 6a77d4859b Encode special tokens (#1437) Arthur 2024-01-19 12:43:43 +01:00
  • 888dd4bc65 pyo3: update to 0.20 (#1386) Michael Lui 2024-01-11 06:03:13 -10:00
  • 8939d4e26d Bump follow-redirects in /tokenizers/examples/unstable_wasm/www (#1430) dependabot[bot] 2024-01-10 12:04:48 +01:00
  • 43b31a83c7 Fix make bench. (#1428) Nicolas Patry 2024-01-08 09:53:51 +01:00
  • f1c23b8680 Add quick doc to byte_level.rs (#1420) Steven Weiss 2024-01-03 01:25:07 -08:00
  • 11462596d1 Faster HF dataset iteration in docs (#1414) Mario Šaško 2023-12-14 16:12:56 +01:00
  • 8edec536a7 Fix doc links in readme (#1367) Pierric Cistac 2023-12-09 06:14:54 -05:00
  • 8f9b945c75 Stale bot. (#1404) Nicolas Patry 2023-12-05 14:11:37 +01:00
  • daf361676b Derive Clone on Tokenizer, add Encoding.into_tokens() method (#1381) Pete 2023-11-20 00:56:29 -08:00
  • e3bcef288b update to version = "0.15.1-dev0" (#1390) Arthur 2023-11-15 13:30:58 +01:00
  • f55822baea [pre_tokenizers] Fix sentencepiece based Metaspace (#1357) Arthur 2023-11-14 18:05:07 +01:00
  • ee2af9e99a Allow huggingface_hub<1.0 (#1385) Lucain 2023-11-10 13:51:07 +01:00
  • 648b33a09e Allow hf_hub 0.18 (#1383) Mario Šaško 2023-11-06 14:12:05 +01:00
  • c718c53bb9 Bump @babel/traverse from 7.22.11 to 7.23.2 in /bindings/node (#1370) dependabot[bot] 2023-10-25 08:14:32 +02:00
  • 985d49ae64 fix: remove useless token (#1371) Remy 2023-10-19 14:29:01 +02:00
  • 0d8c57da48 fix a clerical error in the comment (#1356) 天地 2023-10-11 03:31:44 +08:00
  • 4322056e6e Preparing release. (#1355) Nicolas Patry 2023-10-06 12:56:36 +02:00
  • aed491df8c Fixing the progressbar. (#1353) Nicolas Patry 2023-10-05 15:33:58 +02:00
  • 7e8e69a22c Let's allow hf_hub < 1.0 (#1344) Arthur 2023-10-02 14:30:10 +02:00
  • 18bd5e8f9d Added ability to inspect a 'Sequence' pre-tokenizer. (#1341) Anthony Platanios 2023-09-20 23:10:16 -07:00
  • 2c565e42c7 update package version for dev (#1339) Arthur 2023-09-07 16:19:24 +02:00
  • 3dce63f062 Merge pull request #1335 from ArthurZucker/update-added-tokens Arthur 2023-09-07 12:48:54 +02:00
  • efec086f35 get_added_tokens_decoder returns BTREEMap Arthur Zucker 2023-09-06 12:24:30 +00:00
  • a7ace4480d python stub.py Arthur Zucker 2023-09-05 17:33:14 +00:00
  • f435af8b71 linting Arthur Zucker 2023-09-05 16:43:06 +00:00
  • 26fdfc2bc3 style Arthur Zucker 2023-09-05 16:42:45 +00:00
  • b57e1c3f5d #[allow(dead_code)] // Suppress the "method is never used" warning Arthur Zucker 2023-09-05 16:42:22 +00:00
  • c3fa75fa0e nits Arthur Zucker 2023-09-05 15:40:13 +00:00
  • 08af8ea9c3 make tests happy Arthur Zucker 2023-09-05 15:37:09 +00:00
  • 531b06f6db update the get_vocab_size to compute actual length of the get_vocab function Arthur Zucker 2023-09-05 15:19:50 +00:00
  • f1da83f358 add support for get_added_tokens_decoder Arthur Zucker 2023-09-05 14:49:29 +00:00
  • e5fc051ad2 update Arthur Zucker 2023-09-05 13:34:43 +00:00
  • 93b37f36dc styling Arthur Zucker 2023-09-04 20:54:55 +00:00
  • 058e34b421 make special editable as well Arthur Zucker 2023-09-04 20:54:29 +00:00
  • 2291c89896 python stub.py Arthur Zucker 2023-09-04 19:49:36 +00:00
  • b235f85527 clippy Arthur Zucker 2023-09-04 19:31:48 +00:00
  • 9aab096da8 fmt Arthur Zucker 2023-09-04 19:31:05 +00:00
  • a59bb76aa1 update and todo Arthur Zucker 2023-09-04 19:21:38 +00:00
  • c599db1421 nits Arthur Zucker 2023-09-04 19:11:19 +00:00
  • d4008b0d7a clippy Arthur Zucker 2023-09-04 19:11:05 +00:00
  • b117ac7f16 updates Arthur Zucker 2023-09-04 19:10:22 +00:00
  • a53dff9bc5 make content writable in python Arthur Zucker 2023-09-04 18:18:21 +00:00
  • d9829cdc6e fix more tests Arthur Zucker 2023-09-04 17:22:27 +00:00
  • 39bd27e673 fix build Arthur Zucker 2023-09-01 21:22:07 +00:00
  • 9f0c703f03 update init and src for bindings python Arthur Zucker 2023-09-01 21:07:01 +00:00
  • 587748ab09 clean derive partial eq Arthur Zucker 2023-09-01 20:50:34 +00:00
  • fdef4a118b fmt Arthur Zucker 2023-09-01 20:48:47 +00:00
  • d1566a9ecc update, // AddedTokens can be updated if value changed Arthur Zucker 2023-09-01 20:48:36 +00:00
  • 399c6fe852 fix and update tes Arthur Zucker 2023-09-01 20:40:06 +00:00
  • 2b72017e17 correctly compute the new id: we take the max of the AddedToken + get_vocab_size Arthur Zucker 2023-09-01 19:03:33 +00:00
  • db319492f7 clippy Arthur Zucker 2023-09-01 18:57:39 +00:00
  • 2dca476810 fix some tests Arthur Zucker 2023-09-01 18:48:50 +00:00
  • 6cca5716af fix one test? Arthur Zucker 2023-09-01 18:42:30 +00:00
  • 345b4eba96 updates Arthur Zucker 2023-09-01 18:41:36 +00:00
  • 8e522a38d9 Updating the docs with the new command. (#1333) Nicolas Patry 2023-08-29 13:15:26 +02:00
  • d2010d5165 Move to maturing mimicking move for safetensors. + Rewritten node bindings. (#1331) Nicolas Patry 2023-08-28 16:24:14 +02:00
  • f2952020d5 Python 38 arm (#1330) Nicolas Patry 2023-08-23 16:29:16 +02:00
  • f08058ab2b Reduce number of different revisions by 1 (#1329) Nicolas Patry 2023-08-23 15:57:36 +02:00
  • 6c350d88fe Re-using scripts from safetensors. (#1328) Nicolas Patry 2023-08-23 15:37:38 +02:00
  • d0bb35d5a6 Merge pull request #1316 from boyleconnor/add-expect-for-no-truncation Arthur 2023-08-18 19:30:53 +02:00
  • 540bf2eb01 pyo3: update to 0.19 (#1322) Michael Lui 2023-08-16 12:40:32 -04:00
  • 9a93c50c25 Fix stride condition. (#1321) Nicolas Patry 2023-08-14 15:27:55 +02:00
  • b35d33f981 Release all at once for simplicity. (#1320) Nicolas Patry 2023-08-14 13:49:45 +02:00
  • fb292d1eae 0.13.4.rc1 (#1319) Nicolas Patry 2023-08-14 12:06:43 +02:00
  • 862046ac94 CD backports (#1318) Chris Ha 2023-08-11 01:52:22 +09:00
  • 748556a9ed Fix code style Connor Boyle 2023-08-07 15:17:43 -07:00
  • d47d3e377c Derive clone for TrainerWrapper (#1317) Jonatan Kłosko 2023-08-07 15:15:10 +02:00
  • a0a8ebe03f Add expect() for disabling truncation Connor Boyle 2023-08-06 13:25:50 -07:00
  • efea6c7246 Handle when precompiled charsmap is empty (#1308) Kelly Marchisio 2023-07-31 13:35:24 +01:00
  • c2664ae13f Give error when initializing tokenizer with too high stride (#1306) Connor Boyle 2023-07-28 00:16:44 -07:00
  • bb38f390a6 Single warning for holes. (#1303) Nicolas Patry 2023-07-25 11:57:23 +01:00
  • d6326b2b88 feat: Added CITATION.cff. (#1302) Samuel Larkin 2023-07-25 06:16:09 -04:00
  • ea4d3f634c Bump word-wrap from 1.2.3 to 1.2.4 in /bindings/node (#1299) dependabot[bot] 2023-07-21 08:08:10 +02:00
  • 291b2e23ae Fixing clippy warnings on 1.71. (#1296) Nicolas Patry 2023-07-16 15:58:38 +02:00
  • 4811f769a1 import Tuple from typing (#1295) Kelly Marchisio 2023-07-14 11:39:29 -04:00
  • 150559b61e master -> main (#1292) Arthit Suriyawongkul 2023-07-12 10:51:22 +01:00
  • 92bfb9c993 Bump tough-cookie from 4.0.0 to 4.1.3 in /bindings/node (#1291) dependabot[bot] 2023-07-10 09:44:31 +02:00
  • 26659de473 revise type specification (#1289) Hiroshi Matsuda 2023-07-06 23:36:48 +09:00
  • 864135bef1 Add unigram bytefallback (#1217) Arthur 2023-06-26 17:46:59 +09:00
  • 8c9cfb0b68 Improve error for truncation with too high stride (#1275) Connor Boyle 2023-06-12 01:38:42 -07:00
  • 348ed70e58 [doc build] Use secrets (#1273) Mishig 2023-06-09 12:58:27 +02:00
  • 5d70f15bfb Update README.md - Broken link (#1272) Santosh Bhavani 2023-06-08 04:20:11 -04:00