Commit Graph

  • f85e8467e4 Update Cargo.toml (#1266) Chris Ha 2023-06-07 16:57:18 +09:00
  • cb8d4de599 fix documentation regarding regex (#1264) Chris Ha 2023-06-07 16:41:28 +09:00
  • c7102c4c0f Fixing broken link. (#1268) Nicolas Patry 2023-06-06 11:10:28 +02:00
  • cb819724ef Update trainer.rs (#1257) Chris Ha 2023-05-25 19:24:29 +09:00
  • fc76ad4f07 Parallelize unigram trainer (#976) Mishig 2023-05-22 15:36:03 +02:00
  • a03330607b Update all GH Actions with dependency on actions/checkout from v[1,2] to v3 to notably improve performance (retrieve only the commit being checked-out) (#1256) Funtowicz Morgan 2023-05-22 14:50:00 +02:00
  • b4fcc9ce6e Makes decode and decode_batch work on borrowed content. (#1251) Funtowicz Morgan 2023-05-17 11:18:15 +02:00
  • cefc41e8ec implement a simple max_sentencepiece_length into BPE (#1228) Chris Ha 2023-05-16 17:08:19 +09:00
  • daf3fcc976 Rvert main hiccup. Nicolas Patry 2023-05-15 18:01:29 +02:00
  • b58227c7f1 Never gonna make you cry Nicolas Patry 2023-05-12 16:28:57 +02:00
  • 02ad59edc1 Never gonna run around and desert you Nicolas Patry 2023-05-12 16:27:06 +02:00
  • 8d07696c38 Never gonna let you down Nicolas Patry 2023-05-12 16:24:26 +02:00
  • 4518b0f7f2 fix unigram.rs test_sample() (#1244) Chris Ha 2023-05-11 00:04:34 +09:00
  • 87230bb59b use LTO for release and benchmark builds (#1157) Kornél Csernai 2023-05-09 07:15:57 -07:00
  • 15085ef905 Fixing padding_left sequence_ids. (#1233) Nicolas Patry 2023-05-04 15:57:20 +02:00
  • ef5f50605d Printing warning to stderr. (#1222) Nicolas Patry 2023-04-19 14:55:24 +02:00
  • d19bc63c67 Merge pull request #1212 from huggingface/fix-node-release Arthur 2023-04-06 16:25:29 +02:00
  • a714aac6f6 revert changes arthur.zucker@gmail.com 2023-04-06 14:07:46 +00:00
  • ceb73dbd29 publish npm arthur.zucker@gmail.com 2023-04-06 13:35:29 +00:00
  • 42b110587c Fix conda release (#1211) Arthur 2023-04-06 12:30:14 +02:00
  • fbd8d6188e update for testing arthur.zucker@gmail.com 2023-04-06 10:29:42 +00:00
  • 37372b67fa Merge pull request #1207 from huggingface/v0.13.3 Arthur 2023-04-05 09:58:19 +02:00
  • ce244bd094 remove rc1 Arthur 2023-04-04 16:19:42 +02:00
  • a05be6b8d1 Merge pull request #1205 from huggingface/new_version Arthur 2023-04-04 15:03:38 +02:00
  • 1cb44bd180 New version 0.13.3 Nicolas Patry 2023-04-04 14:14:17 +02:00
  • 3aaf4946b3 Add content to Strip decoder to allow decoding mid tokens. (#1199) Nicolas Patry 2023-03-24 10:14:49 +01:00
  • 8a6a8dc9d5 Fixing decoder strip because of char boundaries. (#1197) Nicolas Patry 2023-03-24 01:57:39 +01:00
  • e4aea890d5 Adding 2 new decoders: (#1196) Nicolas Patry 2023-03-24 00:50:54 +01:00
  • d2c8190a0f Creating normalizers.Prepend (To be used instead of Metaspace). (#1194) Nicolas Patry 2023-03-24 00:33:31 +01:00
  • 250d46c676 Adding Replace to decoder (to undo the Replace Normalizer for (#1195) Nicolas Patry 2023-03-23 23:43:47 +01:00
  • 178e294a6a Merge pull request #1192 from huggingface/faster-datasets-train-example Quentin Lhoest 2023-03-23 17:19:05 +01:00
  • 73637a0004 Adding ByteFallback support for tokenizers. (#1183) Nicolas Patry 2023-03-23 16:04:32 +01:00
  • e76f900bc0 Faster datasets train example Quentin Lhoest 2023-03-23 11:24:30 +01:00
  • b8fbea00a9 Bump dirs from 3.0 to 4.0 (#1142) Roy Hvaara 2023-03-21 02:32:02 -07:00
  • 5ecd329503 Fixing infinite loop in UnigramTrainer. (#1182) Nicolas Patry 2023-03-15 14:59:01 +01:00
  • 9c0e700212 Bump webpack in /tokenizers/examples/unstable_wasm/www (#1181) dependabot[bot] 2023-03-15 10:54:26 +01:00
  • 5c18ec5ff5 pyo3 v0.18 migration (#1173) mert-kurttutan 2023-03-08 11:27:47 +01:00
  • 3138657565 Using clippy 1.67 (#1167) Nicolas Patry 2023-03-02 12:28:39 +01:00
  • ac552ff8b9 Update model.rs (#1166) Thomas Wang 2023-02-28 17:35:57 +01:00
  • fa66caf0ab Improved version. (#1154) Nicolas Patry 2023-01-23 16:35:19 +01:00
  • d09241fba1 Prevent using from_pretrained on invalid ids (better error message). (#1153) Nicolas Patry 2023-01-23 15:38:14 +01:00
  • b861d48b06 Making Tokenizer clone. (#1152) Nicolas Patry 2023-01-23 10:12:35 +01:00
  • 1fcd90b0b7 Update info on environment variable for threading (#1150) mert-kurttutan 2023-01-22 21:24:41 +01:00
  • 33a57e6418 Made dirs optional (#1148) Andrew Kane 2023-01-18 00:29:15 -08:00
  • daf8aebd76 Adding python 3.8 for M1 (#1147) Nicolas Patry 2023-01-16 16:40:46 +01:00
  • 5a94a2b6e7 Add missing build targets (#1145) Nicolas Patry 2023-01-15 10:18:08 +01:00
  • fe4ae7dc38 Bump json5 from 2.2.0 to 2.2.3 in /bindings/node (#1140) dependabot[bot] 2023-01-03 11:50:51 +01:00
  • c3fedd96b3 Bump json5, copy-webpack-plugin, webpack and webpack-cli (#1139) dependabot[bot] 2023-01-03 10:22:49 +01:00
  • 9b155b5723 [FIX] In CharBPETokenizer, when Vocab or merges is None, unk_token cannot be used. (#1136) SeongBeomLEE 2022-12-27 19:13:52 +09:00
  • 60a00dda44 Fix one char super tiny typo (#1137) fzyzcjy 2022-12-26 18:13:38 +08:00
  • 4d520c9664 Ignore Cargo.lock for subfolders (#1131) Roy Hvaara 2022-12-25 11:35:47 +01:00
  • fbad581128 Bump derive_builder from 0.9 to 0.12 (#1129) Roy Hvaara 2022-12-23 23:37:16 +01:00
  • 2bed678958 Fix broken links in docs (#1133) Roy Hvaara 2022-12-23 23:35:18 +01:00
  • 3e7476de86 Wrap rustdoc html entity in code block (#1130) Roy Hvaara 2022-12-23 23:30:45 +01:00
  • 03ce27d2fa Bump cached-path from 0.5 to 0.6 (#1127) Roy Hvaara 2022-12-21 18:10:48 +01:00
  • 5886179eee Bump decode-uri-component in /tokenizers/examples/unstable_wasm/www (#1125) dependabot[bot] 2022-12-19 14:24:24 +01:00
  • a408b44429 Bump minimatch from 3.0.4 to 3.1.2 in /bindings/node (#1126) dependabot[bot] 2022-12-19 14:09:24 +01:00
  • bfa842e063 Adding stale bot ? (#1123) Nicolas Patry 2022-12-19 13:50:48 +01:00
  • 1649d74536 Fixing conda ssl location (#1124) Nicolas Patry 2022-12-19 13:50:36 +01:00
  • 9a25b2cb8e [FIX] In SentencePieceBPETokenizer, when Vocab or merges is None, unk_token cannot be used. (#1120) SeongBeomLEE 2022-12-19 21:40:04 +09:00
  • 102dfe87a3 Bump decode-uri-component from 0.2.0 to 0.2.2 in /bindings/node (#1116) dependabot[bot] 2022-12-05 18:09:38 +01:00
  • 67080e163a Include license file in Rust crate (#1115) Andrew Kane 2022-11-30 14:17:56 -08:00
  • c74e9e62f6 Bump loader-utils in /tokenizers/examples/unstable_wasm/www (#1108) dependabot[bot] 2022-11-16 12:01:25 +01:00
  • e9529cb02f Merge pull request #1107 from huggingface/revert-1101-update_doc_pr_actions Mishig 2022-11-16 11:41:51 +01:00
  • ffcf5a4136 Revert "Update pr docs actions (#1101)" Mishig 2022-11-16 11:41:38 +01:00
  • bbae829a72 Adding rust audit. (#1099) Nicolas Patry 2022-11-09 12:59:36 +01:00
  • 99c06c82e0 Update pr docs actions (#1101) Mishig 2022-11-09 11:09:52 +01:00
  • b8a4aa6000 Fixing extra wheels memory usage. (#1098) Nicolas Patry 2022-11-07 09:11:18 +01:00
  • 11bb2e00f2 Add python 3.11 to manylinux buildwheels (#1096) Cameron 2022-11-07 17:45:04 +10:00
  • 96a9e5715c New version. (#1082) Nicolas Patry 2022-10-06 15:45:56 +02:00
  • 4ef0afbeb6 Update old gh actions, remove deprecated doc building. (#1069) Nicolas Patry 2022-10-05 17:59:46 +02:00
  • 8129dd3309 pyo3: update to 0.17 (#1066) David Hewitt 2022-10-05 15:59:01 +01:00
  • 6113666624 Updating python formatting. (#1079) Nicolas Patry 2022-10-05 15:29:33 +02:00
  • 5f6e978452 Fixing roberta type id (everything is zero). (#1072) Nicolas Patry 2022-09-26 18:00:41 +02:00
  • 6e5569a540 Moving versions numbers to dev mode. (#1067) Nicolas Patry 2022-09-22 18:24:07 +02:00
  • 63082c4d11 Enabling static interpreter embedding for manylinux. (#1064) Nicolas Patry 2022-09-21 12:18:46 +02:00
  • 655f4057b7 Removing python3.6 from manylinux it's not supported anymore. (#1063) Nicolas Patry 2022-09-19 12:22:02 +02:00
  • 7c146d9ce5 Turns out we introduced a regression because bad code. (#1060) Nicolas Patry 2022-09-16 11:20:59 +02:00
  • 7bfab48979 Preparing rc1 release. (#1056) Nicolas Patry 2022-09-12 16:07:06 +02:00
  • 06025e4ca1 Adding Sequence for PostProcessor. (#1052) Nicolas Patry 2022-08-25 14:50:06 +02:00
  • 37f7bae0f7 Making process_encodings not eat up the encodings any more. (#1051) Nicolas Patry 2022-08-25 11:49:18 +02:00
  • c174b5bd34 Adding m1 build to the release process for Python. (#1055) Nicolas Patry 2022-08-25 11:06:03 +02:00
  • 6878ab028d Bump node-forge and webpack-dev-server (#1053) dependabot[bot] 2022-08-24 20:08:46 +02:00
  • 460bdded80 Modify Processor trait to support chaining. (#1054) Nicolas Patry 2022-08-24 19:49:23 +02:00
  • b1c9bc68b5 Updating code according to clippy. (#1048) Nicolas Patry 2022-08-24 19:45:15 +02:00
  • 67c56adf68 Upgrade macro_rules_attribute to 0.1.2 (#1038) pacowong 2022-08-08 20:03:19 +08:00
  • 67fb60a33c Bump terser in /tokenizers/examples/unstable_wasm/www (#1032) dependabot[bot] 2022-07-22 09:00:14 +02:00
  • eb2213842b Update README.md (#1019) Arthur 2022-07-19 09:54:29 +02:00
  • 3564f24311 Add from_bytes approach for creating tokenizers (#1024) HaoboGu 2022-07-18 22:25:45 +08:00
  • adf90dcd72 Adding unstable_wasm feature + example to run tokenizers on wasm. (#1009) Nicolas Patry 2022-06-10 14:58:02 +02:00
  • 943b5421aa Changing Decoder trait to be more composable. (#938) (#1008) Nicolas Patry 2022-06-02 14:43:42 +02:00
  • 519cc13be0 Upgrade pyo3 to 0.16 (#956) h-vetinari 2022-05-06 00:48:40 +11:00
  • 6533bf0fad Merge pull request #989 from huggingface/mishig25-patch-2 Mishig Davaadorj 2022-04-25 21:03:52 +02:00
  • 00132ba836 Update pipeline.mdx Mishig Davaadorj 2022-04-25 21:03:31 +02:00
  • 0bd4976dba Merge pull request #988 from huggingface/mishig25-patch-1 Mishig Davaadorj 2022-04-25 17:54:10 +02:00
  • 6a84727368 Update pipeline.mdx Mishig Davaadorj 2022-04-25 17:50:12 +02:00
  • e6cd73a291 .dev0 suffix in python version (#987) Mishig Davaadorj 2022-04-22 09:36:18 +02:00
  • e7d9e34f9e Merge pull request #986 from huggingface/doc_build_typo Mishig Davaadorj 2022-04-21 16:42:49 +02:00
  • 37957f67f1 Fix typo in doc-build GH workflow Mishig Davaadorj 2022-04-21 16:42:04 +02:00
  • 142d7ba381 Merge pull request #980 from huggingface/docs_new_frontend Mishig Davaadorj 2022-04-21 16:35:42 +02:00