Commit Graph

  • da4c7b10e4 Add a way to specify the unknown token in SentencePieceUnigramTokenizer python implem (#762) SaulLu 2021-08-12 15:42:44 +02:00
  • 46bed542fa Bump path-parse from 1.0.6 to 1.0.7 in /bindings/node (#774) dependabot[bot] 2021-08-12 09:41:25 -04:00
  • ab3d3bcbfb Bump tar from 4.4.13 to 4.4.17 in /bindings/node (#775) dependabot[bot] 2021-08-12 09:31:47 -04:00
  • 5d1b0a9381 Bump glob-parent from 5.1.1 to 5.1.2 in /bindings/node (#734) dependabot[bot] 2021-08-12 09:21:00 -04:00
  • 96c122ccf6 Bump ws from 7.3.1 to 7.4.6 in /bindings/node (#721) dependabot[bot] 2021-08-12 09:20:36 -04:00
  • 256a71c1f2 Clippy 1.54. (#773) Nicolas Patry 2021-08-11 14:43:49 +02:00
  • d83772d62c Fixing tokenizers with 1.53 (updated some dependencies + clippy) (#764) Nicolas Patry 2021-07-21 09:58:38 +02:00
  • 755e5f5c1e Remove support for Python 3.5 (#714) Anthony MOI 2021-05-24 17:31:01 -04:00
  • 3a002c1aa8 Python - prepare for release 0.10.3 Anthony MOI 2021-05-24 16:56:27 -04:00
  • c046da7679 Fix stripping strings containing Unicode characters (#707) Nicolas Patry 2021-05-24 22:49:59 +02:00
  • 4b7f8c2d7c Fix CHANGELOG.md Anthony MOI 2021-05-24 16:16:40 -04:00
  • bd19584580 Bump lodash from 4.17.19 to 4.17.21 in /bindings/node (#701) dependabot[bot] 2021-05-20 14:22:02 -04:00
  • 8f639b42ea Bump hosted-git-info from 2.8.8 to 2.8.9 in /bindings/node (#702) dependabot[bot] 2021-05-20 14:21:52 -04:00
  • 7574349223 Bump y18n from 4.0.0 to 4.0.3 in /bindings/node (#708) dependabot[bot] 2021-05-20 14:21:40 -04:00
  • 3cf957e6f8 Bump handlebars from 4.7.6 to 4.7.7 in /bindings/node (#700) dependabot[bot] 2021-05-20 14:21:28 -04:00
  • 4b0dc6b947 Fix SPM conversions (#686) Lysandre Debut 2021-05-20 15:55:55 +02:00
  • 2e2e7558f7 Add CTC Decoder for Wave2Vec models (#693) Nicolas Patry 2021-05-20 15:30:09 +02:00
  • e999a7b5f9 Revert "Fix SPM conversions" Lysandre 2021-04-21 18:09:36 -04:00
  • e1ffe39764 Fix SPM conversions Lysandre 2021-04-21 18:09:36 -04:00
  • 32b3b7a0f2 Python - Prepare for release 0.10.2 Anthony MOI 2021-04-05 16:32:03 -04:00
  • c3b3b29039 Rust - Add another test for Metaspace deserialization Anthony MOI 2021-04-05 11:43:11 -04:00
  • e1627654b4 Fix Clippy warnings for Rust 1.51 Anthony MOI 2021-04-02 17:18:19 -04:00
  • 659a835d04 Python - Accept kwargs in Metaspace constructor Anthony MOI 2021-04-02 16:51:03 -04:00
  • a891e29c02 Rust - Remove str_rep from Metaspace serialization Anthony MOI 2021-04-02 16:41:39 -04:00
  • 0fe9214f44 Fix BPE continuing_subword_prefix Anthony MOI 2021-03-16 16:50:42 -04:00
  • f5e9bb89b7 Fix offsets for Precompiled corner case Anthony MOI 2021-03-10 20:00:21 -05:00
  • f12be3030f Try with ubuntu 18.04 Anthony MOI 2021-03-16 12:17:44 -04:00
  • 53ab5a470c Allow unnecessary_wraps for node bindings Anthony MOI 2021-03-10 21:20:21 -05:00
  • 56a9196030 Fix clippy warnings Anthony MOI 2021-03-10 20:26:39 -05:00
  • ee95e7f0cd Actually fix the link to pepy.tech on downloads badge Anthony MOI 2021-02-09 21:05:52 -05:00
  • 1321dcf143 Hotfix link to pepy.tech on downloads badge Anthony MOI 2021-02-09 21:04:56 -05:00
  • bc8bbf637a Prepare for python v0.10.1 (#625) Anthony MOI 2021-02-08 11:45:56 -05:00
  • d96442cbe8 Python - Prepare for release 0.10.1rc1 (#622) Anthony MOI 2021-02-04 10:37:00 -05:00
  • 57200144ca Python - Fix ByteLevel instantiation from state (#621) Anthony MOI 2021-02-04 10:16:05 -05:00
  • 324cb8d380 CI - Fix conda build Anthony MOI 2021-02-04 10:12:30 -05:00
  • a8f756494e Improve Model serialization/deserialization (#620) Anthony MOI 2021-02-04 09:59:18 -05:00
  • ce9325b714 Update README.md Anthony MOI 2021-02-03 15:54:01 -05:00
  • 6a29dbc070 Doc - Hotfix training from iterators tutorial Anthony MOI 2021-02-03 15:50:09 -05:00
  • db22cb6315 Python - Fix Normalizer.normalize with PyNormalizedStringRefMut Anthony MOI 2021-02-03 11:20:57 -05:00
  • 355315e8d3 Rust - Fix offsets produced by Precompiled Normalizer Anthony MOI 2021-02-03 11:17:23 -05:00
  • 2c711d45ce CI - Force pyarrow<3.0.0 for now Anthony MOI 2021-02-03 10:55:47 -05:00
  • a350ec3e72 Rust - Fix a bug in the Metaspace PreTokenizer Anthony MOI 2021-02-03 10:02:07 -05:00
  • 96b9972842 Fix SentencePiece tokenizers conversion Anthony MOI 2021-02-03 09:57:41 -05:00
  • fc0a50a272 Update doc for Python 0.10.0 Anthony MOI 2021-01-12 16:35:13 -05:00
  • 719bea76b9 Python - Prepare for release 0.10.0 Anthony MOI 2021-01-12 16:18:53 -05:00
  • b9c6bea75e Add fuse_unk option to SentencePieceBPETokenizer (#574) devfon 2021-01-13 06:07:59 +09:00
  • 91dae1de15 Doc - Add documentation for training from iterators Anthony MOI 2021-01-12 15:30:01 -05:00
  • 7bee825238 Cleans up a few pattern-matches into their Option/Result equivalent François Garillot 2021-01-12 07:31:17 -08:00
  • cca5d43038 Python - Fix breaking change in Model.save Anthony MOI 2021-01-11 14:11:08 -05:00
  • 49d11b1f69 Python - Add components getter/setters to BaseTokenizer Anthony MOI 2021-01-11 14:33:13 -05:00
  • 65b91966f7 Fix import Formatter with new serde Anthony MOI 2021-01-11 15:43:50 -05:00
  • 1990f51b9f Simplify Whitespace pre_tokenizer Anthony MOI 2021-01-11 15:29:46 -05:00
  • d94fa220b6 Python - Add train_from_iterator to implementations Anthony MOI 2021-01-06 17:07:56 -05:00
  • 817c5ad317 Fix clippy warnings for rust 1.49 Anthony MOI 2021-01-06 11:58:22 -05:00
  • 5938a12b3f Python - Improve training with iterators Anthony MOI 2020-12-15 10:50:01 -05:00
  • dad8d6249e rm extraneous </a> (#573) Julien Chaumond 2021-01-06 17:37:37 +01:00
  • ae6534f12d Bump ini from 1.3.5 to 1.3.8 in /bindings/node (#561) dependabot[bot] 2020-12-15 11:50:40 -05:00
  • 6201258a0e CI - Python release extra should not provide the source distribution Anthony MOI 2020-12-08 13:31:57 -05:00
  • 0c6cc39eee Python - Update CHANGELOG and bump for release Anthony MOI 2020-12-04 12:30:53 -05:00
  • a3a9561f9f Rust - Fix WordLevelTrainer default values Anthony MOI 2020-12-08 13:16:03 -05:00
  • d71e66e53c CI - Fix docs deployment Anthony MOI 2020-12-04 10:59:48 -05:00
  • 8916b6bb27 Add a visualization utility to render tokens and annotations in a notebook (#508) Tal Perry 2020-12-04 16:25:56 +01:00
  • 5549fc4837 Python - Update CHANGELOG Anthony MOI 2020-11-28 12:42:37 -05:00
  • 49bd055519 Node - Update bindings with train_from_files Anthony MOI 2020-11-27 16:45:13 -05:00
  • 3a8627ce4d Improve docs and fix tests around training Anthony MOI 2020-11-27 16:44:17 -05:00
  • 06f6ba3fce Use train_from_files in benchmarks Anthony MOI 2020-11-25 17:25:49 -05:00
  • 999067454d Make sure we first try to extract a string Anthony MOI 2020-11-25 16:43:52 -05:00
  • ed9baeabb7 Add example for training with datasets Anthony MOI 2020-11-25 15:58:18 -05:00
  • c36ac0bfdf Improve progress tracking while training Anthony MOI 2020-11-25 15:55:58 -05:00
  • 75deaecdd0 Also accept iterators of batches in train_from_iterator Anthony MOI 2020-11-24 22:48:55 -05:00
  • e0a70f1fb2 Add ability to train from Iterator Anthony MOI 2020-11-12 12:58:14 -05:00
  • 6e364cb685 Python - Update CHANGELOG and stub files Anthony MOI 2020-11-27 17:25:43 -05:00
  • a351d1c604 Python - Trainers can get/set their attributes Anthony MOI 2020-11-24 17:46:58 -05:00
  • 3eb7ef6d0a Python - PreTokenizers can get/set their attributes Anthony MOI 2020-11-24 13:55:59 -05:00
  • 5c35fafc44 Python - Decoders can get/set their attributes Anthony MOI 2020-11-23 22:41:27 -05:00
  • 091287dcf5 Python - Use macro for getter/setter in models Anthony MOI 2020-11-20 20:51:17 -05:00
  • 2feccdbbfa Python - PyStrip can get/set its attributes Anthony MOI 2020-11-16 17:43:45 -05:00
  • 7512d5e4ce Python - PyBertNormalizer can get/set its attributes Anthony MOI 2020-11-16 17:37:08 -05:00
  • 78beae8b7d Python - PyWordLevel can get/set its attributes Anthony MOI 2020-11-16 14:31:49 -05:00
  • 760537aad3 Python - PyWordPiece can get/set its attributes Anthony MOI 2020-11-16 12:34:55 -05:00
  • c22cfc31f9 Python - PyNormalizer & PyPreTokenizer use a RwLock Anthony MOI 2020-11-16 11:37:28 -05:00
  • 76d3b2128b Python - PyBPE can get/set its attributes Anthony MOI 2020-11-13 17:35:18 -05:00
  • 7f3cfebf45 Python - PyModel uses a RwLock to allow modifications Anthony MOI 2020-11-13 16:17:20 -05:00
  • dd399d2ad0 Split Pre-Tokenizer (#542) Patrick von Platen 2020-11-27 23:07:03 +01:00
  • 58e1d8de67 Python - Improve documentation for trainers Anthony MOI 2020-11-20 17:54:53 -05:00
  • 64441b54b1 Python - Improve documentation for post-processors Anthony MOI 2020-11-20 17:48:11 -05:00
  • 933a2a9c99 Python - Improve pre-tokenizers docs Anthony MOI 2020-11-20 17:17:46 -05:00
  • 5842b3db73 Python - Improve normalizers docs Anthony MOI 2020-11-20 16:26:50 -05:00
  • c01c301743 Python - Improve documentation for decoders and remove useless kwargs Anthony MOI 2020-11-20 15:42:10 -05:00
  • a50d4b7d25 Python - Improve documentation for models Anthony MOI 2020-11-20 14:47:58 -05:00
  • dc60d4fc0c Fix BaseTokenizer enable_truncation docstring Nick 2020-11-21 10:47:12 -05:00
  • 2fbd6779f6 Make sure TrainerWrapper can only train the right Model Anthony MOI 2020-11-20 09:02:28 -05:00
  • 13e07da2c8 Node - Add WordLevelTrainer Anthony MOI 2020-11-19 20:01:28 -05:00
  • 7fc37a03e8 Node - Trainers train the Model in-place Anthony MOI 2020-11-19 19:57:50 -05:00
  • 387b8a1033 Generate pyi, fix tests and clippy warnings Anthony MOI 2020-11-19 17:57:58 -05:00
  • 5059be1a8d Test BPE keeping its options after training Anthony MOI 2020-11-10 11:01:56 -05:00
  • 284a1dbee7 PyModel uses a RwLock to allow modifications Anthony MOI 2020-10-08 19:33:30 -04:00
  • 54c7210b2f Train Model in place Anthony MOI 2020-10-08 18:20:38 -04:00
  • 224862fe0c Python - Make the trainer optional on Tokenizer.train Anthony MOI 2020-10-07 21:25:32 -04:00
  • c230183cf6 A Model can return its associated Trainer Anthony MOI 2020-10-07 17:44:58 -04:00