Commit Graph

  • 80f6d58177 big big big Pierric Cistac 2020-01-10 14:49:13 -05:00
  • dd569020c1 Bump python version for release Anthony MOI 2020-01-10 13:49:26 -05:00
  • 613a25f7b3 Merge pull request #49 from huggingface/fix-pyi MOI Anthony 2020-01-10 13:41:09 -05:00
  • 89e0d90c8a Python - Final fix of the typings Anthony MOI 2020-01-10 13:30:29 -05:00
  • 56878a8e43 fix : Pierric Cistac 2020-01-09 09:50:42 -05:00
  • 958883af74 fix imports in root __init__.pyi Pierric Cistac 2020-01-08 18:18:38 -05:00
  • 4cfbb11d18 Update README MOI Anthony 2020-01-10 12:55:10 -05:00
  • ac2945143d Add doc badge MOI Anthony 2020-01-10 12:42:49 -05:00
  • aa0b644c11 Cargo.toml fix number of keywords Anthony MOI 2020-01-10 12:30:42 -05:00
  • 46e0ad5c9a Bump for crates.io Anthony MOI 2020-01-10 12:29:00 -05:00
  • b491c0b8c4 Update Python Readme MOI Anthony 2020-01-10 12:18:16 -05:00
  • d26db40da7 Improve Rust readme header MOI Anthony 2020-01-10 12:15:29 -05:00
  • 1fa11eba27 Quick example of rust usage Anthony MOI 2020-01-10 12:13:49 -05:00
  • 34875d5771 new batch of typings Pierric Cistac 2020-01-10 10:07:38 -05:00
  • 969d994f70 add more methods on tokenizer Pierric Cistac 2020-01-09 18:06:01 -05:00
  • 66d65595f6 clean package / package-lock Pierric Cistac 2020-01-09 17:41:24 -05:00
  • 6b0935d5de first implementations draft Pierric Cistac 2020-01-09 17:39:09 -05:00
  • 63532ef583 move native bindings typings into subdir and reexport from root index Pierric Cistac 2020-01-09 15:37:36 -05:00
  • d56f719cbb better structure Pierric Cistac 2020-01-09 13:44:21 -05:00
  • 0b8a51c010 First draft node typings Pierric Cistac 2020-01-09 13:36:49 -05:00
  • 0925c30997 Node - Improve handling of optionals Anthony MOI 2020-01-10 11:52:15 -05:00
  • 15739d5e7e Readme - remove bad link MOI Anthony 2020-01-10 11:38:15 -05:00
  • 3ff78e43aa Add rust to readme MOI Anthony 2020-01-10 11:32:25 -05:00
  • b4701773b5 Tweak readme MOI Anthony 2020-01-10 11:29:43 -05:00
  • 6295af6e6d Improve readme MOI Anthony 2020-01-10 11:29:11 -05:00
  • e7395285f2 Split readme Anthony MOI 2020-01-10 11:09:28 -05:00
  • b27737d97c Python - Typings update Anthony MOI 2020-01-10 10:06:24 -05:00
  • b357a3ed5a Merge pull request #48 from huggingface/fix-python-stuff MOI Anthony 2020-01-10 10:03:02 -05:00
  • 07e2548e01 Quick tweak of the training progress bar Anthony MOI 2020-01-10 10:00:54 -05:00
  • d8f3fba245 fix training and wordpiece thomwolf 2020-01-10 10:47:50 +01:00
  • 1a802cb484 fix typos thomwolf 2020-01-10 10:47:36 +01:00
  • d46ea842c2 Python - IndexableString accepts tuples directly Anthony MOI 2020-01-10 00:32:30 -05:00
  • 1f16fcbe77 Show progress while reading files during training Anthony MOI 2020-01-10 00:21:46 -05:00
  • 7e59ff8ee9 Node - Add missing getters and setters on Tokenizer Anthony MOI 2020-01-09 21:51:16 -05:00
  • fdb67e02ff Node - Tokenizer can be trained Anthony MOI 2020-01-09 21:01:57 -05:00
  • a2c16c71e9 Node - Add trainers Anthony MOI 2020-01-09 20:12:14 -05:00
  • ddbc0491bd Node - Add missing models Anthony MOI 2020-01-09 19:24:40 -05:00
  • b75577eecc Node - Add pre tokenizers Anthony MOI 2020-01-09 19:14:15 -05:00
  • 264cdb4266 Node - Add normalizers Anthony MOI 2020-01-09 18:37:10 -05:00
  • 796601adbc Node - Add addTokens and addSpecialTokens Anthony MOI 2020-01-09 17:41:42 -05:00
  • a1fd99125c Node - tokenToId & idToToken Anthony MOI 2020-01-09 17:23:02 -05:00
  • 63a3ffbf13 Node - Add decode & decodeBatch Anthony MOI 2020-01-09 17:15:01 -05:00
  • 6816628d1a Node - Hotfix EncodeTask Anthony MOI 2020-01-09 16:05:48 -05:00
  • cb52b71f63 Node - Fix tasks count Anthony MOI 2020-01-09 15:43:16 -05:00
  • 6561511214 Node - Pad & Truncate on Encoding Anthony MOI 2020-01-09 15:43:07 -05:00
  • 778b611fb5 Node - Add some Encoding features Anthony MOI 2020-01-09 14:50:39 -05:00
  • 274fcd3bfe Node - Lift running tasks restriction Anthony MOI 2020-01-09 14:18:43 -05:00
  • de35bb4f45 Node - Fix EncodeTask Anthony MOI 2020-01-09 14:04:20 -05:00
  • bda03ffe8c Node - Make encode & encodeBatch async Anthony MOI 2020-01-09 13:49:59 -05:00
  • 19d41a5810 Node - Add encode & encodeBatch Anthony MOI 2020-01-09 11:47:35 -05:00
  • 1a54692190 Node - Add PostProcessors Anthony MOI 2020-01-09 01:41:41 -05:00
  • 83f21ab33d Node - Fix typo Anthony MOI 2020-01-09 01:08:54 -05:00
  • afb6b48361 Node - Expose decoders Anthony MOI 2020-01-09 01:05:36 -05:00
  • 3685eb1809 Node - Improve namings Anthony MOI 2020-01-09 01:02:28 -05:00
  • 156d86d91e Node - Basic Tokenizer + BPE Anthony MOI 2020-01-09 00:04:53 -05:00
  • 13f3fbed30 Merge pull request #47 from huggingface/sentencepiece_export MOI Anthony 2020-01-08 17:18:20 -05:00
  • be10f542ce Added SentencePiece and YouTokenToMe model extractors. Morgan Funtowicz 2020-01-08 22:55:00 +01:00
  • f86b8d412b Fix NormalizedString split_off bis2 Anthony MOI 2020-01-08 16:51:59 -05:00
  • 7ac45472b6 Fix NormalizedString split_off bis Anthony MOI 2020-01-08 16:48:47 -05:00
  • 3b2e19f52c Fix NormalizedString split_off Anthony MOI 2020-01-08 16:43:07 -05:00
  • 313d674dc0 Merge pull request #45 from huggingface/bpe_save_compat_tweak MOI Anthony 2020-01-08 16:23:15 -05:00
  • 3af2a43cae Hotfix Python bindings Anthony MOI 2020-01-08 16:20:05 -05:00
  • ef21c9a7b0 Hotfix for new Builder Anthony MOI 2020-01-08 16:19:51 -05:00
  • 6697d65544 Fixup + better compat Julien Chaumond 2020-01-08 20:52:28 +00:00
  • d2d5b1eae7 tweak builder pattern to find defaults in Builder::Default (#42) Evan Pete Walsh 2020-01-08 12:10:34 -08:00
  • c7d2800131 Python - Add model saving to base tokenizer Anthony MOI 2020-01-08 14:44:17 -05:00
  • cbe13e0aee BPE save compatibility tweak Julien Chaumond 2020-01-08 19:41:11 +00:00
  • bbe31f9237 Quick README update Anthony MOI 2020-01-08 14:07:48 -05:00
  • 988159a998 Hotfix Python bindings for 32-bit systems Anthony MOI 2020-01-08 13:42:26 -05:00
  • 4e7fc93971 improve Makefile epwalsh 2020-01-08 10:06:42 -08:00
  • 383123e21f Bump version Anthony MOI 2020-01-08 11:02:40 -05:00
  • dbc9c1b6ea Merge pull request #43 from huggingface/python-tokenizer MOI Anthony 2020-01-08 11:01:19 -05:00
  • bc48a89770 Python - Handle training on custom classes Anthony MOI 2020-01-08 10:33:59 -05:00
  • fc56f8d186 Python - Update some naming Anthony MOI 2020-01-08 09:54:03 -05:00
  • 45c6382c35 Merge pull request #44 from huggingface/fixes MOI Anthony 2020-01-08 09:45:14 -05:00
  • 882df9b8e2 better repr for tokenizers thomwolf 2020-01-08 12:06:46 +01:00
  • 111c2d152c add option to remove special tokens thomwolf 2020-01-08 11:48:47 +01:00
  • af6a685664 fix add_special_tokens thomwolf 2020-01-08 11:48:37 +01:00
  • b16ee75b97 Add BertWordPieceTokenizer Anthony MOI 2020-01-08 00:32:13 -05:00
  • 88711d5717 Python - IndexableString in Encoding Anthony MOI 2020-01-08 00:06:57 -05:00
  • fb250fd7fc Fix NormalizedString indexing Anthony MOI 2020-01-07 23:43:55 -05:00
  • dc76e11768 Python - Provide __repr__ for Encoding Anthony MOI 2020-01-07 21:33:45 -05:00
  • 05f683ce23 Add SentencePieceBPETokenizer Anthony MOI 2020-01-07 20:30:15 -05:00
  • ee115df65e Add the original BPETokenizer Anthony MOI 2020-01-07 19:58:48 -05:00
  • 243a45af40 Add BPEDecoder Anthony MOI 2020-01-07 19:56:49 -05:00
  • 5bc1e2ee05 Add Lowercase Normalizer Anthony MOI 2020-01-07 19:40:19 -05:00
  • 099bb8e596 Python - Dropout and unk_token optional Anthony MOI 2020-01-07 19:34:36 -05:00
  • 03c431c60e Modify BPE with unk_token being a String Anthony MOI 2020-01-07 19:22:29 -05:00
  • b17f9d8872 Rename ByteLevelBPE Anthony MOI 2020-01-07 18:19:10 -05:00
  • 6d0e3ba8f1 fix imports thomwolf 2020-01-07 22:55:50 +01:00
  • 63063118df Python - Adding tokenizers classes - WIP Anthony MOI 2020-01-07 16:20:20 -05:00
  • 6294d342d5 Hotfix metaspace decoder Anthony MOI 2020-01-07 18:53:07 -05:00
  • cbdd2cf423 Python - add Metaspace decoder Anthony MOI 2020-01-07 18:40:18 -05:00
  • 43acdcfacf Oops, fix BpeBuilder::build() epwalsh 2020-01-07 13:46:04 -08:00
  • 4e026b57a8 Python - quick fix stub file Anthony MOI 2020-01-07 16:16:00 -05:00
  • cd8f057fa8 update models API and documentation (#41) Evan Pete Walsh 2020-01-07 13:13:40 -08:00
  • 3f806a2b5f Python - Also update README Anthony MOI 2020-01-07 15:24:39 -05:00
  • cc33418044 Python - Update examples with getter/setter Anthony MOI 2020-01-07 15:23:11 -05:00
  • 8bbf832842 Python - Use Getter/Setter to get/modify Tokenizer's parts Anthony MOI 2020-01-07 15:17:23 -05:00
  • eaa23ac8e6 Add the Metaspace PreTokenizer Anthony MOI 2020-01-07 12:59:59 -05:00