| 1a802cb484 | fix typos | 2020-01-10 10:47:36 +01:00 |
| d46ea842c2 | Python - IndexableString accepts tuples directly | 2020-01-10 00:32:30 -05:00 |
| be10f542ce | Added SentencePiece and YouTokenToMe model extractors. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> | 2020-01-08 22:55:00 +01:00 |
| 3af2a43cae | Hotfix Python bindings | 2020-01-08 16:20:05 -05:00 |
| ef21c9a7b0 | Hotfix for new Builder; cc @epwalsh | 2020-01-08 16:19:51 -05:00 |
| c7d2800131 | Python - Add model saving to base tokenizer | 2020-01-08 14:44:17 -05:00 |
| bbe31f9237 | Quick README update | 2020-01-08 14:07:48 -05:00 |
| 988159a998 | Hotfix Python bindings for 32-bit systems | 2020-01-08 13:42:35 -05:00 |
| 383123e21f | Bump version | 2020-01-08 11:02:40 -05:00 |
| bc48a89770 | Python - Handle training on custom classes | 2020-01-08 10:33:59 -05:00 |
| fc56f8d186 | Python - Update some naming | 2020-01-08 09:54:03 -05:00 |
| 882df9b8e2 | better repr for tokenizers | 2020-01-08 12:06:46 +01:00 |
| 111c2d152c | add option to remove special tokens | 2020-01-08 11:48:47 +01:00 |
| af6a685664 | fix add_special_tokens | 2020-01-08 11:48:37 +01:00 |
| b16ee75b97 | Add BertWordPieceTokenizer | 2020-01-08 00:32:13 -05:00 |
| 88711d5717 | Python - IndexableString in Encoding | 2020-01-08 00:06:57 -05:00 |
| dc76e11768 | Python - Provide __repr__ for Encoding | 2020-01-07 21:33:45 -05:00 |
| 05f683ce23 | Add SentencePieceBPETokenizer | 2020-01-07 20:30:15 -05:00 |
| ee115df65e | Add the original BPETokenizer | 2020-01-07 19:58:48 -05:00 |
| 243a45af40 | Add BPEDecoder | 2020-01-07 19:56:49 -05:00 |
| 5bc1e2ee05 | Add Lowercase Normalizer | 2020-01-07 19:40:19 -05:00 |
| 099bb8e596 | Python - Dropout and unk_token optional | 2020-01-07 19:34:36 -05:00 |
| 03c431c60e | Modify BPE with unk_token being a String | 2020-01-07 19:22:29 -05:00 |
| b17f9d8872 | Rename ByteLevelBPE; Rename ByteLevelBPETokenizer | 2020-01-07 18:54:21 -05:00 |
| 6d0e3ba8f1 | fix imports | 2020-01-07 18:54:21 -05:00 |
| 63063118df | Python - Adding tokenizers classes - WIP | 2020-01-07 18:54:21 -05:00 |
| 6294d342d5 | Hotfix metaspace decoder | 2020-01-07 18:53:07 -05:00 |
| cbdd2cf423 | Python - add Metaspace decoder | 2020-01-07 18:40:18 -05:00 |
| 4e026b57a8 | Python - quick fix stub file | 2020-01-07 16:18:28 -05:00 |
| 3f806a2b5f | Python - Also update README | 2020-01-07 15:24:39 -05:00 |
| cc33418044 | Python - Update examples with getter/setter | 2020-01-07 15:23:11 -05:00 |
| 8bbf832842 | Python - Use Getter/Setter to get/modify Tokenizer's parts | 2020-01-07 15:17:23 -05:00 |
| eaa23ac8e6 | Add the Metaspace PreTokenizer | 2020-01-07 12:59:59 -05:00 |
| b06681cb1e | Bump version for release | 2020-01-06 21:05:01 -05:00 |
| 185b6f0b8b | Add Sequence Normalizer | 2020-01-06 21:03:05 -05:00 |
| 5c02bbbc4c | Add basic unicode normalizers | 2020-01-06 20:38:42 -05:00 |
| 4b9ae66419 | WordPiece decoder with customizable prefix | 2020-01-06 20:20:42 -05:00 |
| 772d0680b6 | Python - Update all typings | 2020-01-06 20:03:00 -05:00 |
| 0079a7a6b7 | Python - Add NormalizedString + doc/typings | 2020-01-06 17:55:22 -05:00 |
| 6de04bbaea | Python - Add typings/doc for Encoding | 2020-01-06 17:23:04 -05:00 |
| 7e9e0aa81c | Python - Add Tokenizer doc with stub file | 2020-01-06 16:40:27 -05:00 |
| 9a99e2bcb1 | Python - Add missing Bpe constructor kwargs | 2020-01-06 16:39:59 -05:00 |
| b7d0acc562 | Python - Improve decode/decode_batch API | 2020-01-06 16:39:36 -05:00 |
| 1a083a6e6f | Python - Improved stub file for models | 2020-01-06 15:55:00 -05:00 |
| 0e41e0b327 | Python - Include correct packages and stubs | 2020-01-06 15:24:17 -05:00 |
| 8723f78e6f | Python - build-sdist.sh +x mode | 2020-01-06 14:24:08 -05:00 |
| d7b6385566 | Python - Adding some stub files | 2020-01-06 13:04:30 -05:00 |
| 7eebd06409 | Python - Improve imports | 2020-01-06 12:03:01 -05:00 |
| e1caacfce0 | Rename package for crates.io | 2020-01-04 23:42:32 -05:00 |
| fab4e96b51 | Python - Add bert wordpiece training example | 2020-01-03 19:37:29 -05:00 |