Commit Graph

124 Commits

SHA1 Message Date
cbdd2cf423 Python - add Metaspace decoder 2020-01-07 18:40:18 -05:00
4e026b57a8 Python - quick fix stub file 2020-01-07 16:18:28 -05:00
3f806a2b5f Python - Also update README 2020-01-07 15:24:39 -05:00
cc33418044 Python - Update examples with getter/setter 2020-01-07 15:23:11 -05:00
8bbf832842 Python - Use Getter/Setter to get/modify Tokenizer's parts 2020-01-07 15:17:23 -05:00
eaa23ac8e6 Add the Metaspace PreTokenizer 2020-01-07 12:59:59 -05:00
b06681cb1e Bump version for release 2020-01-06 21:05:01 -05:00
185b6f0b8b Add Sequence Normalizer 2020-01-06 21:03:05 -05:00
5c02bbbc4c Add basic unicode normalizers 2020-01-06 20:38:42 -05:00
4b9ae66419 WordPiece decoder with customizable prefix 2020-01-06 20:20:42 -05:00
772d0680b6 Python - Update all typings 2020-01-06 20:03:00 -05:00
0079a7a6b7 Python - Add NormalizedString + doc/typings 2020-01-06 17:55:22 -05:00
6de04bbaea Python - Add typings/doc for Encoding 2020-01-06 17:23:04 -05:00
7e9e0aa81c Python - Add Tokenizer doc with stub file 2020-01-06 16:40:27 -05:00
9a99e2bcb1 Python - Add missing Bpe constructor kwargs 2020-01-06 16:39:59 -05:00
b7d0acc562 Python - Improve decode/decode_batch API 2020-01-06 16:39:36 -05:00
1a083a6e6f Python - Improved stub file for models 2020-01-06 15:55:00 -05:00
0e41e0b327 Python - Include correct packages and stubs 2020-01-06 15:24:17 -05:00
8723f78e6f Python - build-sdist.sh +x mode 2020-01-06 14:24:08 -05:00
d7b6385566 Python - Adding some stub files 2020-01-06 13:04:30 -05:00
7eebd06409 Python - Improve imports 2020-01-06 12:03:01 -05:00
e1caacfce0 Rename package for crates.io 2020-01-04 23:42:32 -05:00
fab4e96b51 Python - Add bert wordpiece training example 2020-01-03 19:37:29 -05:00
c51e340492 Python - Add WordPieceTrainer 2020-01-03 19:37:29 -05:00
e64b54b29e Python - Update BpeTrainer interface 2020-01-03 19:37:29 -05:00
408490e6b4 Add missing kwargs support 2020-01-02 19:32:56 -05:00
22e499133b Python - Expose missing BPE options at creation (cc @epwalsh) 2020-01-02 19:30:50 -05:00
04cfeea2d5 Python - ByteLevel BPE training example file (cc @julien-c) 2020-01-02 18:39:31 -05:00
0589deb6e2 Python - Expose BpeTrainer options 2020-01-02 18:09:04 -05:00
d3c3f5a700 Python - Expose ByteLevel alphabet 2020-01-02 18:06:06 -05:00
722b61230d BPE handles UNK token 2020-01-01 14:49:03 -05:00
47e4b00e05 BpeTrainer shows some progress 2020-01-01 01:28:17 -05:00
90dfdc715d Expose Tokenizer parts 2019-12-31 22:57:47 -05:00
f28ca58fd9 [Fix #17] BPE & WordPiece models saving 2019-12-31 13:56:28 -05:00
225a886382 Python - Expose Whitespace PreTokenizer 2019-12-30 13:10:33 -05:00
4677a09626 Python - Expose pad and truncate on Encoding 2019-12-30 12:56:07 -05:00
8ddb2de64e Update unicode-normalization to published crate 2019-12-30 12:18:00 -05:00
06d515d41b Python - Add ability to retrieve a range of string 2019-12-29 01:37:03 -05:00
049029dc42 Python - Restore methods on Encoding 2019-12-29 01:26:42 -05:00
9c574ad1b7 Python - Fix some import warnings 2019-12-29 00:43:32 -05:00
3779bf3e19 Python - Update example 2019-12-29 00:38:37 -05:00
3dcf9f763c Python - Update pre tokenizers with offsets 2019-12-29 00:37:58 -05:00
3f79d9d5e0 Python - Add normalizers bindings & BertNormalizer 2019-12-29 00:36:09 -05:00
839239d3b4 Bump version 2019-12-27 10:43:34 -05:00
bddf7ba737 Python - Fix building from wheels 2019-12-27 10:39:19 -05:00
ffd28ba558 Bump for release 2019-12-26 14:56:13 -05:00
74cc6f6bde Python - Simplify padding interface 2019-12-26 14:34:13 -05:00
d93d4fc3cd Python - Simplify truncation interface 2019-12-26 10:35:20 -05:00
a7734ffc9f Python - Update doc and readme for add_prefix_space 2019-12-26 10:34:53 -05:00
1879cb0bcb Python - change with_added_tokens as kwarg 2019-12-25 22:22:35 -05:00