Commit Graph

110 Commits

SHA1 Message Date
9a99e2bcb1 Python - Add missing Bpe constructor kwargs 2020-01-06 16:39:59 -05:00
b7d0acc562 Python - Improve decode/decode_batch API 2020-01-06 16:39:36 -05:00
1a083a6e6f Python - Improved stub file for models 2020-01-06 15:55:00 -05:00
0e41e0b327 Python - Include correct packages and stubs 2020-01-06 15:24:17 -05:00
8723f78e6f Python - build-sdist.sh +x mode 2020-01-06 14:24:08 -05:00
d7b6385566 Python - Adding some stub files 2020-01-06 13:04:30 -05:00
7eebd06409 Python - Improve imports 2020-01-06 12:03:01 -05:00
e1caacfce0 Rename package for crates.io 2020-01-04 23:42:32 -05:00
fab4e96b51 Python - Add bert wordpiece training example 2020-01-03 19:37:29 -05:00
c51e340492 Python - Add WordPieceTrainer 2020-01-03 19:37:29 -05:00
e64b54b29e Python - Update BpeTrainer interface 2020-01-03 19:37:29 -05:00
408490e6b4 Add missing kwargs support 2020-01-02 19:32:56 -05:00
22e499133b Python - Expose missing BPE options at creation (cc @epwalsh) 2020-01-02 19:30:50 -05:00
04cfeea2d5 Python - ByteLevel BPE training example file (cc @julien-c) 2020-01-02 18:39:31 -05:00
0589deb6e2 Python - Expose BpeTrainer options 2020-01-02 18:09:04 -05:00
d3c3f5a700 Python - Expose ByteLevel alphabet 2020-01-02 18:06:06 -05:00
722b61230d BPE handles UNK token 2020-01-01 14:49:03 -05:00
47e4b00e05 BpeTrainer shows some progress 2020-01-01 01:28:17 -05:00
90dfdc715d Expose Tokenizer parts 2019-12-31 22:57:47 -05:00
f28ca58fd9 [Fix #17] BPE & WordPiece models saving 2019-12-31 13:56:28 -05:00
225a886382 Python - Expose Whitespace PreTokenizer 2019-12-30 13:10:33 -05:00
4677a09626 Python - Expose pad and truncate on Encoding 2019-12-30 12:56:07 -05:00
8ddb2de64e Update unicode-normalization to published crate 2019-12-30 12:18:00 -05:00
06d515d41b Python - Add ability to retrieve a range of string 2019-12-29 01:37:03 -05:00
049029dc42 Python - Restore methods on Encoding 2019-12-29 01:26:42 -05:00
9c574ad1b7 Python - Fix some import warnings 2019-12-29 00:43:32 -05:00
3779bf3e19 Python - Update example 2019-12-29 00:38:37 -05:00
3dcf9f763c Python - Update pre tokenizers with offsets 2019-12-29 00:37:58 -05:00
3f79d9d5e0 Python - Add normalizers bindings & BertNormalizer 2019-12-29 00:36:09 -05:00
839239d3b4 Bump version 2019-12-27 10:43:34 -05:00
bddf7ba737 Python - Fix building from wheels 2019-12-27 10:39:19 -05:00
ffd28ba558 Bump for release 2019-12-26 14:56:13 -05:00
74cc6f6bde Python - Simplify padding interface 2019-12-26 14:34:13 -05:00
d93d4fc3cd Python - Simplify truncation interface 2019-12-26 10:35:20 -05:00
a7734ffc9f Python - Update doc and readme for add_prefix_space 2019-12-26 10:34:53 -05:00
1879cb0bcb Python - change with_added_tokens as kwarg 2019-12-25 22:22:35 -05:00
905c1eb77e Python - update some packages 2019-12-25 22:16:43 -05:00
597031b973 Python - remove unused variable 2019-12-25 22:16:11 -05:00
9d289d357d Python - change add_prefix_space as kwarg 2019-12-25 22:10:17 -05:00
4bc5a7bbe7 Python - fix example 2019-12-24 11:20:40 -05:00
c0ed873c4d simplify initialization of BpeTrainer 2019-12-23 20:13:48 -05:00
fab1d4cabc Bump version for release 2019-12-23 17:28:38 -05:00
e01d4f2052 Python - Remove misleading __repr__ 2019-12-23 17:27:59 -05:00
2266960ef7 Bump version and update Readme 2019-12-20 10:26:40 -05:00
f2b9c30ad9 Handle vocab size with added tokens 2019-12-19 20:19:56 -05:00
b7040e0412 Option to skip special tokens while decoding 2019-12-19 20:03:02 -05:00
a8d68d516d Handle special tokens 2019-12-19 19:48:16 -05:00
9763282d59 Bump version for release 2019-12-17 18:42:34 -05:00
4d14b08afe ByteLevel handles prefix spaces 2019-12-17 18:35:40 -05:00
6766585965 Python - Do not expose non working features of Encoding 2019-12-17 17:43:42 -05:00