Commit Graph

194 Commits

| Author | SHA1 | Message | Date |
|---|---|---|---|
| Anthony MOI | 722b61230d | BPE handles UNK token | 2020-01-01 14:49:03 -05:00 |
| Anthony MOI | 47e4b00e05 | BpeTrainer shows some progress | 2020-01-01 01:28:17 -05:00 |
| Anthony MOI | 90dfdc715d | Expose Tokenizer parts | 2019-12-31 22:57:47 -05:00 |
| Anthony MOI | f28ca58fd9 | [Fix #17] BPE & WordPiece models saving | 2019-12-31 13:56:28 -05:00 |
| Anthony MOI | 225a886382 | Python - Expose Whitespace PreTokenizer | 2019-12-30 13:10:33 -05:00 |
| Anthony MOI | 4677a09626 | Python - Expose pad and truncate on Encoding | 2019-12-30 12:56:07 -05:00 |
| Anthony MOI | 8ddb2de64e | Update unicode-normalization to published crate | 2019-12-30 12:18:00 -05:00 |
| Anthony MOI | 06d515d41b | Python - Add ability to retrieve a range of string | 2019-12-29 01:37:03 -05:00 |
| Anthony MOI | 049029dc42 | Python - Restore methods on Encoding | 2019-12-29 01:26:42 -05:00 |
| Anthony MOI | 9c574ad1b7 | Python - Fix some import warnings | 2019-12-29 00:43:32 -05:00 |
| Anthony MOI | 3779bf3e19 | Python - Update example | 2019-12-29 00:38:37 -05:00 |
| Anthony MOI | 3dcf9f763c | Python - Update pre tokenizers with offsets | 2019-12-29 00:37:58 -05:00 |
| Anthony MOI | 3f79d9d5e0 | Python - Add normalizers bindings & BertNormalizer | 2019-12-29 00:36:09 -05:00 |
| Anthony MOI | 839239d3b4 | Bump version | 2019-12-27 10:43:34 -05:00 |
| Anthony MOI | bddf7ba737 | Python - Fix building from wheels | 2019-12-27 10:39:19 -05:00 |
| Anthony MOI | ffd28ba558 | Bump for release | 2019-12-26 14:56:13 -05:00 |
| Anthony MOI | 74cc6f6bde | Python - Simplify padding interface | 2019-12-26 14:34:13 -05:00 |
| Anthony MOI | d93d4fc3cd | Python - Simplify truncation interface | 2019-12-26 10:35:20 -05:00 |
| Anthony MOI | a7734ffc9f | Python - Update doc and readme for add_prefix_space | 2019-12-26 10:34:53 -05:00 |
| Anthony MOI | 1879cb0bcb | Python - change with_added_tokens as kwarg | 2019-12-25 22:22:35 -05:00 |
| Anthony MOI | 905c1eb77e | Python - update some packages | 2019-12-25 22:16:43 -05:00 |
| Anthony MOI | 597031b973 | Python - remove unused variable | 2019-12-25 22:16:11 -05:00 |
| Anthony MOI | 9d289d357d | Python - change add_prefix_space as kwarg | 2019-12-25 22:10:17 -05:00 |
| Anthony MOI | 4bc5a7bbe7 | Python - fix example | 2019-12-24 11:20:40 -05:00 |
| epwalsh | c0ed873c4d | simplify initialization of BpeTrainer | 2019-12-23 20:13:48 -05:00 |
| Anthony MOI | fab1d4cabc | Bump version for release | 2019-12-23 17:28:38 -05:00 |
| Anthony MOI | e01d4f2052 | Python - Remove misleading __repr__ | 2019-12-23 17:27:59 -05:00 |
| Anthony MOI | 2266960ef7 | Bump version and update Readme | 2019-12-20 10:26:40 -05:00 |
| Anthony MOI | f2b9c30ad9 | Handle vocab size with added tokens | 2019-12-19 20:19:56 -05:00 |
| Anthony MOI | b7040e0412 | Option to skip special tokens while decoding | 2019-12-19 20:03:02 -05:00 |
| Anthony MOI | a8d68d516d | Handle special tokens | 2019-12-19 19:48:16 -05:00 |
| Anthony MOI | 9763282d59 | Bump version for release | 2019-12-17 18:42:34 -05:00 |
| Anthony MOI | 4d14b08afe | ByteLevel handles prefix spaces | 2019-12-17 18:35:40 -05:00 |
| Anthony MOI | 6766585965 | Python - Do not expose non working features of Encoding | 2019-12-17 17:43:42 -05:00 |
| Anthony MOI | 0a3d4a86a9 | Python - Update bindings for BertPreTokenizer | 2019-12-17 17:40:56 -05:00 |
| Anthony MOI | 3f95248d6d | Python - Truncation & padding bindings | 2019-12-17 17:24:53 -05:00 |
| Anthony MOI | 08eb163415 | Bump version for release | 2019-12-16 19:38:33 -05:00 |
| Anthony MOI | d80f752ec9 | Python - Add some missing Encoding bindings | 2019-12-16 19:38:18 -05:00 |
| Anthony MOI | 036ee603f4 | Python - Update example | 2019-12-16 18:50:21 -05:00 |
| Anthony MOI | 93a74aa53a | Python - Expose PostProcessors | 2019-12-16 18:46:14 -05:00 |
| Anthony MOI | 1a90cc96e5 | Python - Can add tokens | 2019-12-16 18:45:26 -05:00 |
| Anthony MOI | ee883c3fc7 | Bump version for release | 2019-12-13 18:18:07 -05:00 |
| Anthony MOI | ed7e3999d2 | Python - Fix some clippy warnings | 2019-12-13 18:17:51 -05:00 |
| Anthony MOI | 24139d7324 | Improve some Python classes | 2019-12-13 17:53:46 -05:00 |
| Anthony MOI | 1c4593cad4 | Python - Remove warning on unused Token | 2019-12-13 15:28:48 -05:00 |
| Anthony MOI | e93cc62a71 | Python - Handle kwargs for bert modules | 2019-12-13 15:28:29 -05:00 |
| Anthony MOI | 3355be89cd | Python - Update examples and improve errors | 2019-12-13 14:37:29 -05:00 |
| Anthony MOI | 7cf4b3a6cd | Python - Rewrite PyDecoder and PyPreTokenizer | 2019-12-13 12:20:25 -05:00 |
| Anthony MOI | 2a0ad97809 | Python - Update API to allow failure | 2019-12-13 12:20:05 -05:00 |
| Anthony MOI | 1c7be358b7 | Python - Better error conversions | 2019-12-13 12:14:27 -05:00 |