2dc48e56ac
Python - Update pyo3 version
...
* Use __new__ instead of static method as model constructors
2020-04-06 21:20:16 +02:00
477037fd6b
Python - Improve AddedToken repr
2020-04-01 17:25:55 -04:00
b055b77b54
Python - Add first tests: Tokenizer
2020-04-01 17:25:55 -04:00
a2a6d80017
Python - Expose get_vocab on Tokenizer
2020-03-27 11:53:18 -04:00
9bd9e0b3c1
Expose post_process on the Tokenizer
2020-03-26 15:42:45 -04:00
f8d54edcdd
Python - Fix cases where str expected instead of AddedToken
2020-03-25 19:22:53 -04:00
c65d53892d
Python - Add bindings for new AddedToken options
2020-03-24 20:58:45 -04:00
60a4fb35f4
Python - Update bindings
2020-03-16 10:36:42 -04:00
257360acec
Python - encode & encode batch with add_special_tokens
2020-03-10 16:21:10 -04:00
f263d7651f
Python - RustFmt
2020-02-18 15:07:34 -05:00
c4bac6aeeb
Expose num_added_tokens on Python side (#146)
...
* Expose num_added_tokens on Python side without the need to pass an Encoding to added_tokens.
This makes it possible to compute the max sentence length for single/pair inputs without needing an Encoding structure.
As the number of added tokens is fixed and static during compilation, it allows more flexible usage of the method.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Renamed num_added_tokens to num_special_tokens_to_add.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-02-14 10:55:20 +00:00
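A minimal sketch of the length computation described in the commit above. The function below is a hypothetical stand-in for num_special_tokens_to_add, and the counts assume a BERT-style template ([CLS] A [SEP] for single inputs, [CLS] A [SEP] B [SEP] for pairs); the real values depend on the configured post-processor.

```python
def num_special_tokens_to_add(is_pair: bool) -> int:
    """Hypothetical stand-in: number of special tokens a post-processor adds.

    Assumes a BERT-style template:
      single: [CLS] A [SEP]          -> 2 special tokens
      pair:   [CLS] A [SEP] B [SEP]  -> 3 special tokens
    """
    return 3 if is_pair else 2


def max_content_length(max_length: int, is_pair: bool) -> int:
    """Tokens left for actual content once special tokens are accounted for."""
    return max_length - num_special_tokens_to_add(is_pair)


single_budget = max_content_length(512, is_pair=False)
pair_budget = max_content_length(512, is_pair=True)
```

No Encoding object is needed here, only the is_pair flag, which is the flexibility the commit message describes.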
4839154145
Remove kwargs mapping on Tokenizer decode/decode_batch as there is only one possible arg.
...
This is suggested by the current issue https://github.com/huggingface/tokenizers/issues/54#issuecomment-574104841.
kwargs cannot be passed as positional arguments; they have to be named. Replacing kwargs with the actual skip_special_tokens
parameter allows both (named and positional) syntaxes.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-01-15 11:16:01 +01:00
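The difference described in the commit above can be illustrated in plain Python. Both functions are hypothetical stand-ins, not the binding's actual code; SPECIAL_IDS and the default value of skip_special_tokens are assumptions made for the example.

```python
# Hypothetical ids treated as special tokens for this illustration.
SPECIAL_IDS = {0, 1}


def decode_kwargs(ids, **kwargs):
    """Old style: the option is only reachable by name through **kwargs."""
    skip = kwargs.get("skip_special_tokens", False)
    return [i for i in ids if not (skip and i in SPECIAL_IDS)]


def decode_named(ids, skip_special_tokens=False):
    """New style: a real parameter accepts positional and named calls."""
    return [i for i in ids if not (skip_special_tokens and i in SPECIAL_IDS)]


ids = [0, 5, 6, 1]

by_name = decode_named(ids, skip_special_tokens=True)
by_position = decode_named(ids, True)

try:
    decode_kwargs(ids, True)  # a second positional argument is rejected
    positional_rejected = False
except TypeError:
    positional_rejected = True
```

With the **kwargs version, the positional call raises TypeError because the function accepts only one positional argument; the named-parameter version handles both call styles.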
fc56f8d186
Python - Update some naming
2020-01-08 09:54:03 -05:00
8bbf832842
Python - Use Getter/Setter to get/modify Tokenizer's parts
2020-01-07 15:17:23 -05:00
b7d0acc562
Python - Improve decode/decode_batch API
2020-01-06 16:39:36 -05:00
90dfdc715d
Expose Tokenizer parts
2019-12-31 22:57:47 -05:00
3f79d9d5e0
Python - Add normalizers bindings & BertNormalizer
2019-12-29 00:36:09 -05:00
74cc6f6bde
Python - Simplify padding interface
2019-12-26 14:34:13 -05:00
d93d4fc3cd
Python - Simplify truncation interface
2019-12-26 10:35:20 -05:00
1879cb0bcb
Python - change with_added_tokens as kwarg
2019-12-25 22:22:35 -05:00
f2b9c30ad9
Handle vocab size with added tokens
2019-12-19 20:19:56 -05:00
b7040e0412
Option to skip special tokens while decoding
2019-12-19 20:03:02 -05:00
a8d68d516d
Handle special tokens
2019-12-19 19:48:16 -05:00
3f95248d6d
Python - Truncation & padding bindings
2019-12-17 17:24:53 -05:00
93a74aa53a
Python - Expose PostProcessors
2019-12-16 18:46:14 -05:00
1a90cc96e5
Python - Can add tokens
2019-12-16 18:45:26 -05:00
ed7e3999d2
Python - Fix some clippy warnings
2019-12-13 18:17:51 -05:00
2a0ad97809
Python - Update API to allow failure
2019-12-13 12:20:05 -05:00
b4b31d73cd
Expose vocabulary size
2019-12-10 16:20:31 -05:00
6c294c60b0
Python - Add Encoding repr + improve example
2019-12-10 15:18:07 -05:00
8cedc5f1f6
Update Python bindings for Encoding
2019-12-10 12:38:36 -05:00
849272d44f
Python - add missing modules exports
2019-12-09 12:50:53 -05:00
eaafb22511
Add bindings for Trainer in Python
2019-12-03 15:54:15 -05:00
8fbe3c2662
Python - Add decoders
2019-11-22 21:08:57 -05:00
e44f52024c
Python - Set a PreTokenizer in a model
2019-11-22 21:01:52 -05:00
39a6d04c53
Improve Python bindings
...
This is an attempt at actually exposing the same structure that we use in the Rust lib. This will allow Python to instantiate Model/PreTokenizer/... with their own arguments, combining everything without relying on parsed kwargs.
2019-11-22 17:57:36 -05:00