37 Commits

Author SHA1 Message Date
Anthony MOI
835f08ab02 Python - Update bindings for new encode 2020-05-01 17:11:54 -04:00
Bjarte Johansen
2dc48e56ac Python - Update pyo3 version
* Use __new__ instead of static method as model constructors
2020-04-06 21:20:16 +02:00
Anthony MOI
477037fd6b Python - Improve AddedToken repr 2020-04-01 17:25:55 -04:00
Anthony MOI
b055b77b54 Python - Add first tests: Tokenizer 2020-04-01 17:25:55 -04:00
Anthony MOI
a2a6d80017 Python - expose get_vocab on Tokenizer 2020-03-27 11:53:18 -04:00
Anthony MOI
9bd9e0b3c1 Expose post_process on the Tokenizer 2020-03-26 15:42:45 -04:00
Anthony MOI
f8d54edcdd Python - Fix cases where str expected instead of AddedToken 2020-03-25 19:22:53 -04:00
Anthony MOI
c65d53892d Python - Add bindings for new AddedToken options 2020-03-24 20:58:45 -04:00
Anthony MOI
60a4fb35f4 Python - Update bindings 2020-03-16 10:36:42 -04:00
Anthony MOI
257360acec Python - encode & encode batch with add_special_tokens 2020-03-10 16:21:10 -04:00
Anthony MOI
f263d7651f Python - RustFmt 2020-02-18 15:07:34 -05:00
Funtowicz Morgan
c4bac6aeeb Expose num_added_tokens on Python side (#146)
* Expose num_added_tokens on Python side without the need to pass an Encoding to added_tokens.

This allows computing the max sentence length for single/pair inputs without needing an Encoding structure.
As the number of added tokens is fixed and static during compilation, it allows more flexible usage of the method.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Renamed num_added_tokens to num_special_tokens_to_add.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-02-14 10:55:20 +00:00
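Because the number of special tokens depends only on whether the input is a single sequence or a pair, the length budget for the actual text can be computed before anything is encoded. The sketch below assumes a BERT-style template ([CLS] A [SEP] for single inputs, [CLS] A [SEP] B [SEP] for pairs). `StubTokenizer` and `max_content_length` are hypothetical; only the method name `num_special_tokens_to_add` comes from this commit, and its `is_pair` argument is an assumption.

```python
class StubTokenizer:
    """Hypothetical stand-in for a tokenizer whose post-processor uses a
    BERT-style template: [CLS] A [SEP] (single) or [CLS] A [SEP] B [SEP] (pair)."""

    def num_special_tokens_to_add(self, is_pair):
        # The count depends only on the template, never on the input text,
        # so no Encoding is needed to answer the question.
        return 3 if is_pair else 2


def max_content_length(tokenizer, model_max_length, is_pair):
    """Token budget left for real text once special tokens are reserved."""
    return model_max_length - tokenizer.num_special_tokens_to_add(is_pair)


tok = StubTokenizer()
single_budget = max_content_length(tok, 512, is_pair=False)  # 510
pair_budget = max_content_length(tok, 512, is_pair=True)     # 509
```

A caller can then truncate its inputs to `single_budget` or `pair_budget` tokens up front, which is the flexibility the commit message describes.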
Morgan Funtowicz
4839154145 Remove kwargs mapping on Tokenizer decode/decode_batch as there is only one possible arg.
This is suggested by the current issue https://github.com/huggingface/tokenizers/issues/54#issuecomment-574104841.

kwargs cannot be passed as positional arguments; they have to be named. Replacing kwargs with the actual skip_special_tokens argument
allows both the named and the positional syntax.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-01-15 11:16:01 +01:00
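The reasoning in this commit rests on plain Python call semantics: a value collected through `**kwargs` can only be supplied by keyword, while an explicitly declared parameter can be passed either positionally or by name. The two functions below are hypothetical stand-ins for the binding's `decode` (toy vocabulary, and a default of skipping special tokens is assumed), not its real implementation.

```python
# Hypothetical toy vocabulary; ids 0 and 1 play the role of special tokens.
VOCAB = {0: "[CLS]", 1: "[SEP]", 2: "hello", 3: "world"}
SPECIAL_IDS = {0, 1}


def _join(ids, skip_special_tokens):
    # Drop special-token ids only when asked to, then join the remaining words.
    return " ".join(
        VOCAB[i] for i in ids if not (skip_special_tokens and i in SPECIAL_IDS)
    )


def decode_kwargs(ids, **kwargs):
    # Before: the option is only reachable through **kwargs, so it must be named.
    return _join(ids, kwargs.get("skip_special_tokens", True))


def decode_explicit(ids, skip_special_tokens=True):
    # After: a real parameter accepts both positional and keyword syntax.
    return _join(ids, skip_special_tokens)


ids = [0, 2, 3, 1]
print(decode_explicit(ids, False))                      # [CLS] hello world [SEP]
print(decode_explicit(ids, skip_special_tokens=False))  # [CLS] hello world [SEP]
print(decode_kwargs(ids, skip_special_tokens=False))    # [CLS] hello world [SEP]
try:
    decode_kwargs(ids, False)  # positional use is rejected with a TypeError
except TypeError as err:
    print("TypeError:", err)
```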
Anthony MOI
fc56f8d186 Python - Update some naming 2020-01-08 09:54:03 -05:00
Anthony MOI
8bbf832842 Python - Use Getter/Setter to get/modify Tokenizer's parts 2020-01-07 15:17:23 -05:00
Anthony MOI
b7d0acc562 Python - Improve decode/decode_batch API 2020-01-06 16:39:36 -05:00
Anthony MOI
90dfdc715d Expose Tokenizer parts 2019-12-31 22:57:47 -05:00
Anthony MOI
3f79d9d5e0 Python - Add normalizers bindings & BertNormalizer 2019-12-29 00:36:09 -05:00
Anthony MOI
74cc6f6bde Python - Simplify padding interface 2019-12-26 14:34:13 -05:00
Anthony MOI
d93d4fc3cd Python - Simplify truncation interface 2019-12-26 10:35:20 -05:00
Anthony MOI
1879cb0bcb Python - change with_added_tokens to a kwarg 2019-12-25 22:22:35 -05:00
Anthony MOI
f2b9c30ad9 Handle vocab size with added tokens 2019-12-19 20:19:56 -05:00
Anthony MOI
b7040e0412 Option to skip special tokens while decoding 2019-12-19 20:03:02 -05:00
Anthony MOI
a8d68d516d Handle special tokens 2019-12-19 19:48:16 -05:00
Anthony MOI
3f95248d6d Python - Truncation & padding bindings 2019-12-17 17:24:53 -05:00
Anthony MOI
93a74aa53a Python - Expose PostProcessors 2019-12-16 18:46:14 -05:00
Anthony MOI
1a90cc96e5 Python - Can add tokens 2019-12-16 18:45:26 -05:00
Anthony MOI
ed7e3999d2 Python - Fix some clippy warnings 2019-12-13 18:17:51 -05:00
Anthony MOI
2a0ad97809 Python - Update API to allow failure 2019-12-13 12:20:05 -05:00
Anthony MOI
b4b31d73cd Expose vocabulary size 2019-12-10 16:20:31 -05:00
Anthony MOI
6c294c60b0 Python - Add Encoding repr + improve example 2019-12-10 15:18:07 -05:00
Anthony MOI
8cedc5f1f6 Update Python bindings for Encoding 2019-12-10 12:38:36 -05:00
Anthony MOI
849272d44f Python - add missing modules exports 2019-12-09 12:50:53 -05:00
Anthony MOI
eaafb22511 Add bindings for Trainer in Python 2019-12-03 15:54:15 -05:00
Anthony MOI
8fbe3c2662 Python - Add decoders 2019-11-22 21:08:57 -05:00
Anthony MOI
e44f52024c Python - Set a PreTokenizer in a model 2019-11-22 21:01:52 -05:00
Anthony MOI
39a6d04c53 Improve Python bindings
This is an attempt at actually exposing the same structure that we use in the Rust lib. This will allow Python to instantiate Model/PreTokenizer/... with their own arguments, combining everything without relying on parsed kwargs.
2019-11-22 17:57:36 -05:00
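The design described in this commit, assembling a tokenizer from separately constructed parts instead of a single constructor that parses a bag of kwargs, can be sketched in plain Python. `WhitespaceSplit`, `WordLevel` and `Tokenizer` below are hypothetical toy classes, not the actual bindings; they only illustrate the composition pattern, where each part takes its own constructor arguments.

```python
from dataclasses import dataclass, field


class WhitespaceSplit:
    """Hypothetical pre-tokenizer: splits raw text on whitespace."""

    def pre_tokenize(self, text):
        return text.split()


@dataclass
class WordLevel:
    """Hypothetical model: maps whole words to ids, falling back to unk_id."""

    vocab: dict
    unk_id: int = 0

    def tokenize(self, words):
        return [self.vocab.get(w, self.unk_id) for w in words]


@dataclass
class Tokenizer:
    """Hypothetical pipeline that composes independently built parts."""

    model: WordLevel
    pre_tokenizer: WhitespaceSplit = field(default_factory=WhitespaceSplit)

    def encode(self, text):
        # Pre-tokenize first, then let the model map each piece to an id.
        return self.model.tokenize(self.pre_tokenizer.pre_tokenize(text))


tok = Tokenizer(WordLevel({"[UNK]": 0, "hello": 1, "world": 2}))
ids = tok.encode("hello big world")  # [1, 0, 2]
```

Swapping the pre-tokenizer or model then means passing a different object rather than learning a new set of string-keyed options.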