Anthony MOI
835f08ab02
Python - Update bindings for new encode
2020-05-01 17:11:54 -04:00
Bjarte Johansen
2dc48e56ac
Python - Update pyo3 version
* Use __new__ instead of static method as model constructors
2020-04-06 21:20:16 +02:00
Anthony MOI
477037fd6b
Python - Improve AddedToken repr
2020-04-01 17:25:55 -04:00
Anthony MOI
b055b77b54
Python - Add first tests: Tokenizer
2020-04-01 17:25:55 -04:00
Anthony MOI
a2a6d80017
Python - Expose get_vocab on Tokenizer
2020-03-27 11:53:18 -04:00
Anthony MOI
9bd9e0b3c1
Expose post_process on the Tokenizer
2020-03-26 15:42:45 -04:00
Anthony MOI
f8d54edcdd
Python - Fix cases where str expected instead of AddedToken
2020-03-25 19:22:53 -04:00
Anthony MOI
c65d53892d
Python - Add bindings for new AddedToken options
2020-03-24 20:58:45 -04:00
Anthony MOI
60a4fb35f4
Python - Update bindings
2020-03-16 10:36:42 -04:00
Anthony MOI
257360acec
Python - encode & encode batch with add_special_tokens
2020-03-10 16:21:10 -04:00
Anthony MOI
f263d7651f
Python - RustFmt
2020-02-18 15:07:34 -05:00
Funtowicz Morgan
c4bac6aeeb
Expose num_added_tokens on Python side (#146)
* Expose num_added_tokens on Python side without the need to pass an Encoding to added_tokens.
This allows computing the max sentence length for single/pair inputs without needing an Encoding structure.
As the number of added tokens is fixed and static during compilation, this allows more flexible usage of the method.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Renamed num_added_tokens to num_special_tokens_to_add.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-02-14 10:55:20 +00:00
Morgan Funtowicz
4839154145
Remove kwargs mapping on Tokenizer decode/decode_batch as there is only one possible arg.
This is suggested by the current issue https://github.com/huggingface/tokenizers/issues/54#issuecomment-574104841 .
kwargs cannot be passed as positional arguments; they have to be named. Replacing kwargs with the actual skip_special_tokens
parameter allows both (named and positional) syntaxes.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-01-15 11:16:01 +01:00
Anthony MOI
fc56f8d186
Python - Update some naming
2020-01-08 09:54:03 -05:00
Anthony MOI
8bbf832842
Python - Use Getter/Setter to get/modify Tokenizer's parts
2020-01-07 15:17:23 -05:00
Anthony MOI
b7d0acc562
Python - Improve decode/decode_batch API
2020-01-06 16:39:36 -05:00
Anthony MOI
90dfdc715d
Expose Tokenizer parts
2019-12-31 22:57:47 -05:00
Anthony MOI
3f79d9d5e0
Python - Add normalizers bindings & BertNormalizer
2019-12-29 00:36:09 -05:00
Anthony MOI
74cc6f6bde
Python - Simplify padding interface
2019-12-26 14:34:13 -05:00
Anthony MOI
d93d4fc3cd
Python - Simplify truncation interface
2019-12-26 10:35:20 -05:00
Anthony MOI
1879cb0bcb
Python - change with_added_tokens as kwarg
2019-12-25 22:22:35 -05:00
Anthony MOI
f2b9c30ad9
Handle vocab size with added tokens
2019-12-19 20:19:56 -05:00
Anthony MOI
b7040e0412
Option to skip special tokens while decoding
2019-12-19 20:03:02 -05:00
Anthony MOI
a8d68d516d
Handle special tokens
2019-12-19 19:48:16 -05:00
Anthony MOI
3f95248d6d
Python - Truncation & padding bindings
2019-12-17 17:24:53 -05:00
Anthony MOI
93a74aa53a
Python - Expose PostProcessors
2019-12-16 18:46:14 -05:00
Anthony MOI
1a90cc96e5
Python - Can add tokens
2019-12-16 18:45:26 -05:00
Anthony MOI
ed7e3999d2
Python - Fix some clippy warnings
2019-12-13 18:17:51 -05:00
Anthony MOI
2a0ad97809
Python - Update API to allow failure
2019-12-13 12:20:05 -05:00
Anthony MOI
b4b31d73cd
Expose vocabulary size
2019-12-10 16:20:31 -05:00
Anthony MOI
6c294c60b0
Python - Add Encoding repr + improve example
2019-12-10 15:18:07 -05:00
Anthony MOI
8cedc5f1f6
Update Python bindings for Encoding
2019-12-10 12:38:36 -05:00
Anthony MOI
849272d44f
Python - add missing modules exports
2019-12-09 12:50:53 -05:00
Anthony MOI
eaafb22511
Add bindings for Trainer in Python
2019-12-03 15:54:15 -05:00
Anthony MOI
8fbe3c2662
Python - Add decoders
2019-11-22 21:08:57 -05:00
Anthony MOI
e44f52024c
Python - Set a PreTokenizer in a model
2019-11-22 21:01:52 -05:00
Anthony MOI
39a6d04c53
Improve Python bindings
This is an attempt at actually exposing the same structure that we use in the Rust lib. This will allow Python to instantiate Model/PreTokenizer/... with their own arguments, combining everything without relying on parsed kwargs.
2019-11-22 17:57:36 -05:00