tokenizers

mirror of https://github.com/mii443/tokenizers.git synced 2025-08-31 12:39:21 +00:00

Author	SHA1	Message	Date
Bjarte Johansen	2dc48e56ac	Python - Update pyo3 version * Use __new__ instead of static method as model constructors	2020-04-06 21:20:16 +02:00
Anthony MOI	477037fd6b	Python - Improve AddedToken repr	2020-04-01 17:25:55 -04:00
Anthony MOI	b055b77b54	Python - Add first tests: Tokenizer	2020-04-01 17:25:55 -04:00
Anthony MOI	a2a6d80017	Python - expose `get_vocab` on Tokenizer	2020-03-27 11:53:18 -04:00
Anthony MOI	9bd9e0b3c1	Expose post_process on the Tokenizer	2020-03-26 15:42:45 -04:00
Anthony MOI	f8d54edcdd	Python - Fix cases where str expected instead of AddedToken	2020-03-25 19:22:53 -04:00
Anthony MOI	c65d53892d	Python - Add bindings for new AddedToken options	2020-03-24 20:58:45 -04:00
Anthony MOI	60a4fb35f4	Python - Update bindings	2020-03-16 10:36:42 -04:00
Anthony MOI	257360acec	Python - encode & encode batch with add_special_tokens	2020-03-10 16:21:10 -04:00
Anthony MOI	f263d7651f	Python - RustFmt	2020-02-18 15:07:34 -05:00
Funtowicz Morgan	c4bac6aeeb	Expose num_added_tokens on Python side (#146) * Expose num_added_tokens on Python side without the need to pass an Encoding to added_tokens. This allows computing the max sentence length for single/pair inputs without actually needing an Encoding structure. As the number of added tokens is fixed and static during compilation, it allows more flexible usage of the method. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Renamed num_added_tokens to num_special_tokens_to_add. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>	2020-02-14 10:55:20 +00:00
Morgan Funtowicz	4839154145	Remove kwargs mapping on Tokenizer decode/decode_batch as there is only one possible arg. This is suggested by the current issue https://github.com/huggingface/tokenizers/issues/54#issuecomment-574104841. kwargs cannot be passed as positional arguments, they have to be named; replacing kwargs with the actual skip_special_tokens allows both (named and positional) syntax. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>	2020-01-15 11:16:01 +01:00
Anthony MOI	fc56f8d186	Python - Update some naming	2020-01-08 09:54:03 -05:00
Anthony MOI	8bbf832842	Python - Use Getter/Setter to get/modify Tokenizer's parts	2020-01-07 15:17:23 -05:00
Anthony MOI	b7d0acc562	Python - Improve decode/decode_batch API	2020-01-06 16:39:36 -05:00
Anthony MOI	90dfdc715d	Expose Tokenizer parts	2019-12-31 22:57:47 -05:00
Anthony MOI	3f79d9d5e0	Python - Add normalizers bindings & BertNormalizer	2019-12-29 00:36:09 -05:00
Anthony MOI	74cc6f6bde	Python - Simplify padding interface	2019-12-26 14:34:13 -05:00
Anthony MOI	d93d4fc3cd	Python - Simplify truncation interface	2019-12-26 10:35:20 -05:00
Anthony MOI	1879cb0bcb	Python - change with_added_tokens as kwarg	2019-12-25 22:22:35 -05:00
Anthony MOI	f2b9c30ad9	Handle vocab size with added tokens	2019-12-19 20:19:56 -05:00
Anthony MOI	b7040e0412	Option to skip special tokens while decoding	2019-12-19 20:03:02 -05:00
Anthony MOI	a8d68d516d	Handle special tokens	2019-12-19 19:48:16 -05:00
Anthony MOI	3f95248d6d	Python - Truncation & padding bindings	2019-12-17 17:24:53 -05:00
Anthony MOI	93a74aa53a	Python - Expose PostProcessors	2019-12-16 18:46:14 -05:00
Anthony MOI	1a90cc96e5	Python - Can add tokens	2019-12-16 18:45:26 -05:00
Anthony MOI	ed7e3999d2	Python - Fix some clippy warnings	2019-12-13 18:17:51 -05:00
Anthony MOI	2a0ad97809	Python - Update API to allow failure	2019-12-13 12:20:05 -05:00
Anthony MOI	b4b31d73cd	Expose vocabulary size	2019-12-10 16:20:31 -05:00
Anthony MOI	6c294c60b0	Python - Add Encoding repr + improve example	2019-12-10 15:18:07 -05:00
Anthony MOI	8cedc5f1f6	Update Python bindings for Encoding	2019-12-10 12:38:36 -05:00
Anthony MOI	849272d44f	Python - add missing modules exports	2019-12-09 12:50:53 -05:00
Anthony MOI	eaafb22511	Add bindings for Trainer in Python	2019-12-03 15:54:15 -05:00
Anthony MOI	8fbe3c2662	Python - Add decoders	2019-11-22 21:08:57 -05:00
Anthony MOI	e44f52024c	Python - Set a PreTokenizer in a model	2019-11-22 21:01:52 -05:00
Anthony MOI	39a6d04c53	Improve Python bindings This is an attempt at actually exposing the same structure that we use in the Rust lib. This will allow Python to instantiate Model/PreTokenizer/... with their own arguments, combining everything without relying on parsed kwargs.	2019-11-22 17:57:36 -05:00

36 Commits