tokenizers

mirror of https://github.com/mii443/tokenizers.git synced 2025-08-23 00:35:35 +00:00

Author	SHA1	Message	Date
Anthony MOI	324aa2930a	Doc - Improve python and node tests	2020-11-02 17:07:27 -05:00
Anthony MOI	4cf0a0b72c	Doc - Quicktour uses python tested code	2020-11-02 17:07:27 -05:00
Nicolas Patry	88556790e7	Fixing a bug where long tokenizer files would be incorrectly deserialized (#459) * Fixing a bug where long tokenizer files would be incorrectly deserialized - Add a bunch of tests to check deserialization behaviour - One tests also confirms current Single deserialization of Sequence. * Better test locations for Windows + no file dependency in Python binding Rust side. * Adressing @n1t0 comments.	2020-10-13 18:44:24 +02:00
Nicolas Patry	fbca797b3d	Fixing Trainer with u8 instead of chars. (#452) * Fixing Trainer with u8 instead of chars. Now check both optimized and unoptimized encodings schemes for Unigram. * Small fixes. * Fixing makefile.	2020-10-09 18:57:14 +02:00
Nicolas Patry	816632c9fa	Removing `--release` compat test. - Leaving the one that checks that sampling follows the expected distribution. - Marking the python Unigram.train(..) test as slow - The python Unigram.train(..) test now uses `big.txt` file.	2020-09-02 13:38:14 -04:00
Nicolas Patry	d0366529b7	Use a smaller train file.	2020-09-02 13:38:14 -04:00
Nicolas Patry	7b5c2b92c6	Fixing test dependency.	2020-09-02 13:38:14 -04:00
Anthony MOI	aa3b39f692	Python - Tests for parallelism with multiprocessing Co-authored-by: Evan Pete Walsh <epwalsh10@gmail.com>	2020-06-23 11:25:39 -04:00
Anthony MOI	7fd7dfd113	Python - Test CharBPETokenizer	2020-04-01 17:25:56 -04:00
Anthony MOI	dbc23e20a9	Python - Test Models	2020-04-01 17:25:55 -04:00
Anthony MOI	023566fbbb	Python - Add some tests utils	2020-04-01 17:25:55 -04:00

11 Commits