* New version.
The actual release will happen *before* PyO3 0.17.2 because
the tests were run before then.
* Manylinux2014 is now necessary with Rust 1.64.
* Removing dead file.
* Checking that we can distribute with static python embedding for
manylinux
* Manylinux embedded interpreter.
* Building wheels manylinux with static embedding
* Better script.
* typo.
* Using a dummy feature?
* default features ?
* Back into order.
* Fixing manylinux ??.
* Local dir.
* Missing star.
* Makedir ?
* Monkey coding this.
* extension module ?
* Building with default features `RustExtension`.
* bdist_wheel + rustextension any better ?
* update rust-py version.
* Forcing extension module.
* No default features.
* Remove py37 out of spite
* Revert "Remove py37 out of spite"
This reverts commit 6ab7facd792b59c2e30be82fe42816d24c32cf0d.
* Really extraneous feature.
* Fix build wheels.
* Putting things back in place.
* add a way to specify the unknown token in `SentencePieceUnigramTokenizer`
* add test that verifies that an exception is raised for the missing unknown token
* style
* add test tokens
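The missing-unknown-token check described above can be sketched with a toy stand-in. `ToyUnigramTokenizer` is a hypothetical name for illustration only, not the real `SentencePieceUnigramTokenizer` API: it simply validates that the chosen unknown token is present in the vocab and raises otherwise.

```python
# Illustrative sketch (hypothetical class, not the real library API):
# specifying an unknown token that is absent from the vocab should raise.
class ToyUnigramTokenizer:
    def __init__(self, vocab, unk_token):
        # vocab maps token -> log-probability
        if unk_token not in vocab:
            raise ValueError(
                f"unknown token {unk_token!r} must be present in the vocab"
            )
        self.vocab = vocab
        self.unk_token = unk_token
```

A test for this behavior simply constructs the tokenizer with a vocab that lacks the unknown token and asserts that `ValueError` is raised.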
* Draft functionality of visualization
* Added comments to make code more intelligible
* polish the styles
* Ensure colors are stable and comment the css
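One way to keep colors stable across runs, sketched here as an assumption about the approach (the `stable_color` helper is hypothetical, not the visualizer's actual code), is to derive the color deterministically from the token text rather than assigning colors in encounter order:

```python
import hashlib

def stable_color(token: str) -> str:
    """Derive a deterministic CSS hue from the token text, so the same
    token always gets the same color across runs and reloads."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    hue = digest[0] * 360 // 256  # map first digest byte to 0..359
    return f"hsl({hue}, 70%, 80%)"
```

Because the hue depends only on the token's bytes, re-rendering the same text never reshuffles the palette.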
* Code clean up
* Made visualizer importable and added some docs
* Fix styling
* implement comments from PR
* Fixed the regex for UNK tokens and examples in notebook
* Converted docs to google format
* Added a notebook showing multiple languages and tokenizers
* Added visual indication of chars that are tokenized with >1 token
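Detecting characters covered by more than one token can be done from the token offsets alone. A minimal sketch, assuming offsets are `(start, end)` character spans per token (the `multi_token_chars` helper is illustrative, not the visualizer's actual implementation):

```python
def multi_token_chars(text, offsets):
    """Return the set of character positions covered by more than one
    token, e.g. a single character split across several byte-level tokens.

    `offsets` is one (start, end) span per token, in character positions.
    """
    counts = [0] * len(text)
    for start, end in offsets:
        for i in range(start, end):
            counts[i] += 1
    return {i for i, c in enumerate(counts) if c > 1}
```

For example, a multi-byte character emitted as two byte-level tokens yields two spans over the same position, which the counter flags.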
* Reorganize things a bit and fix import
* Update docs
Co-authored-by: Anthony MOI <m.anthony.moi@gmail.com>
* Removing all pre_tokenizer logic from Unigram algorithm.
* Improving the parity check *a lot*.
- We can now detect a lot more errors
- Special cases have been added temporarily.
* Adding 2 new normalizers that mimic spm's default behavior.
* Adding `encoding_optimized` version of the `encode` algorithm.
- Removes Lattice allocation.
- Changes trie `common_prefix_search` to return an iterator to avoid
allocation of the full results.
* Trie<char> -> Trie<u8> Another improvement on speed.
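The two trie optimizations above can be sketched together in a minimal way: keys stored as bytes rather than chars, and `common_prefix_search` implemented as a lazy iterator so callers that stop early never pay for the full result list. `ByteTrie` is an illustrative toy, not the crate's actual data structure:

```python
# Hedged sketch of the described optimizations: byte-keyed trie nodes,
# and a generator-based common_prefix_search that avoids allocating
# the complete list of matches up front.
class ByteTrie:
    def __init__(self):
        self.root = {}

    def insert(self, word: str):
        node = self.root
        for b in word.encode("utf-8"):
            node = node.setdefault(b, {})
        node[None] = word  # sentinel marking the end of a stored key

    def common_prefix_search(self, text: str):
        """Lazily yield every stored key that is a prefix of `text`."""
        node = self.root
        for b in text.encode("utf-8"):
            if None in node:
                yield node[None]
            node = node.get(b)
            if node is None:
                return
        if None in node:
            yield node[None]
```

A caller that only needs the first match can use `next(trie.common_prefix_search(text), None)` and stop traversal immediately.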
* [WIP] Attempt to create a Precompiled Normalizer from SPM to be 100%
compliant with arbitrary models.
* Adding a new `Precompiled` Normalizer that is replacing `SpmNmtNfkc`.
- It will be used for direct compatibility with `Spm` and replace all
their custom rules by using directly the normalizer spec embedded
within spm files, removing all need for any rules for us.
- We need `nom` dependency to parse the binary format of `spm`.
- We need to add `sentencepiece_model_pb2.py` file to be able to read
the proto file.
- We reimplemented their `Darts::DoubleArray` compact trie format.
* Fixing a bug with Precompiled normalizer.
* Fixing some edge cases (now in tests) with this weird precompiled
normalizer.
It seems even a very hand-crafted trie does not prevent one from shooting
oneself in the foot. Sorry, future reader.
* Keep API stable for this PR (change of the API should come later #409).
- Removed sentencepiece_model_pb2 from binding and add instructions to
make `from_spm` work.
* Adding model check in `from_spm`.
* Addressing @n1t0's comments.
* Adding a check to make sure alignments stay correct.
Also added a bit more documentation on how Precompiled works.
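The alignment invariant being checked can be sketched as follows. This is a toy version of the idea, assuming alignments are one `(start, end)` span into the original string per normalized character (the real check lives in the Rust normalizer; `check_alignments` is a hypothetical helper):

```python
def check_alignments(original: str, normalized: str, alignments):
    """Sanity-check that each normalized character maps back to a valid
    (start, end) span of the original string."""
    assert len(alignments) == len(normalized), "one span per normalized char"
    for start, end in alignments:
        assert 0 <= start <= end <= len(original), "span out of bounds"
    return True
```

For instance, normalizing `" Hi"` to `"Hi"` keeps spans `(1, 2)` and `(2, 3)` pointing into the original, so offset-based highlighting still lines up after normalization.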
* Extracting `Precompiled` into its own `spm_precompiled` crate.
* Using ranges in `do_nmt`.
This allows testing versions not built in-place. Otherwise
importing (or testing) in the package root fails without develop
builds.
Replace maturin with setuptools_rust since maturin fails with
proper project structure.