Commit Graph

92 Commits

Author SHA1 Message Date
96a9e5715c New version. (#1082)
* New version.

The actual release will happen *before* PyO3 0.17.2 because
the tests were run before then.

* Manylinux2014 necessary now with Rust 1.64.
2022-10-06 15:45:56 +02:00
6113666624 Updating python formatting. (#1079)
* Updating python formatting.

* Forgot gh action.

* Skipping isort to prevent circular imports.

* Updating stub.

* Removing `isort` (it contradicts `stub.py`).

* Fixing weird stub black/isort disagreement.
2022-10-05 15:29:33 +02:00
6e5569a540 Moving version numbers to dev mode. (#1067) 2022-09-22 18:24:07 +02:00
63082c4d11 Enabling static interpreter embedding for manylinux. (#1064)
* Removing dead file.

* Checking that we can distribute with static python embedding for manylinux

* Many linux embed interpreter.

* Building wheels manylinux with static embedding

* Better script.

* typo.

* Using a dummy feature?

* default features ?

* Back into order.

* Fixing manylinux ??.

* Local dir.

* Missing star.

* Makedir ?

* Monkey coding this.

* extension module ?

* Building with default features `RustExtension`.

* bdist_wheel + rustextension any better ?

* update rust-py version.

* Forcing extension module.

* No default features.

* Remove py37 out of spite

* Revert "Remove py37 out of spite"

This reverts commit 6ab7facd792b59c2e30be82fe42816d24c32cf0d.

* Really extraneous feature.

* Fix build wheels.

* Putting things back in place.
2022-09-21 12:18:46 +02:00
7bfab48979 Preparing rc1 release. (#1056)
* Preparing rc1 release.

* Fixing test_alignment_methods

* Fixing the overflowing sequence_id issue (LayoutLMv2 tests caught this).

* Adding overly complex overflowing test.
2022-09-12 16:07:06 +02:00
e6cd73a291 .dev0 suffix in python version (#987) 2022-04-22 09:36:18 +02:00
95b5d066d5 Update doc build gh workflow to install rust 2022-04-21 09:20:20 +02:00
c2aa87a256 Add setup.py extras["dev"] 2022-04-19 15:14:44 +02:00
8a9bb28f46 Preparing for 0.12.1 (#978)
* Preparing for 0.12.1

* Updated the changelog.
2022-04-12 17:57:33 +02:00
0eb7455fe5 Preparing 0.12 release. (#967)
* Preparing `0.12` release.

* Fix click version: https://github.com/psf/black/issues/2964
2022-03-31 11:06:33 +02:00
ffaee13994 Preparing for 0.11.6 release. 2022-02-28 10:20:49 +01:00
5679323bbc Minor version bump. 2022-02-16 12:51:11 +01:00
9b85424520 Version bump. 2022-01-17 22:30:25 +01:00
ab9a2f3100 Update versions. 2022-01-17 09:40:01 +01:00
cabbecb96c add python3.10 release (#877)
* add missing python3.9 classifier

* add python3.10 release

* run tests on 3.10

* Revert "run tests on 3.10"

This reverts commit ceed64249e54b6ec622b06c59bf47da7c6dfc1b0.
2022-01-12 09:42:13 +01:00
8e0d66a254 New python version. 2022-01-04 14:58:02 +01:00
7069988ffe Update to 0.11.1 2021-12-28 13:59:31 +01:00
b0ee27847f Python - Prepare for release 0.11.0 (#799) 2021-09-08 03:15:47 -04:00
a4d0f3dd18 Update docs for from_pretrained 2021-08-31 09:00:05 -04:00
da4c7b10e4 Add a way to specify the unknown token in SentencePieceUnigramTokenizer python implem (#762)
* add a way to specify the unknown token in `SentencePieceUnigramTokenizer`

* add test that verifies that an exception is raised for the missing unknown token

* style

* add test tokens
2021-08-12 09:42:44 -04:00
3a002c1aa8 Python - prepare for release 0.10.3 2021-05-24 16:59:10 -04:00
32b3b7a0f2 Python - Prepare for release 0.10.2 2021-04-05 16:47:55 -04:00
bc8bbf637a Prepare for python v0.10.1 (#625) 2021-02-08 11:45:56 -05:00
d96442cbe8 Python - Prepare for release 0.10.1rc1 (#622) 2021-02-04 10:37:00 -05:00
719bea76b9 Python - Prepare for release 0.10.0 2021-01-12 16:34:04 -05:00
0c6cc39eee Python - Update CHANGELOG and bump for release 2020-12-08 13:29:35 -05:00
8916b6bb27 Add a visualization utility to render tokens and annotations in a notebook (#508)
* Draft functionality of visualization

* Added comments to make code more intelligible

* polish the styles

* Ensure colors are stable and comment the css

* Code clean up

* Made visualizer importable and added some docs

* Fix styling

* implement comments from PR

* Fixed the regex for UNK tokens and examples in notebook

* Converted docs to google format

* Added a notebook showing multiple languages and tokenizers

* Added visual indication of chars that are tokenized with >1 token

* Reorganize things a bit and fix import

* Update docs

Co-authored-by: Anthony MOI <m.anthony.moi@gmail.com>
2020-12-04 10:25:56 -05:00
75b41dab0f Python - Update CHANGELOG and bump version for 0.9.4 2020-11-09 16:36:04 -05:00
2364d376f7 Python - Update CHANGELOG and bump to 0.9.3 for release 2020-10-26 16:40:24 -04:00
91f602f744 Python - Update CHANGELOG and bump to 0.9.2 for release 2020-10-15 10:14:58 -04:00
f94a274702 Python - Update CHANGELOG and bump version for release 2020-10-13 14:45:21 -04:00
4f4ba4a11a Python - Bump version for 0.9.0 release 2020-10-09 13:00:19 -04:00
dd9fda5d05 Bump rc version. 2020-10-06 11:04:36 +02:00
aebf510c5a Python - Update CHANGELOG and bump to 0.9.0.rc1 2020-09-29 10:24:24 -04:00
171a042ee0 Python - Bump version for dev4 release 2020-09-24 10:16:18 -04:00
c536b4992b Move to dev3 build. 2020-09-22 08:21:38 +02:00
330876ae02 Improvements on spm parity: (#401)
* Removing all pre_tokenizer logic from Unigram algorithm.

* Improving *a lot* the parity check.

- We can now detect a lot more errors
- Special cases have been added temporarily.

* Adding 2 new normalizers that mimic spm's default behavior.

* Adding `encoding_optimized` version of the `encode` algorithm.

- Removes Lattice allocation.
- Changes trie `common_prefix_search` to return an iterator to avoid
  allocation of the full results.

* Trie<char> -> Trie<u8> Another improvement on speed.

* [WIP] Attempt to create a Precompiled Normalizer from SPM to be 100%
compliant with arbitrary models.

* Adding a new `Precompiled` Normalizer that is replacing `SpmNmtNfkc`.

- It will be used for direct compatibility with `Spm` and replace all
their custom rules by using directly the normalizer spec embedded
within spm files, removing all need for any rules for us.
- We need `nom` dependency to parse the binary format of `spm`.
- We need to add `sentencepiece_model_pb2.py` file to be able to read
  the proto file.
- We reimplemented their `Darts::DoubleArray` compact trie format.

* Fixing a bug with Precompiled normalizer.

* Fixing some edge cases (now in tests) with this weird precompiled
normalizer.

It seems a very hand-crafted trie does not prevent one from shooting
oneself in the foot. Sorry future reader.

* Keep API stable for this PR (change of the API should come later #409).

- Removed sentencepiece_model_pb2 from binding and add instructions to
make `from_spm` work.

* Adding model check in `from_spm`.

* Addressing @n1t0's comments.

* Adding a check to make sure alignments stay correct.

Also added a bit more documentation on how Precompiled works.

* Extracting `Precompiled` into its own `spm_precompiled` crate.

* Using ranges in `do_nmt`.
2020-09-15 22:21:02 +02:00
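
Note on 330876ae02: the message says the trie's `common_prefix_search` was changed to return an iterator so the full result list is never allocated, and that the trie is keyed on bytes rather than chars. A minimal Python sketch of that idea follows. The `Trie`, `TrieNode`, and `push` names and the whole layout are assumptions made for illustration; this is not the Rust code from the commit.

```python
from typing import Dict, Iterator


class TrieNode:
    """One node of a byte-keyed trie (hypothetical illustration)."""

    def __init__(self) -> None:
        self.children: Dict[int, "TrieNode"] = {}
        self.is_leaf = False


class Trie:
    """Byte-keyed trie; an assumed sketch, not the crate's implementation."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def push(self, token: bytes) -> None:
        # Walk or create one node per byte, then mark the end of the token.
        node = self.root
        for byte in token:
            node = node.children.setdefault(byte, TrieNode())
        node.is_leaf = True

    def common_prefix_search(self, text: bytes) -> Iterator[bytes]:
        # Lazily yield every stored token that is a prefix of `text`.
        # Being a generator, it never builds the full list of matches.
        node = self.root
        for end, byte in enumerate(text, start=1):
            child = node.children.get(byte)
            if child is None:
                return
            node = child
            if node.is_leaf:
                yield text[:end]


trie = Trie()
for token in (b"a", b"ab", b"abc", b"b"):
    trie.push(token)

matches = trie.common_prefix_search(b"abcd")
first = next(matches)   # only the work needed for the first match is done
rest = list(matches)    # the caller decides whether to materialize the rest
```

A caller that only needs the first or the longest match can stop consuming early, which is where the saved allocation comes from.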
b8f1eb48cb Python - Bump version for 0.9.0.dev1 release 2020-09-02 22:31:01 -04:00
c036cd4ced Python - Bump version for 0.9.0.dev0 release 2020-08-21 18:52:29 -04:00
0d7c232f95 Move Python source to subdirectory.
This allows testing versions not built in-place. Otherwise
importing (or testing) in the package root fails without develop
builds.
Replace maturin with setuptools_rust since maturin fails with
proper project structure.
2020-07-25 23:40:47 +02:00
c901f86d52 Python - Bump version for 0.8.1 2020-07-20 16:33:48 -04:00
157feed9a5 Python - Bump version for 0.8.1.rc2 2020-07-17 13:12:23 -04:00
5be375eaea Update CHANGELOGs and bump version for python release 2020-07-06 15:21:47 -04:00
6349ca51b3 Python - Bump version for 0.8.0 release 2020-06-26 16:12:26 -04:00
8ae1982149 Finally it will be rc4 for transformers 2020-06-26 15:36:08 -04:00
5a653869af Try local version for transformers 2020-06-26 15:19:00 -04:00
1a08b21329 Python - Bump version for 0.8.0.transformers release 2020-06-26 14:37:22 -04:00
74d812d401 Python - Bump version to 0.8.0.rc3 for release 2020-06-22 12:54:31 -04:00
a14cd7b219 Python - Bump version to 0.8.0.rc2 for release 2020-06-19 10:48:53 -04:00
fb964adfdb Python - Bump version to 0.8.0.rc1 for release 2020-06-11 14:24:34 -04:00