Commit Graph

70 Commits

Author SHA1 Message Date
bc8bbf637a Prepare for python v0.10.1 (#625) 2021-02-08 11:45:56 -05:00
d96442cbe8 Python - Prepare for release 0.10.1rc1 (#622) 2021-02-04 10:37:00 -05:00
719bea76b9 Python - Prepare for release 0.10.0 2021-01-12 16:34:04 -05:00
0c6cc39eee Python - Update CHANGELOG and bump for release 2020-12-08 13:29:35 -05:00
8916b6bb27 Add a visualization utility to render tokens and annotations in a notebook (#508)
* Draft functionality of visualization

* Added comments to make code more intelligible

* polish the styles

* Ensure colors are stable and comment the css

* Code clean up

* Made visualizer importable and added some docs

* Fix styling

* implement comments from PR

* Fixed the regex for UNK tokens and examples in notebook

* Converted docs to google format

* Added a notebook showing multiple languages and tokenizers

* Added visual indication of chars that are tokenized with >1 token

* Reorganize things a bit and fix import

* Update docs

Co-authored-by: Anthony MOI <m.anthony.moi@gmail.com>
2020-12-04 10:25:56 -05:00
75b41dab0f Python - Update CHANGELOG and bump version for 0.9.4 2020-11-09 16:36:04 -05:00
2364d376f7 Python - Update CHANGELOG and bump to 0.9.3 for release 2020-10-26 16:40:24 -04:00
91f602f744 Python - Update CHANGELOG and bump to 0.9.2 for release 2020-10-15 10:14:58 -04:00
f94a274702 Python - Update CHANGELOG and bump version for release 2020-10-13 14:45:21 -04:00
4f4ba4a11a Python - Bump version for 0.9.0 release 2020-10-09 13:00:19 -04:00
dd9fda5d05 Bump rc version. 2020-10-06 11:04:36 +02:00
aebf510c5a Python - Update CHANGELOG and bump to 0.9.0.rc1 2020-09-29 10:24:24 -04:00
171a042ee0 Python - Bump version for dev4 release 2020-09-24 10:16:18 -04:00
c536b4992b Move to dev3 build. 2020-09-22 08:21:38 +02:00
330876ae02 Improvements on spm parity: (#401)
* Removing all pre_tokenizer logic from Unigram algorithm.

* Improving *a lot* the parity check.

- We can now detect a lot more errors
- Special cases have been added temporarily.

* Adding 2 new normalizers that mimic spm's default behavior.

* Adding `encoding_optimized` version of the `encode` algorithm.

- Removes Lattice allocation.
- Changes trie `common_prefix_search` to return an iterator to avoid
  allocation of the full results.

* Trie<char> -> Trie<u8> Another improvement on speed.

* [WIP] Attempt to create a Precompiled Normalizer from SPM to be 100%
compliant with arbitrary models.

* Adding a new `Precompiled` Normalizer that is replacing `SpmNmtNfkc`.

- It will be used for direct compatibility with `Spm` and replace all
their custom rules by using directly the normalizer spec embedded
within spm files, removing all need for any rules for us.
- We need `nom` dependency to parse the binary format of `spm`.
- We need to add `sentencepiece_model_pb2.py` file to be able to read
  the proto file.
- We reimplemented their `Darts::DoubleArray` compact trie format.

* Fixing a bug with Precompiled normalizer.

* Fixing some edge cases (now in tests) with this weird precompiled
normalizer.

It seems a very handy crafted trie does not prevent from shooting
oneself in the foot. Sorry future reader.

* Keep API stable for this PR (change of the API should come later #409).

- Removed sentencepiece_model_pb2 from binding and add instructions to
make `from_spm` work.

* Adding model check in `from_spm`.

* Addressing @n1t0's comments.

* Adding a check to make sure alignments stay correct.

Also added a bit more documentation on how Precompiled works.

* Extracting `Precompiled` into its own `spm_precompiled` crate.

* Using ranges in `do_nmt`.
2020-09-15 22:21:02 +02:00
b8f1eb48cb Python - Bump version for 0.9.0.dev1 release 2020-09-02 22:31:01 -04:00
c036cd4ced Python - Bump version for 0.9.0.dev0 release 2020-08-21 18:52:29 -04:00
0d7c232f95 Move Python source to subdirectory.
This allows testing versions not built in-place. Otherwise
importing (or testing) in the package root fails without develop
builds.
Replace maturin with setuptools_rust since maturin fails with
proper project structure.
2020-07-25 23:40:47 +02:00
c901f86d52 Python - Bump version for 0.8.1 2020-07-20 16:33:48 -04:00
157feed9a5 Python - Bump version for 0.8.1.rc2 2020-07-17 13:12:23 -04:00
5be375eaea Update CHANGELOGs and bump version for python release 2020-07-06 15:21:47 -04:00
6349ca51b3 Python - Bump version for 0.8.0 release 2020-06-26 16:12:26 -04:00
8ae1982149 Finally it will be rc4 for transformers 2020-06-26 15:36:08 -04:00
5a653869af Try local version for transformers 2020-06-26 15:19:00 -04:00
1a08b21329 Python - Bump version for 0.8.0.transformers release 2020-06-26 14:37:22 -04:00
74d812d401 Python - Bump version to 0.8.0.rc3 for release 2020-06-22 12:54:31 -04:00
a14cd7b219 Python - Bump version to 0.8.0.rc2 for release 2020-06-19 10:48:53 -04:00
fb964adfdb Python - Bump version to 0.8.0.rc1 for release 2020-06-11 14:24:34 -04:00
d00ac60162 Update changelogs and bump version for python release 2020-06-03 18:27:49 -04:00
2a0f2337db Python - Update CHANGELOG and bump version to 0.8.0.dev1 for release 2020-05-27 14:22:00 -04:00
5a01792413 Python - Update CHANGELOGs and bump to 0.8.0-dev for release 2020-05-21 18:57:02 -04:00
670f619ab5 Python - bump to 0.7.0 for final release 2020-04-17 12:48:10 -04:00
3312ad75d9 Python - Bump to 0.7.0rc6 for release 2020-04-16 19:39:04 -04:00
bdfb02f473 Python - Bump to 0.7.0rc6 for release 2020-04-16 14:42:22 -04:00
09104afd07 Python - Bump to 0.7.0-rc5 for release 2020-04-09 11:41:10 -04:00
25afbb5fde Python - Bump to 0.7.0-rc4 for release 2020-04-08 14:27:29 -04:00
b03fea1d66 Python - Update workflow and Makefile with tests 2020-04-01 17:36:33 -04:00
93a83127ae Bump version for Python release 2020-03-31 14:25:47 -04:00
e8aec7a624 Bump version for Python release 2020-03-27 09:17:35 -04:00
b132be34af Bump version for Python release 2020-03-26 17:26:14 -04:00
8e791791d1 Python - prepare for release 2020-03-02 14:56:42 -05:00
440e8e9bd9 Python - Bump version for release 2020-02-24 16:08:49 -05:00
999088ef94 Python - Bump version for release 2020-02-24 09:56:08 -05:00
11dd6c8bae Python - Bump version for release 2020-02-18 18:49:11 -05:00
41929462c7 Python - Add classifiers 2020-02-18 18:48:21 -05:00
bbbd97c7e1 Python - Bump version for release 2020-02-11 08:15:11 -05:00
c1ddfdac8c Python - bump version for release 2020-02-10 23:23:27 -05:00
3c0164ef75 Python - Bump version for release 2020-02-10 16:07:32 -05:00
9745786b89 Bump versions for release 2020-02-05 13:55:51 -05:00
0105021280 Bump version for Python 2020-01-22 16:07:03 -05:00