512 Commits

Author SHA1 Message Date
Anthony MOI
000c19a7a5 Doc - Improve snippets testing 2020-11-02 17:07:27 -05:00
Anthony MOI
e865b7cd7c Customize the doc for each language 2020-11-02 17:07:27 -05:00
Nicolas Patry
655809c718 Attempt to get some documentation going. 2020-11-02 17:07:27 -05:00
taufique74
4929809af0 makes from_file() method static 2020-11-01 13:15:15 -05:00
Anthony MOI
991128f9e1 Node - Fix models init methods & add WordLevel 2020-10-30 13:47:04 -04:00
Anthony MOI
2364d376f7 Python - Update CHANGELOG and bump to 0.9.3 for release 2020-10-26 16:40:24 -04:00
Anthony MOI
466f5303eb Fix UnigramTrainer 2020-10-26 16:31:58 -04:00
Anthony MOI
73b5da917f Unigram - Add special_tokens at the end of training + optional unk 2020-10-26 10:57:29 -04:00
Anthony MOI
1a6f4b5204 Allow initial_alphabet on UnigramTrainer 2020-10-26 10:57:29 -04:00
Timur Ganiev
f7c61c267a Fixed BPE.read_files -> BPE.read_file in SentencePieceBPETokenizer 2020-10-26 10:57:14 -04:00
Anthony MOI
a2289d49b4 Finish exposing the UnicodeScripts PreTokenizer 2020-10-21 11:01:54 -04:00
Nicolas Patry
180371d929 Fixing hanging error while acquiring GIL from custom pretokenizer during training. (#470)
* Fixing hanging error while acquiring GIL from custom pretokenizer
during training.

Fixes #469

* cleanup

Co-authored-by: Anthony MOI <m.anthony.moi@gmail.com>
2020-10-20 14:23:39 -04:00
Anthony MOI
91f602f744 Python - Update CHANGELOG and bump to 0.9.2 for release 2020-10-15 10:14:58 -04:00
Nicolas Patry
2ccd16bf5c Adding a new tests for PreTokenizer.custom.
This example is more illustrative of what's doable for custom
PreTokenizer.
2020-10-15 10:07:48 -04:00
Anthony MOI
f94a274702 Python - Update CHANGELOG and bump version for release 2020-10-13 14:45:21 -04:00
Nicolas Patry
88556790e7 Fixing a bug where long tokenizer files would be incorrectly deserialized (#459)
* Fixing a bug where long tokenizer files would be incorrectly
deserialized

- Add a bunch of tests to check deserialization behaviour
- One tests also confirms current Single deserialization of Sequence.

* Better test locations for Windows + no file dependency in Python binding
Rust side.

* Adressing @n1t0 comments.
2020-10-13 18:44:24 +02:00
Anthony MOI
3bb794681c Python - Use 1.46.0 for now 2020-10-09 13:40:35 -04:00
Anthony MOI
83e11a8de4 Python - Update dependencies for release 2020-10-09 13:09:35 -04:00
Anthony MOI
4f4ba4a11a Python - Bump version for 0.9.0 release 2020-10-09 13:00:19 -04:00
Nicolas Patry
fbca797b3d Fixing Trainer with u8 instead of chars. (#452)
* Fixing Trainer with u8 instead of chars.

Now check both optimized and unoptimized encodings schemes for Unigram.

* Small fixes.

* Fixing makefile.
2020-10-09 18:57:14 +02:00
Nicolas Patry
dd9fda5d05 Bump rc version. 2020-10-06 11:04:36 +02:00
Anthony MOI
aebf510c5a Python - Update CHANGELOG and bump to 0.9.0.rc1 2020-09-29 10:24:24 -04:00
Anthony MOI
ff57504972 Python - Add some more test for TemplateProcessing 2020-09-29 10:09:10 -04:00
Nicolas Patry
6c25bb729b Update __init__.pyi 2020-09-29 10:09:10 -04:00
Anthony MOI
1070eb471e Python - Update bindings for TemplateProcessing 2020-09-29 10:09:10 -04:00
Dagmawi Moges
7f8b357b92 Fixed Dead Link: Build your own #435 (#436)
* Fixed Dead Link: Build your own #435

* Update bindings/python/README.md

Co-authored-by: Anthony MOI <xn1t0x@gmail.com>
2020-09-25 09:41:31 -04:00
Anthony MOI
a0a163fd62 Remove unwanted file 2020-09-24 14:05:47 -04:00
Anthony MOI
171a042ee0 Python - Bump version for dev4 release 2020-09-24 10:16:18 -04:00
Nicolas Patry
a410903051 Upgrading to black 20.8b1 2020-09-24 09:27:30 -04:00
Anthony MOI
8308508577 Python - Update bindings for Replace Normalizer 2020-09-24 08:05:57 -04:00
Nicolas Patry
598ce61229 Removed now wrong code in convert.py, fixed strange black magic. 2020-09-24 08:57:02 +02:00
Nicolas Patry
95cc8c47ad Changed rust api for merges, that is now Vec<(String, String)> 2020-09-24 08:57:02 +02:00
Nicolas Patry
36832bfa12 from_files -> from_file everywhere
- read_files -> read_file
- from_file pure rust impl in python bindings
- Fix some typing in python binding
- Added {BPE,WordLevel,WordPiece}.from_file tests.
2020-09-24 08:57:02 +02:00
Nicolas Patry
9672995a56 We use 19.10b0 not 20 here... 2020-09-24 08:57:02 +02:00
Nicolas Patry
35ee1968c0 Black *Version* check. 2020-09-24 08:57:02 +02:00
Nicolas Patry
9b1ef9d895 Black pre-commit after rebase. 2020-09-24 08:57:02 +02:00
Nicolas Patry
acd4a7599f Black. 2020-09-24 08:57:02 +02:00
Nicolas Patry
8f8156fd2c Adressing first pass of comments. 2020-09-24 08:57:02 +02:00
Nicolas Patry
1cd4824273 Black on pyi file. 2020-09-24 08:57:02 +02:00
Nicolas Patry
60c1e25910 New version. Staticmethods need to return a IntoPy<PyObject>
which is non trivial for PyClassInitializer. Instead I added a lower
staticmethod that returns raw objects, and the `from_file(s)` methods
are implemented directly in Python.
2020-09-24 08:57:02 +02:00
Nicolas Patry
98a30eead1 Temp work to make the APIs uniform (build from memory by default). 2020-09-24 08:57:02 +02:00
Anthony MOI
b24a2fc178 Some suggestions from @narsil 2020-09-23 15:50:01 -04:00
Anthony MOI
31b81f109b Python - Fix for PySlice on Windows 2020-09-23 15:50:01 -04:00
Anthony MOI
b9a051f464 Python - Update some missing typings 2020-09-23 15:50:01 -04:00
Anthony MOI
7492a1d698 Python - Update typings for NormalizedString 2020-09-23 15:50:01 -04:00
Anthony MOI
0b448f46d4 Python - Update typings for PreTokenizedString 2020-09-23 15:50:01 -04:00
Anthony MOI
b1097a988f Python - Improved example with custom components 2020-09-23 15:50:01 -04:00
Anthony MOI
0a930ef1d8 Python - Update bindings for PreTokenizer 2020-09-23 15:50:01 -04:00
Anthony MOI
53aad4eca0 Python - Update support for custom Decoder 2020-09-23 15:50:01 -04:00
Anthony MOI
08a3128515 Python - Add bindings for some Model methods 2020-09-23 15:50:01 -04:00