Anthony MOI
d788a950ac
Doc - Fixes some CI fails
2020-11-02 17:07:27 -05:00
Anthony MOI
324aa2930a
Doc - Improve python and node tests
2020-11-02 17:07:27 -05:00
Anthony MOI
b6ffd9cba0
Doc - Cleanup old tests & node lints
2020-11-02 17:07:27 -05:00
Anthony MOI
9521603e08
Doc - Update Decoder part of the Pipeline page
2020-11-02 17:07:27 -05:00
Anthony MOI
8b65c1f4bc
Doc - Update Bert example on the Pipeline page
2020-11-02 17:07:27 -05:00
Anthony MOI
620769fd4b
Doc - Update PreTokenizer part of the Pipeline page
2020-11-02 17:07:27 -05:00
Anthony MOI
13a80050f0
Doc - Update Normalizer part of the Pipeline page
2020-11-02 17:07:27 -05:00
Anthony MOI
4cf0a0b72c
Doc - Quicktour uses python tested code
2020-11-02 17:07:27 -05:00
Anthony MOI
d2fc0e4836
Doc - Update API Reference for Encoding
2020-11-02 17:07:27 -05:00
Anthony MOI
a86d49634c
Doc - API Reference for most Tokenizer methods/attributes
2020-11-02 17:07:27 -05:00
Anthony MOI
8c0370657e
Doc - Update API Reference on more Tokenizer methods
2020-11-02 17:07:27 -05:00
Anthony MOI
ddabe130cd
Doc - Updated API Reference for AddedToken
2020-11-02 17:07:27 -05:00
Anthony MOI
79f02bb7f0
Doc - Updated API Reference for encode/encode_batch
2020-11-02 17:07:27 -05:00
Anthony MOI
3ee54766e3
Doc - Backbone for API Reference
2020-11-02 17:07:27 -05:00
Anthony MOI
000c19a7a5
Doc - Improve snippets testing
2020-11-02 17:07:27 -05:00
Anthony MOI
e865b7cd7c
Customize the doc for each language
2020-11-02 17:07:27 -05:00
Nicolas Patry
655809c718
Attempt to get some documentation going.
2020-11-02 17:07:27 -05:00
taufique74
4929809af0
makes from_file() method static
2020-11-01 13:15:15 -05:00
Anthony MOI
991128f9e1
Node - Fix models init methods & add WordLevel
2020-10-30 13:47:04 -04:00
Anthony MOI
2364d376f7
Python - Update CHANGELOG and bump to 0.9.3 for release
2020-10-26 16:40:24 -04:00
Anthony MOI
466f5303eb
Fix UnigramTrainer
2020-10-26 16:31:58 -04:00
Anthony MOI
73b5da917f
Unigram - Add special_tokens at the end of training + optional unk
2020-10-26 10:57:29 -04:00
Anthony MOI
1a6f4b5204
Allow initial_alphabet on UnigramTrainer
2020-10-26 10:57:29 -04:00
Timur Ganiev
f7c61c267a
Fixed BPE.read_files -> BPE.read_file in SentencePieceBPETokenizer
2020-10-26 10:57:14 -04:00
Anthony MOI
a2289d49b4
Finish exposing the UnicodeScripts PreTokenizer
2020-10-21 11:01:54 -04:00
Nicolas Patry
180371d929
Fixing hanging error while acquiring GIL from custom pretokenizer during training. (#470)
* Fixing hanging error while acquiring GIL from custom pretokenizer
during training.
Fixes #469
* cleanup
Co-authored-by: Anthony MOI <m.anthony.moi@gmail.com>
2020-10-20 14:23:39 -04:00
Anthony MOI
91f602f744
Python - Update CHANGELOG and bump to 0.9.2 for release
2020-10-15 10:14:58 -04:00
Nicolas Patry
2ccd16bf5c
Adding a new test for PreTokenizer.custom.
This example is more illustrative of what's doable for custom
PreTokenizer.
2020-10-15 10:07:48 -04:00
Anthony MOI
f94a274702
Python - Update CHANGELOG and bump version for release
2020-10-13 14:45:21 -04:00
Nicolas Patry
88556790e7
Fixing a bug where long tokenizer files would be incorrectly deserialized (#459)
* Fixing a bug where long tokenizer files would be incorrectly
deserialized
- Add a bunch of tests to check deserialization behaviour
- One test also confirms current Single deserialization of Sequence.
* Better test locations for Windows + no file dependency in Python binding
Rust side.
* Addressing @n1t0 comments.
2020-10-13 18:44:24 +02:00
Anthony MOI
3bb794681c
Python - Use 1.46.0 for now
2020-10-09 13:40:35 -04:00
Anthony MOI
83e11a8de4
Python - Update dependencies for release
2020-10-09 13:09:35 -04:00
Anthony MOI
4f4ba4a11a
Python - Bump version for 0.9.0 release
2020-10-09 13:00:19 -04:00
Nicolas Patry
fbca797b3d
Fixing Trainer with u8 instead of chars. (#452)
* Fixing Trainer with u8 instead of chars.
Now check both optimized and unoptimized encoding schemes for Unigram.
* Small fixes.
* Fixing makefile.
2020-10-09 18:57:14 +02:00
Nicolas Patry
dd9fda5d05
Bump rc version.
2020-10-06 11:04:36 +02:00
Anthony MOI
aebf510c5a
Python - Update CHANGELOG and bump to 0.9.0.rc1
2020-09-29 10:24:24 -04:00
Anthony MOI
ff57504972
Python - Add some more test for TemplateProcessing
2020-09-29 10:09:10 -04:00
Nicolas Patry
6c25bb729b
Update __init__.pyi
2020-09-29 10:09:10 -04:00
Anthony MOI
1070eb471e
Python - Update bindings for TemplateProcessing
2020-09-29 10:09:10 -04:00
Dagmawi Moges
7f8b357b92
Fixed Dead Link: Build your own #435 (#436)
* Fixed Dead Link: Build your own #435
* Update bindings/python/README.md
Co-authored-by: Anthony MOI <xn1t0x@gmail.com>
2020-09-25 09:41:31 -04:00
Anthony MOI
a0a163fd62
Remove unwanted file
2020-09-24 14:05:47 -04:00
Anthony MOI
171a042ee0
Python - Bump version for dev4 release
2020-09-24 10:16:18 -04:00
Nicolas Patry
a410903051
Upgrading to black 20.8b1
2020-09-24 09:27:30 -04:00
Anthony MOI
8308508577
Python - Update bindings for Replace Normalizer
2020-09-24 08:05:57 -04:00
Nicolas Patry
598ce61229
Removed now wrong code in convert.py, fixed strange black magic.
2020-09-24 08:57:02 +02:00
Nicolas Patry
95cc8c47ad
Changed Rust API for merges, which is now Vec<(String, String)>
2020-09-24 08:57:02 +02:00
Nicolas Patry
36832bfa12
from_files -> from_file everywhere
- read_files -> read_file
- from_file pure rust impl in python bindings
- Fix some typing in python binding
- Added {BPE,WordLevel,WordPiece}.from_file tests.
2020-09-24 08:57:02 +02:00
Nicolas Patry
9672995a56
We use 19.10b0 not 20 here...
2020-09-24 08:57:02 +02:00
Nicolas Patry
35ee1968c0
Black *Version* check.
2020-09-24 08:57:02 +02:00
Nicolas Patry
9b1ef9d895
Black pre-commit after rebase.
2020-09-24 08:57:02 +02:00